Vero 4k USB HDD freeze

Hi,

I’m running a Vero 4K+ with the latest update, connected to the official OSMC USB hub, which in turn is connected to a USB HDD (WD Elements).

Everything works correctly, but after a random period of time I can no longer access the HDD.
I can still run commands over SSH, but any command that touches the HDD (e.g. ls /mnt/data) never completes.

At first I thought it might be the HDD dying, but I replaced it with a new one and got the same problem.

This is a snippet of the errors I see in dmesg -T:

[Wed Jan 12 17:52:11 2022] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Wed Jan 12 17:52:11 2022] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x2 [current]
[Wed Jan 12 17:52:11 2022] sd 0:0:0:0: [sda] tag#0 ASC=0x4 ASCQ=0x1
[Wed Jan 12 17:52:11 2022] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 49 f0 1a 80 00 00 00 f0 00 00
[Wed Jan 12 17:52:11 2022] blk_update_request: I/O error, dev sda, sector 1240472192
[Wed Jan 12 17:52:13 2022] sd 0:0:0:0: timing out command, waited 180s
[Wed Jan 12 17:52:13 2022] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Wed Jan 12 17:52:13 2022] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x2 [current]
[Wed Jan 12 17:52:13 2022] sd 0:0:0:0: [sda] tag#0 ASC=0x4 ASCQ=0x1
[Wed Jan 12 17:52:13 2022] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 49 f0 1b 70 00 00 00 10 00 00
[Wed Jan 12 17:52:13 2022] blk_update_request: I/O error, dev sda, sector 1240472432
[Wed Jan 12 17:55:11 2022] sd 0:0:0:0: timing out command, waited 180s
[Wed Jan 12 17:55:11 2022] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Wed Jan 12 17:55:11 2022] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x4 [current]
[Wed Jan 12 17:55:11 2022] sd 0:0:0:0: [sda] tag#0 ASC=0x44 <<vendor>>ASCQ=0x81
[Wed Jan 12 17:55:11 2022] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 01 69 84 cb 18 00 00 00 18 00 00
[Wed Jan 12 17:55:11 2022] blk_update_request: I/O error, dev sda, sector 6065277720
[Wed Jan 12 17:55:11 2022] EXT4-fs warning (device sda2): ext4_end_bio:313: I/O error -5 writing to inode 189793106 (offset 1458176 size 8192 starting block 758159718)
[Wed Jan 12 17:55:11 2022] Buffer I/O error on device sda2, logical block 752916579
[Wed Jan 12 17:55:11 2022] Buffer I/O error on device sda2, logical block 752916580
[Wed Jan 12 17:55:11 2022] Buffer I/O error on device sda2, logical block 752916581
[Wed Jan 12 17:55:13 2022] sd 0:0:0:0: timing out command, waited 180s
[Wed Jan 12 17:55:13 2022] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Wed Jan 12 17:55:13 2022] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x4 [current]
[Wed Jan 12 17:55:13 2022] sd 0:0:0:0: [sda] tag#0 ASC=0x44 <<vendor>>ASCQ=0x81
[Wed Jan 12 17:55:13 2022] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 01 69 62 c6 60 00 00 00 18 00 00
[Wed Jan 12 17:55:13 2022] blk_update_request: I/O error, dev sda, sector 6063048288
[Wed Jan 12 17:55:13 2022] EXT4-fs warning (device sda2): ext4_end_bio:313: I/O error -5 writing to inode 188745075 (offset 3985408 size 8192 starting block 757881039)
[Wed Jan 12 17:55:13 2022] Buffer I/O error on device sda2, logical block 752637900
[Wed Jan 12 17:55:13 2022] Buffer I/O error on device sda2, logical block 752637901
[Wed Jan 12 17:55:13 2022] Buffer I/O error on device sda2, logical block 752637902
[Wed Jan 12 17:58:11 2022] sd 0:0:0:0: timing out command, waited 180s
[Wed Jan 12 17:58:11 2022] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Wed Jan 12 17:58:11 2022] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x4 [current]
[Wed Jan 12 17:58:11 2022] sd 0:0:0:0: [sda] tag#0 ASC=0x44 <<vendor>>ASCQ=0x81
[Wed Jan 12 17:58:11 2022] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 00 ea 0e ef 90 00 00 00 28 00 00
[Wed Jan 12 17:58:11 2022] blk_update_request: I/O error, dev sda, sector 3926847376
[Wed Jan 12 17:58:11 2022] Aborting journal on device sda2-8.
[Wed Jan 12 17:58:11 2022] EXT4-fs (sda2): Delayed block allocation failed for inode 187960492 at logical offset 16 with max blocks 2 with error 30
[Wed Jan 12 17:58:11 2022] EXT4-fs (sda2): This should not happen!! Data will be lost

[Wed Jan 12 17:58:11 2022] EXT4-fs error (device sda2) in ext4_writepages:2854: Journal has aborted
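
A note on reading the log above, offered as a sketch rather than a diagnosis. In the standard SCSI tables, sense key 0x2 is "Not Ready" and ASC=0x4 ASCQ=0x1 means the unit is in the process of becoming ready; sense key 0x4 is "Hardware Error". That would fit a drive that keeps spinning up, though the log alone does not prove a power problem. The CDB lines can also be decoded by hand: opcode 0x88 is READ(16) and 0x8a is WRITE(16), with the LBA in bytes 2-9 and the block count in bytes 10-13. The helper name decode_cdb16 below is made up for illustration.

```shell
# Decode a 16-byte SCSI CDB as printed by dmesg (READ(16)=0x88, WRITE(16)=0x8a).
# Bytes 2-9 hold the big-endian LBA, bytes 10-13 the transfer length in blocks.
decode_cdb16() {
    # Arguments are the 16 hex bytes, e.g. 88 00 00 00 00 00 49 f0 1a 80 00 00 00 f0 00 00
    [ "$#" -eq 16 ] || { echo "expected 16 hex bytes" >&2; return 1; }
    case "$1" in
        88) op="READ(16)" ;;
        8a) op="WRITE(16)" ;;
        *)  op="opcode 0x$1" ;;
    esac
    # Concatenate the hex bytes and let printf convert them to decimal.
    lba=$(printf '%d' "0x$3$4$5$6$7$8$9${10}")
    len=$(printf '%d' "0x${11}${12}${13}${14}")
    echo "$op lba=$lba blocks=$len"
}

decode_cdb16 88 00 00 00 00 00 49 f0 1a 80 00 00 00 f0 00 00
```

For the first CDB in the log this prints lba=1240472192, which matches the sector reported by the following blk_update_request line.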

I can’t even restart the Vero, as it hangs on the HDD. The only thing that makes the system responsive again is physically reconnecting the HDD. At the end of the logs, you can see me doing that and the system becoming responsive again.

My previous USB hub had this issue as well…
Is there a possibility this could be a software issue? Or maybe a component in the Vero that has become faulty over time?

Full logs: http://paste.osmc.tv/bilizureqa

Thanks for your help!

Also, not sure if it’s related, but when I plugged both USB HDDs into the USB hub to copy data between them, there wasn’t enough power and they kept spinning up and spinning down. I had assumed a 5V/2A USB hub would be able to handle that, so… one more data point.

My guess is the PSU for the USB hub is going weak. Can you get your hands on a replacement (5V 2A or more)?

I don’t currently have one at hand with the plug the hub requires. But I tried my previous USB hub (which uses a different PSU, also 5V/2A) and it can’t spin up both disks either.

I can try to measure the current being drawn by the PSU.

Is that the same make/model of hub?

No, it’s a different one. That one even has power backfeed problems. That’s the reason why I changed to the OSMC store hub.

Have you tried connecting your HDD directly to the Vero? These drives tend to pull more than 500mA on start-up but less than 500mA when running, so it’s best to start the Vero first and then plug in the drive, since the Vero also draws more power on start-up.

If that works then it seems unlikely to be a software issue. Check the disk out with fsck to clear any errors. Then you might try some different parameters in fstab to make things more robust. My fstab line is (can’t remember where I got it from):

UUID=59243619-bc19-4511-9652-06d20a51f39e /mnt/1Text4 ext4 defaults,noatime,users,nofail,x-systemd.mount-timeout=30 0 0

Meanwhile, I do think a new PSU for the USB hub is worth a few euros to try.

I’ve switched to a different 5V/2A PSU and disabled the drive’s power management (hdparm -B 255 /dev/sda), and it’s been working for a few days without freezing.
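
A minimal sketch of the APM change described above, assuming the drive is /dev/sda and hdparm is installed. Running hdparm -B with no value queries the current APM level, and 255 disables APM. The helper name disable_apm and the HDPARM override are hypothetical, added only so the commands can be dry-run. The setting may not survive a power cycle of the drive, so it may need to be reapplied after reconnecting.

```shell
# Hypothetical helper: show the current APM level, then disable APM.
# Set HDPARM=echo to print the arguments instead of touching the drive.
disable_apm() {
    dev="${1:-/dev/sda}"                     # device path is an assumption
    ${HDPARM:-sudo hdparm} -B "$dev"         # query the current APM level
    ${HDPARM:-sudo hdparm} -B 255 "$dev"     # 255 = APM disabled
}

# Usage: disable_apm /dev/sda
```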

I’ve also been monitoring the PSU’s power consumption to check for any possible spikes.

If you’re able, I’d check SMART values on that disc.

Is it making any clicking noises?

Try the other USB port.

Is this on the 5V side? An average of 800mA and never less than 700mA looks like a lot.

If you’re able, I’d check SMART values on that disc.

This disk is brand new.

> sudo smartctl -a -d sat /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.9.113-60-osmc] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD40NDZW-11BCSS0
Serial Number:    WD-WX32D717R4DL
LU WWN Device Id: 5 0014ee 269d9605b
Firmware Version: 01.01A01
User Capacity:    4,000,753,475,584 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    4800 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 17 12:11:29 2022 WET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		( 6780) seconds.
Offline data collection
capabilities: 			 (0x1b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  62) minutes.
SCT capabilities: 	       (0x30b5)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       3108
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       65
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       142
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       45
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       281
194 Temperature_Celsius     0x0022   106   098   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
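
Since the output above shows no self-tests logged, a short self-test is one way to get more data. This is a sketch assuming the same device and the same -d sat option used in the command above; the function names and the SMARTCTL override are hypothetical. The drive reports a recommended polling time of 2 minutes for the short test, so read the log after waiting about that long.

```shell
# Hypothetical helpers. Set SMARTCTL=echo to print the arguments as a dry run.
smart_short_test() {
    dev="${1:-/dev/sda}"
    # -d sat: pass ATA commands through the USB bridge; -t short: start a short self-test
    ${SMARTCTL:-sudo smartctl} -d sat -t short "$dev"
}

smart_test_log() {
    dev="${1:-/dev/sda}"
    # -l selftest: print the self-test log once the test has finished
    ${SMARTCTL:-sudo smartctl} -d sat -l selftest "$dev"
}

# Usage: smart_short_test /dev/sda, wait about 2 minutes, then smart_test_log /dev/sda
```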

Is it making any clicking noises?

No clicking noises before. I’m not sure whether it started clicking when it froze, as the freezes were random.

Try the other USB port.

I did try the other USB port as well, since the white USB port wasn’t working for me for a few versions, but I got the same problem.

The white USB port should now be fixed in the latest update.

Is this on the 5V side? An average of 800mA and never less than 700mA looks like a lot.

On the 220V side, with a smart plug. The values dipped lower when APM was still enabled, but I’m trying it like this to see if it’s stable.

The white USB port should now be fixed in the latest update.

That’s the port I’m running it on right now. Not sure if there’s a difference (was running on the other one before).

But with the previous setup the random freezes happened in less than 2 days, so I’m optimistic about these changes :sweat_smile:

I had no end of trouble with the Vero 4K+ and the OSMC powered hub. I eventually bought a different powered hub (Nedis 5-port, with a 12V/2A power supply). I’m using Seagate 5TB Expansion HDDs and they draw 700-800mA at times. The new hub works just fine and I no longer have the random failures/disconnects/not ready errors that I had previously.

I currently have the remote dongle and two of the HDDs in the Nedis hub and a third HDD directly attached to the Vero.