Ext4_dirent_csum_verify: No space for directory leaf checksum

Chief_Wiggum · 20 October 2018 00:18

Full logs here:

https://paste.osmc.tv/oponasuqur

This is driving me nuts. I have two 4TB hard drives in external enclosures. After booting the system, everything works fine. After a couple of hours, when trying to access directories or files on either of the drives, I start seeing:

[ 4245.780461] EXT4-fs warning (device sda1): ext4_dirent_csum_verify:353: inode #141951099: comm VideoPlayer: No space for directory leaf checksum. Please run e2fsck -D.
[ 4245.780483] EXT4-fs error (device sda1): htree_dirblock_to_tree:977: inode #141951099: comm VideoPlayer: Directory block failed checksum

I can no longer access the directories via ls or via samba. If I reboot, everything is back to normal, except mount reports errors on the file system. The messages start up again, but with a different inode number.

[  310.135138] EXT4-fs (sdb1): error count since last fsck: 1
[  310.135145] EXT4-fs (sda1): error count since last fsck: 26
[  310.135154] EXT4-fs (sda1): initial error at time 1539988115: ext4_iget:4717: inode 31719425
[  310.135165] EXT4-fs (sdb1): initial error at time 1539988145: ext4_iget:4717: inode 106168321
[  310.135175] EXT4-fs (sda1): last error at time 1539988974: ext4_iget:4717
[  310.135182] EXT4-fs (sdb1): last error at time 1539988145: ext4_iget:4717
[  310.135184] : inode 31719425
[  310.135189] : inode 106168321

Running e2fsck -D does not fix the problem. I’ve also run smartctl full tests and found no issues. These drives are less than a month old.

Also, /dev/sda1 and /dev/sdb1 should be identical. sdb1 is rsync’d to sda1 nightly. If I swap the drives at the mount point, initially everything is fine, then I start seeing the same messages (different inodes) for sdb1.

Any ideas on what I can check? I’ve searched the web for the error messages, but nothing I found seems applicable. Thanks for the help.

bmillham · 20 October 2018 02:32

Are the drives self powered, or powered by the Pi, or powered by a USB hub? If powered by the Pi, I’d suggest you get a good powered hub.

The little info I found on that error seems to point to a drive failure, so I’d start with power.

If you have another linux system, try one of the drives on the other system to see if the problem occurs there. If so, maybe the USB ports on the Pi are failing.

Chief_Wiggum · 20 October 2018 15:17

Each drive has its own brick type power adapter. I don’t think both drives are failing at the same time less than a month after being put into service??? I do have a Linux VM machine, so I can try that. Thanks for the idea.

shamael · 22 October 2018 09:52

- Is the drive really mounted writable? (Create a folder or similar to confirm.)
- What about inode usage? (Check with “df -i”.)

Chief_Wiggum · 22 October 2018 18:23

Both drives continue to be mounted RW and are in fact writable, unless the directory is the one whose inode is being reported. df -i shows less than 1% inode use for both drives.

racin · 27 January 2019 17:35

Hi,

I have exactly the same problem – one external 2TB hard drive, same ext4 error message, the drive is recent and smartctl does not report any error. Did you find the source of your problem?

Chief_Wiggum · 27 January 2019 18:27

I wish. Still having the problem. I actually get a variety of ext4 errors. I have replaced the USB adapters with no change, and I have also replaced the Raspberry Pi itself which didn’t fix the problem either. So I don’t think it’s a hardware error. My best guess at this point is that there’s some kind of mismatch between stock OSMC and some drive tool that I installed. I haven’t had time to do a full re-install, but that is what I would do next. One thing you can try is to use tune2fs to turn off metadata_csum. That actually changed the ext4 errors I was getting, but didn’t fix the issue. If you figure it out, please post back. Thanks.
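
For anyone wanting to try the same thing, here is a rough sketch of checking whether the feature is enabled before touching it. has_feature is a hypothetical helper made up for illustration and /dev/sdX1 is a placeholder; it assumes the usual "Filesystem features:" line in tune2fs -l output.

```shell
# has_feature NAME: succeeds if NAME appears in the "Filesystem features:"
# line of `tune2fs -l` output read from stdin. (Hypothetical helper.)
has_feature() {
    grep '^Filesystem features:' | tr ' \t' '\n\n' | grep -qx -- "$1"
}

# Usage (placeholder device; run as root):
#   tune2fs -l /dev/sdX1 | has_feature metadata_csum && echo "metadata_csum is on"
# Turning it off needs the file system unmounted and e2fsprogs 1.43 or later;
# tune2fs may ask for an `e2fsck -f` run first:
#   umount /dev/sdX1
#   tune2fs -O ^metadata_csum /dev/sdX1
```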

dillthedog · 27 January 2019 18:48

Looking at the original post, you have one drive that you are updating – how isn’t clear – and one that is some kind of backup of the first, using rsync.

Unfortunately, if you’re, for example, torrenting to the first disk, it might be an issue with the torrent client (the log is no longer available), and we don’t know anything about how you chose to rsync data to the second disk.

I would agree that the best approach is to reinstall OSMC and see if the problem still occurs on an unmodified installation, with both disks attached. Then make changes one at a time.

Chief_Wiggum · 27 January 2019 19:45

It isn’t just the torrent client. It happens with any process that writes to the disk, which leads me to believe it’s an OS issue. Reading from the disk is fine until something writes to it.

dillthedog · 27 January 2019 20:19

Clearly, we don’t see many issues like this on OSMC, so your problem is very rare. If both disks are the same model from the same producer, it might also be something wrong with either the disk hardware or firmware.

JimKnopf · 28 January 2019 06:02

Perhaps it would help others reading and investigating this if you also provided:

smartctl -a, complete output for the affected HDD/SSD, and whether it is attached directly to the Pi or through a powered USB hub
tune2fs -l, complete ext4 superblock info of the filesystem
e2fsck -f -D, output of the file system’s directory optimization after stopping mediacenter and unmounting
details of special “drive tool(s)” installed

Please use paste-log to upload the data and provide the URLs here.
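
A minimal sketch of gathering the two read-only reports in one go. collect_diag is a hypothetical helper name, and the device, partition and output directory are placeholders. The e2fsck -f -D run is left out on purpose, since it modifies the file system and needs it unmounted.

```shell
# collect_diag DISK PARTITION OUTDIR
# Saves `smartctl -a` and `tune2fs -l` output into OUTDIR for uploading.
# (Hypothetical helper; run as root.)
collect_diag() {
    dev=$1      # whole disk, e.g. /dev/sdX (placeholder)
    part=$2     # ext4 partition, e.g. /dev/sdX1 (placeholder)
    out=$3      # directory for the report files
    mkdir -p "$out" || return 1
    smartctl -a "$dev"  > "$out/smartctl.txt" 2>&1
    tune2fs -l "$part"  > "$out/tune2fs.txt" 2>&1
}

# Usage:
#   collect_diag /dev/sdX /dev/sdX1 /tmp/disk-report
```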

Chief_Wiggum · 28 January 2019 16:45

The drives are WD Red 4TB drives. They’re attached directly to the Pi via these adapters. I’ve also tried these which didn’t seem to make a difference. They have their own power supplies.

Output of smartctl
Output of tune2fs
Output of the last e2fsck check

Looking at dpkg.log, it looks like I installed e2fsprogs (1.43.4-2) and smartmontools (6.5+svn4324-1) on top of the base system. e2fsprogs was updated from 1.42.12-2+b1 to 1.43.4-2. Thank you for looking into this, I can’t seem to make any headway.

Edit: I forgot to mention another weird thing that happens. I use udisks to mount the drives by label. sda1 has the label Storage_01, sdb1 is Storage_02. Occasionally, the labels will be swapped and sda1 will be labeled Storage_02 and sdb1 will be labeled Storage_01. Possibly related?

JimKnopf · 28 January 2019 22:44

Mmhhh, the only obvious thing I can see is that this HDD has never run a SMART self-test, since the self-test log is empty.
If this were my device, I would first run a long SMART test, which could take hours, but afterwards you know that you aren’t trying to ride a dead horse.

So, run a smartctl -t long … but you have to keep the disk active every few minutes, otherwise it could hibernate and you could see an “aborted by user/host” self-test log entry or similar. A simple while-true loop that touches a file in a directory on the disk’s file system, immediately removes it, and sleeps a few minutes is sufficient. With smartctl -a you can check the progress at the top, in the self-test execution status.
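
The loop described above could look roughly like this. keep_disk_awake is a hypothetical function name, the mount point and device in the usage comment are placeholders, and the sync call is an extra assumption added so the write actually reaches the disk rather than staying in the cache.

```shell
# keep_disk_awake DIR [INTERVAL_SECONDS] [ROUNDS]
# Touches and removes a file in DIR every INTERVAL_SECONDS (default 120).
# ROUNDS=0 (the default) means loop until interrupted with Ctrl-C.
keep_disk_awake() {
    dir=$1
    interval=${2:-120}
    rounds=${3:-0}
    i=0
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        touch "$dir/.keepalive"   # create (or update) a marker file
        sync                      # flush it to disk so the drive sees activity
        rm -f "$dir/.keepalive"   # remove it again straight away
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage (placeholders; run as root):
#   smartctl -t long /dev/sdX
#   keep_disk_awake /mnt/Storage_01 120
#   smartctl -a /dev/sdX    # look at the self-test execution status
```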

The mapping issue sounds like some timing difference; as long as the same disk is always mounted to the same mount point, it is no real problem.
I’m using fstab and the UUID provided by blkid to guarantee that the disks always get to the same mount points, example line in fstab:

UUID=538214da-b9d1-4460-9e87-de3efcb5da0a /mnt/Intenso2766GB ext4 defaults 0 3

Chief_Wiggum · 29 January 2019 23:20

Okay. It took me a couple tries to get the long self test to run properly, but it did complete with 0 errors. Please see the updated smartctl -a output. Thank you.

JimKnopf · 30 January 2019 06:26

So, this is good news and removes doubts about the reliability of this disk.

I haven’t found much that is helpful for these specific errors and the firmware of the WD drive, so the pragmatic approach I would take, if fsck cannot permanently correct the issue, is to recreate the file system from scratch.

If only large files such as videos and backups will be placed on the disks, consider choosing a usage type with a large bytes-per-inode ratio (that is, far fewer inodes), as I’ve chosen for my 3TB Toshiba disks, i.e. mkfs.ext4 -T largefile4 …. The usage types are listed in /etc/mke2fs.conf.

Also, the reserved disk space for super-user activities is 5% by default, which makes around 180-190 GB with your WD drives: a quite large (wasted) amount of disk space on 4 TB drives. You can also influence this amount using mkfs.ext4 and the -m parameter.
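
To put a number on the reserved space, a small sketch of the arithmetic. reserved_gib is a hypothetical helper, the 4 TB size is taken as 4000000000000 bytes, and the device in the comments is a placeholder. It assumes a shell with 64-bit arithmetic.

```shell
# reserved_gib SIZE_BYTES PERCENT
# Prints the reserved space in whole GiB for a file system of SIZE_BYTES.
# (Hypothetical helper; ignores file system overhead.)
reserved_gib() {
    echo $(( $1 * $2 / 100 / 1024 / 1024 / 1024 ))
}

reserved_gib 4000000000000 5    # prints 186
reserved_gib 4000000000000 1    # prints 37

# Lowering the reservation to 1% (placeholder device; run as root):
#   at creation time:         mkfs.ext4 -T largefile4 -m 1 /dev/sdX1
#   on an existing file system: tune2fs -m 1 /dev/sdX1
```

Note that mkfs.ext4 destroys everything on the partition, so tune2fs -m is the option for a file system that already holds data.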

Chief_Wiggum · 25 September 2019 22:19

It’s been a long while, but I wanted to reply and mark this solved. The problem turned out to be the SATA to USB adapters. They were the ORICO Toolfree USB 3.0 to SATA External 3.5 Hard Drive Enclosure Cases.

I have since replaced them with these and everything has been working great for two weeks or so now.

Part of the problem I had diagnosing this was that I bought a replacement SATA adapter, but I attached it to the wrong drive for testing. For some reason I thought the devices in /dev correlated to the physical USB slots. USB0 for /dev/sda and USB1 for /dev/sdb and so on. Instead, whichever device reports ready first is attached to /dev/sda. So one boot USB0 might be /dev/sda and the next boot it might be /dev/sdb. Once I figured that out everything made a lot more sense.
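
In case it helps someone else, a rough sketch of how to see which physical drive currently sits behind /dev/sda or /dev/sdb. It assumes udev populates /dev/disk/by-id, as it normally does on Debian-based systems; list_stable_names is a hypothetical helper name.

```shell
# list_stable_names [DIR]
# Prints "stable-name -> device" for every symlink in DIR
# (default /dev/disk/by-id). (Hypothetical helper.)
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Usage:
#   list_stable_names                      # names built from model and serial
#   list_stable_names /dev/disk/by-label   # file system labels
#   lsblk -o NAME,SERIAL,LABEL,UUID        # similar information in one table
```

The by-id names include the drive's model and serial number, so they stay the same no matter which device reports ready first.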

Thank you all for your help.

fzinken · 26 September 2019 00:05

To avoid such issues, it’s recommended to use the UUID or labels for mounting instead.