ext4_dirent_csum_verify: No space for directory leaf checksum


#1

Full logs here:

https://paste.osmc.tv/oponasuqur

This is driving me nuts. I have two 4TB hard drives in external enclosures. After booting the system, everything works fine. After a couple of hours, when trying to access directories or files on either of the drives, I start seeing:

[ 4245.780461] EXT4-fs warning (device sda1): ext4_dirent_csum_verify:353: inode #141951099: comm VideoPlayer: No space for directory leaf checksum. Please run e2fsck -D.
[ 4245.780483] EXT4-fs error (device sda1): htree_dirblock_to_tree:977: inode #141951099: comm VideoPlayer: Directory block failed checksum
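The Python sketch below pulls the device and inode number out of messages like the two above, for example to confirm that a different inode is reported after each reboot. The regular expression is modelled only on these two lines and is an assumption. Other EXT4-fs messages use different layouts. `parse_ext4_line` and `affected_inodes` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set

# Pattern for per-inode EXT4-fs messages, modelled on the two log lines above (assumption).
EXT4_INODE_RE = re.compile(
    r"EXT4-fs (?P<level>\w+) \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): inode #(?P<inode>\d+): "
    r"comm (?P<comm>[^:]+): (?P<message>.*)"
)


class Ext4Event(NamedTuple):
    level: str      # "warning", "error", ...
    device: str     # e.g. "sda1"
    function: str   # kernel function that reported the problem
    inode: int
    comm: str       # process that triggered the access
    message: str


def parse_ext4_line(line: str) -> Optional[Ext4Event]:
    """Return the parsed event, or None if the line is not a per-inode EXT4-fs message."""
    m = EXT4_INODE_RE.search(line)
    if m is None:
        return None
    return Ext4Event(m["level"], m["device"], m["function"],
                     int(m["inode"]), m["comm"], m["message"])


def affected_inodes(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Group the inode numbers mentioned in EXT4-fs messages by device."""
    result: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        event = parse_ext4_line(line)
        if event is not None:
            result[event.device].add(event.inode)
    return dict(result)
```

Feed it the lines of `dmesg` output. To find the path a reported inode belongs to, debugfs's `ncheck` command can help, e.g. `debugfs -R "ncheck 141951099" /dev/sda1`.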

I can no longer access the directories via ls or Samba. If I reboot, everything is back to normal, except that mount reports errors on the file system. The messages then start again, but with a different inode number.

[  310.135138] EXT4-fs (sdb1): error count since last fsck: 1
[  310.135145] EXT4-fs (sda1): error count since last fsck: 26
[  310.135154] EXT4-fs (sda1): initial error at time 1539988115: ext4_iget:4717: inode 31719425
[  310.135165] EXT4-fs (sdb1): initial error at time 1539988145: ext4_iget:4717: inode 106168321
[  310.135175] EXT4-fs (sda1): last error at time 1539988974: ext4_iget:4717
[  310.135182] EXT4-fs (sdb1): last error at time 1539988145: ext4_iget:4717
[  310.135184] : inode 31719425
[  310.135189] : inode 106168321
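The "initial error" and "last error" values are Unix timestamps, which makes them hard to line up with other logs. This minimal Python sketch converts them to UTC. The `error_times` helper is hypothetical, and its regex only covers the line layout shown above.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

# Matches e.g. "EXT4-fs (sda1): initial error at time 1539988115: ..." (layout from the log above).
ERROR_TIME_RE = re.compile(
    r"EXT4-fs \((?P<device>[^)]+)\): (?P<kind>initial|last) error at time (?P<ts>\d+)"
)


def error_times(lines: Iterable[str]) -> Dict[Tuple[str, str], datetime]:
    """Map (device, "initial" | "last") to the error time as a UTC datetime."""
    times: Dict[Tuple[str, str], datetime] = {}
    for line in lines:
        m = ERROR_TIME_RE.search(line)
        if m:
            ts = int(m["ts"])
            times[(m["device"], m["kind"])] = datetime.fromtimestamp(ts, tz=timezone.utc)
    return times
```

For the lines above, sda1's initial error works out to 2018-10-19 22:28:35 UTC. Its last error came about 14 minutes later.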

Running e2fsck -D does not fix the problem. I’ve also run smartctl full tests and found no issues. These drives are less than a month old.

Also, /dev/sda1 and /dev/sdb1 should be identical. sdb1 is rsync’d to sda1 nightly. If I swap the drives at the mount point, initially everything is fine, then I start seeing the same messages (different inodes) for sdb1.
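The two file systems are supposed to be identical. A recursive comparison can check that claim and also walks every directory, which exercises the same read paths. This Python sketch uses the standard library's `filecmp.dircmp`. `tree_differences` is a hypothetical helper, and the mount points in the usage note are assumptions.

```python
import filecmp
import os
from typing import List, Tuple


def tree_differences(left: str, right: str) -> List[Tuple[str, str]]:
    """List (kind, relative path) pairs where two directory trees disagree.

    filecmp.dircmp compares shallowly: files with identical os.stat()
    signatures (type, size, mtime) count as equal without being read;
    otherwise sizes, then contents, are compared.
    """
    diffs: List[Tuple[str, str]] = []

    def walk(cmp: filecmp.dircmp, rel: str) -> None:
        for name in cmp.left_only:
            diffs.append(("only_left", os.path.join(rel, name)))
        for name in cmp.right_only:
            diffs.append(("only_right", os.path.join(rel, name)))
        for name in cmp.diff_files:
            diffs.append(("differs", os.path.join(rel, name)))
        # Entries whose type differs or that could not be stat()ed/compared.
        for name in cmp.funny_files + cmp.common_funny:
            diffs.append(("uncomparable", os.path.join(rel, name)))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(diffs)
```

`tree_differences("/mnt/Storage_01", "/mnt/Storage_02")` lists files that exist on only one side or that differ. If a directory block is damaged, reading it may raise an `OSError` and log the same kernel messages.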

Any ideas on what I can check? I’ve searched the web for the error messages, but nothing I found seems applicable. Thanks for the help.


#2

Are the drives self-powered, powered by the Pi, or powered through a USB hub? If they’re powered by the Pi, I’d suggest getting a good powered hub.

The little information I found on that error points to a failing drive. An insufficient power supply can produce similar symptoms, so I’d start with power.

If you have another Linux system, try one of the drives there to see whether the problem also occurs. If it doesn’t, the Pi’s USB ports may be failing.


#3

Each drive has its own brick-type power adapter. I doubt both drives are failing at the same time, less than a month after being put into service. I do have a Linux VM, so I can try that. Thanks for the idea.


#4

- Is the drive really mounted read-write? (Create a folder or file to confirm.)
- What about inode usage? (Check with “df -i”.)
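The second check can also be done programmatically. This minimal Python sketch uses `os.statvfs`, which reads the same counters that `df -i` reports. The helper name and the mount point in the usage example are hypothetical. The percentage may differ slightly from df's rounding.

```python
import os


def inode_usage_percent(path: str) -> float:
    """Roughly what `df -i` shows as IUse% for the file system containing `path`."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some file systems do not report a fixed inode count.
        return 0.0
    used = st.f_files - st.f_ffree
    return 100.0 * used / st.f_files
```

For example, `inode_usage_percent("/mnt/Storage_01")`.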


#5

Both drives are still mounted RW and are in fact writable, except for the directory whose inode is being reported. df -i shows less than 1% inode usage on both drives.


#6

Hi,

I have exactly the same problem: one 2TB external hard drive, the same ext4 error message, the drive is recent, and smartctl does not report any errors. Did you find the source of your problem?


#7

I wish. Still having the problem. I actually get a variety of ext4 errors. I have replaced the USB adapters with no change, and I have also replaced the Raspberry Pi itself which didn’t fix the problem either. So I don’t think it’s a hardware error. My best guess at this point is that there’s some kind of mismatch between stock OSMC and some drive tool that I installed. I haven’t had time to do a full re-install, but that is what I would do next. One thing you can try is to use tune2fs to turn off metadata_csum. That actually changed the ext4 errors I was getting, but didn’t fix the issue. If you figure it out, please post back. Thanks.
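You can check the `Filesystem features` line of `tune2fs -l` output to confirm whether `metadata_csum` is enabled before and after changing it. The change itself is made with `tune2fs -O ^metadata_csum` on an unmounted file system. Both helper names in this Python sketch are hypothetical.

```python
from typing import Dict, Set


def parse_tune2fs(text: str) -> Dict[str, str]:
    """Turn `tune2fs -l` output ("Key:   value" lines) into a dict; other lines are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fs_features(text: str) -> Set[str]:
    """Return the set of names on the "Filesystem features" line."""
    return set(parse_tune2fs(text).get("Filesystem features", "").split())
```

Pass it the text of `subprocess.run(["tune2fs", "-l", "/dev/sda1"], capture_output=True, text=True).stdout`. This usually needs root.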


#8

Looking at the original post, you have one drive that you are updating – how isn’t clear – and one that is some kind of backup of the first, using rsync.

Unfortunately, if you’re torrenting to the first disk, for example, the problem might lie with the torrent client (the log is no longer available). We also don’t know anything about how you chose to rsync data to the second disk.

I would agree that the best approach is to reinstall OSMC and see if the problem still occurs on an unmodified installation, with both disks attached. Then make changes one at a time.


#9

It isn’t just the torrent client. It happens with any process that writes to the disk, which leads me to believe it’s an OS issue. Reading from the disk is fine until something writes to it.


#10

Clearly, we don’t see many issues like this on OSMC, so your problem is very rare. If both disks are the same model from the same manufacturer, the cause might also lie in the disk hardware or firmware.


#11

It might help others reading and investigating this if you also provide:

  • smartctl -a: the complete output for the affected HDD/SSD, plus whether it is attached directly to the Pi or through a powered USB hub
  • tune2fs -l: the complete ext4 superblock information of the file system
  • e2fsck -f -D: the output of the file system’s directory optimization, run after stopping mediacenter and unmounting the file system
  • details of any special “drive tool(s)” you installed

Please use paste-log to upload the data and post the URLs here.
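This Python sketch gathers the first two items into one block of text for pasting. The e2fsck step is deliberately left out. It changes the file system, must run on an unmounted file system, and normally expects an interactive terminal, so it is better run by hand. The helper names and device paths are assumptions.

```python
import subprocess
from typing import List


def diagnostic_commands(disk: str, partition: str) -> List[List[str]]:
    """Read-only commands from the list above, for one disk and its ext4 partition."""
    return [
        ["smartctl", "-a", disk],      # SMART attributes and self-test log
        ["tune2fs", "-l", partition],  # ext4 superblock information
    ]


def collect_report(disk: str, partition: str) -> str:
    """Run each command and concatenate its output into one paste-able text."""
    sections = []
    for cmd in diagnostic_commands(disk, partition):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except OSError as exc:  # e.g. the tool is not installed
            output = f"{cmd[0]}: could not run ({exc})\n"
        sections.append(f"$ {' '.join(cmd)}\n{output}")
    return "\n".join(sections)
```

Run it as root, for example `print(collect_report("/dev/sda", "/dev/sda1"))`.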


#12

The drives are WD Red 4TB drives. They’re attached directly to the Pi via these adapters. I’ve also tried these which didn’t seem to make a difference. They have their own power supplies.

Output of smartctl
Output of tune2fs
Output of the last e2fsck check

According to dpkg.log, I installed e2fsprogs (1.43.4-2) and smartmontools (6.5+svn4324-1) on top of the base system; e2fsprogs was upgraded from 1.42.12-2+b1 to 1.43.4-2. Thank you for looking into this; I can’t seem to make any headway.
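This Python sketch filters install/upgrade records out of /var/log/dpkg.log to show which disk-related packages changed. It assumes the usual line layout (date, time, action, package:arch, old version, new version). The helper is hypothetical, and older entries may be in rotated files such as dpkg.log.1.

```python
from typing import Iterable, List, NamedTuple, Set


class PackageChange(NamedTuple):
    timestamp: str    # "YYYY-MM-DD HH:MM:SS"
    action: str       # "install" or "upgrade"
    package: str
    old_version: str  # "<none>" for fresh installs
    new_version: str


def package_changes(lines: Iterable[str], packages: Set[str]) -> List[PackageChange]:
    """Pick install/upgrade records for the given package names out of dpkg.log lines."""
    changes: List[PackageChange] = []
    for line in lines:
        parts = line.split()
        # Assumed layout: date time action package:arch old-version new-version
        if len(parts) != 6 or parts[2] not in ("install", "upgrade"):
            continue
        name = parts[3].split(":", 1)[0]  # drop the ":armhf"-style architecture suffix
        if name in packages:
            changes.append(PackageChange(f"{parts[0]} {parts[1]}", parts[2],
                                         name, parts[4], parts[5]))
    return changes
```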

Edit: I forgot to mention another weird thing that happens. I use udisks to mount the drives by label. sda1 has the label Storage_01, sdb1 is Storage_02. Occasionally, the labels will be swapped and sda1 will be labeled Storage_02 and sdb1 will be labeled Storage_01. Possibly related?


#13

Mmhhh, the only obvious thing I can see is that this HDD has never run a SMART self-test, since the self-test log is empty.
If this were my device, I would first run a long SMART test. It can take hours, but afterwards you know you aren’t trying to ride a dead horse.

So, run a smartctl -t long … but you have to keep the disk active every few minutes. Otherwise it could spin down, and you could see an “aborted by user/host” entry (or similar) in the self-test log. A simple while-true loop is sufficient: touch a file in a directory on the disk’s file system, immediately remove it, and sleep a few minutes. With smartctl -a you can check the progress at the top, under the self-test execution status.
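Here is the keep-alive loop described above, sketched in Python rather than shell. It creates a small hidden temp file in the given directory, forces it to disk with `os.fsync`, and deletes it again. The three-minute default interval is an assumption standing in for "a few minutes", and the helper name is hypothetical.

```python
import os
import tempfile
import time
from typing import Optional


def keep_disk_awake(directory: str, interval_s: float = 180.0,
                    iterations: Optional[int] = None) -> int:
    """Create, flush and delete a tiny file every `interval_s` seconds.

    Runs forever when `iterations` is None; returns the number of rounds done.
    """
    done = 0
    while iterations is None or done < iterations:
        fd, name = tempfile.mkstemp(prefix=".keepalive-", dir=directory)
        try:
            os.write(fd, b"x")
            os.fsync(fd)  # push the write to the device instead of leaving it in the page cache
        finally:
            os.close(fd)
            os.remove(name)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
    return done
```

Start the test with `smartctl -t long /dev/sda`, then run `keep_disk_awake("/mnt/Storage_01")` (hypothetical mount point) until `smartctl -a` reports that the self-test has completed. Stop it with Ctrl+C.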

The mapping issue sounds like a timing difference; as long as the same disk is always mounted at the same mount point, it’s not a real problem.
I’m using fstab and the UUID provided by blkid to guarantee that the disks always end up at the same mount points. Example line in fstab:

UUID=538214da-b9d1-4460-9e87-de3efcb5da0a /mnt/Intenso2766GB ext4 defaults 0 3
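This Python sketch shows which /dev/sdX node each UUID currently points to, which would make the label swapping described earlier visible. It parses an fstab line and resolves `UUID=`/`LABEL=` through udev's `/dev/disk/by-uuid` and `/dev/disk/by-label` symlinks. The helper names are hypothetical, and the resolution assumes those udev symlinks exist.

```python
import os
from typing import NamedTuple, Optional


class FstabEntry(NamedTuple):
    spec: str        # what to mount, e.g. "UUID=..."
    mountpoint: str
    fstype: str
    options: str
    dump: int
    passno: int      # fsck order at boot; 0 = never check


def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Parse one fstab line; returns None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}: {line!r}")
    dump = int(fields[4]) if len(fields) > 4 else 0    # missing fields default to 0
    passno = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(fields[0], fields[1], fields[2], fields[3], dump, passno)


def resolve_spec(spec: str) -> Optional[str]:
    """Map UUID=/LABEL= to the current /dev node via udev's /dev/disk symlinks.

    Returns None if no such symlink exists (e.g. the disk is not attached).
    """
    for prefix, subdir in (("UUID=", "by-uuid"), ("LABEL=", "by-label")):
        if spec.startswith(prefix):
            link = os.path.join("/dev/disk", subdir, spec[len(prefix):])
            return os.path.realpath(link) if os.path.exists(link) else None
    return spec  # already a plain device path
```

For example, `resolve_spec(parse_fstab_line(line).spec)` for each line of /etc/fstab.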


#14

Okay. It took me a couple of tries to get the long self-test to run properly, but it completed with 0 errors. Please see the updated smartctl -a output. Thank you.


#15

So, this is good news and removes doubts about the reliability of this disk.

I haven’t found anything helpful about these specific errors in combination with the WD drive’s firmware. So if fsck can’t permanently correct the issue, the pragmatic approach I would use is to recreate the file system from scratch.

If only large files such as videos and backups will be stored on the disks, consider choosing a usage type with a large bytes-per-inode ratio (fewer inodes), as I did for my 3TB Toshiba disks: mkfs.ext4 -T largefile4 …. The usage types are listed in /etc/mke2fs.conf.
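Here is the rough arithmetic behind this suggestion, as a Python sketch. The ratios are the `inode_ratio` values found in a typical /etc/mke2fs.conf: 16384 bytes per inode for the default type and 4194304 for largefile4. 256 bytes is the usual ext4 inode size. Check your own mke2fs.conf. mke2fs also rounds per block group, so real counts differ slightly.

```python
# inode_ratio values as shipped in a typical /etc/mke2fs.conf (assumption: check your own copy).
DEFAULT_INODE_RATIO = 16384        # bytes per inode for the default ext4 type
LARGEFILE4_INODE_RATIO = 4194304   # bytes per inode for "-T largefile4" (4 MiB)
INODE_SIZE = 256                   # typical ext4 inode size in bytes


def approx_inode_count(fs_bytes: int, inode_ratio: int) -> int:
    """One inode per `inode_ratio` bytes; mke2fs rounds per block group, so this is approximate."""
    return fs_bytes // inode_ratio


def inode_table_bytes(inodes: int, inode_size: int = INODE_SIZE) -> int:
    """Space taken up by the preallocated inode tables."""
    return inodes * inode_size
```

For a nominal 4 TB (4×10¹² bytes) file system, the defaults give about 244 million inodes and roughly 62.5 GB of inode tables. With largefile4, that drops to about 954 thousand inodes and roughly 244 MB.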

Also, the reserved disk space for super-user activities is 5% by default, which comes to around 186 GiB (200 GB) on your 4 TB WD drives: quite a large (wasted) amount of disk space. You can change this amount with mkfs.ext4 and its -m parameter.
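The reserved-space figure is simple percentage arithmetic. This Python sketch (helper names hypothetical) also shows why the result comes out near 186 when measured in GiB.

```python
def reserved_bytes(fs_bytes: int, percent: int = 5) -> int:
    """Space kept back for root, given mkfs.ext4's -m percentage (5 by default)."""
    return fs_bytes * percent // 100


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes), the unit tools like df -h use for "G"."""
    return n_bytes / 2**30
```

With `-m 1`, the reservation would shrink to 40 GB (about 37 GiB). For an existing file system, `tune2fs -m` changes the same percentage without reformatting.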