[workaround] Vero364-image-4.9.113-45-osmc fails to mount large ext4 filesystem

First, congratulations on the new 2021.08-1 release!

I expected a few bumps after the upgrade and have already worked through some minor issues.

But I’m stuck on what appears to be a regression in the 4.9 kernel on a Vero4K device:

root@vero4k:/# e2fsck -fv /dev/storage/array1
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

       21474 inodes used (0.00%, out of 732553216)
        4936 non-contiguous files (23.0%)
          52 non-contiguous directories (0.2%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 16622/4822/21
  2365900915 blocks used (40.37%, out of 5860419584)
           0 bad blocks
        1503 large files

       19695 regular files
        1769 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           1 symbolic link (1 fast symbolic link)
           0 sockets
------------
       21465 files
root@vero4k:/# mount -t ext4 /dev/storage/array1 /mnt/storage
mount: /mnt/storage: mount(2) system call failed: Structure needs cleaning.

The block device is an LVM logical volume holding an ext4 filesystem, backed by a software (mdadm) RAID-5 array. Note that the filesystem is over 20 TB, so it needs a 64-bit kernel to be handled correctly.

So far, every web search for “Structure needs cleaning” points to filesystem corruption, but as far as I can tell the filesystem is clean (as shown above) and the RAID array is healthy (as shown below).

It mounts without problems under the 3.14 kernel, but the 4.9 kernel just won’t do it.

I installed the debug packages and traced /bin/mount down to the mount(2) system call to confirm that the error really comes from the 4.9 kernel, but I’m a novice at kernel debugging, so the trail has gone cold for me.

Here are more storage details:

root@vero4k:/# lsblk
NAME                 MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                    8:0    0  7.3T  0 disk  
└─sda1                 8:1    0  7.3T  0 part  
  └─md127              9:127  0 21.9T  0 raid5 
    └─storage-array1 254:1    0 21.9T  0 lvm   
sdb                    8:16   0  7.3T  0 disk  
└─sdb1                 8:17   0  7.3T  0 part  
  └─md127              9:127  0 21.9T  0 raid5 
    └─storage-array1 254:1    0 21.9T  0 lvm   
sdc                    8:32   0  7.3T  0 disk  
└─sdc1                 8:33   0  7.3T  0 part  
  └─md127              9:127  0 21.9T  0 raid5 
    └─storage-array1 254:1    0 21.9T  0 lvm   
sdd                    8:48   0  7.3T  0 disk  
└─sdd1                 8:49   0  7.3T  0 part  
  └─md127              9:127  0 21.9T  0 raid5 
    └─storage-array1 254:1    0 21.9T  0 lvm   
mmcblk0              179:0    0 14.6G  0 disk  
mmcblk0boot0         179:32   0    4M  0 disk  
mmcblk0boot1         179:64   0    4M  0 disk  
mmcblk0rpmb          179:96   0    4M  0 disk  

root@vero4k:/# lvdisplay storage
  --- Logical volume ---
  LV Path                /dev/storage/array1
  LV Name                array1
  VG Name                storage
  LV UUID                xIkqFS-miW4-XYuw-RPU4-gELx-Qm3e-D9jq5U
  LV Write Access        read/write
  LV Creation host, time osmc, 2020-01-27 13:49:39 -0500
  LV Status              available
  # open                 0
  LV Size                21.83 TiB
  Current LE             5723066
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           254:1

root@vero4k:/# vgdisplay storage
  --- Volume group ---
  VG Name               storage
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               21.83 TiB
  PE Size               4.00 MiB
  Total PE              5723066
  Alloc PE / Size       5723066 / 21.83 TiB
  Free  PE / Size       0 / 0   
  VG UUID               NJxjXk-XXNS-u3RW-Ra0b-CgjS-eqHZ-g8CeKn

root@vero4k:/# pvdisplay /dev/md127
  Error reading device /dev/mmcblk0rpmb at 0 length 512.
  Error reading device /dev/mmcblk0rpmb at 0 length 4096.
  --- Physical volume ---
  PV Name               /dev/md127
  VG Name               storage
  PV Size               21.83 TiB / not usable 1.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5723066
  Free PE               0
  Allocated PE          5723066
  PV UUID               998TFi-Jugd-TupT-sYdz-E4eo-Y7eE-I1ws32

root@vero4k:/# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Sat Jan 18 15:31:11 2020
        Raid Level : raid5
        Array Size : 23441679360 (22355.73 GiB 24004.28 GB)
     Used Dev Size : 7813893120 (7451.91 GiB 8001.43 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Aug  9 21:23:28 2021
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

              Name : osmc-storage-array1
              UUID : fb3dd008:ad989aa9:bd93cb0c:2f0f242c
            Events : 992601

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1
       4       8       17        3      active sync   /dev/sdb1
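
As a rough cross-check of the “over 20TB” figure, the numbers above agree with each other. Assuming a 4 KiB ext4 block size (not shown in the output, but consistent with the LV size; dumpe2fs -h would confirm it), e2fsck’s 5860419584 blocks come to exactly the 5723066 extents of 4 MiB that LVM reports, and the block count exceeds 2^32, the most a 32-bit block number can address (16 TiB at 4 KiB blocks). A small Python sketch of that arithmetic, using only figures copied from this post:

```python
# Cross-check of the sizes reported above (figures copied from this post).
# Assumption: the ext4 block size is 4 KiB; `dumpe2fs -h` would confirm it.
FS_BLOCKS = 5_860_419_584    # e2fsck: "out of 5860419584" blocks
BLOCK_SIZE = 4096            # bytes per ext4 block (assumed)
LV_EXTENTS = 5_723_066       # lvdisplay: "Current LE"
EXTENT_SIZE = 4 * 1024**2    # vgdisplay: "PE Size 4.00 MiB"
TIB = 1024**4


def size_report() -> dict:
    fs_bytes = FS_BLOCKS * BLOCK_SIZE
    lv_bytes = LV_EXTENTS * EXTENT_SIZE
    return {
        "fs_tib": round(fs_bytes / TIB, 2),
        "lv_tib": round(lv_bytes / TIB, 2),
        "fs_fills_lv": fs_bytes == lv_bytes,
        # 2**32 blocks is the ceiling for 32-bit block numbers (16 TiB at 4 KiB).
        "over_32bit_block_limit": FS_BLOCKS > 2**32,
    }


if __name__ == "__main__":
    for key, value in size_report().items():
        print(f"{key}: {value}")
```

Both sizes come out at 21.83 TiB, matching the LV Size line from lvdisplay.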

I checked dmesg while mounting and found these errors:

EXT4-fs (dm-1): ext4_check_descriptors: Block bitmap for group 0 overlaps block group descriptors
EXT4-fs (dm-1): group descriptors corrupted!
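
The first line names the kernel function that rejected the mount, ext4_check_descriptors(), which compares each block group’s bitmaps and inode table against the blocks holding the group descriptors. As I read the upstream fix (“ext4: fix false negatives and false positives in ext4_check_descriptors()”), the false positive it removes affects filesystems using the meta_bg feature, so it helps to know which descriptor-layout features this filesystem has; `dumpe2fs -h /dev/storage/array1` lists them under “Filesystem features”. The Python sketch below decodes the same few fields straight from the primary superblock. The field offsets follow the kernel’s ext4 disk-layout documentation; the helper names and the script name in the usage comment are hypothetical. It only reads from the device.

```python
import struct
import sys

SB_OFFSET = 1024   # the primary superblock starts 1024 bytes into the device
SB_SIZE = 1024
EXT4_MAGIC = 0xEF53

# Selected s_feature_incompat bits that affect the descriptor layout.
INCOMPAT_FEATURES = {0x0010: "meta_bg", 0x0080: "64bit", 0x0200: "flex_bg"}


def u16(buf: bytes, off: int) -> int:
    return struct.unpack_from("<H", buf, off)[0]


def u32(buf: bytes, off: int) -> int:
    return struct.unpack_from("<I", buf, off)[0]


def parse_superblock(sb: bytes) -> dict:
    """Decode a few ext4 superblock fields (stored little-endian on disk)."""
    if len(sb) < SB_SIZE or u16(sb, 0x38) != EXT4_MAGIC:        # s_magic
        raise ValueError("not an ext2/3/4 superblock")
    incompat = u32(sb, 0x60)                                     # s_feature_incompat
    is_64bit = bool(incompat & 0x0080)
    blocks = u32(sb, 0x04)                                       # s_blocks_count_lo
    if is_64bit:
        blocks |= u32(sb, 0x150) << 32                           # s_blocks_count_hi
    block_size = 1024 << u32(sb, 0x18)                           # s_log_block_size
    first_data_block = u32(sb, 0x14)                             # s_first_data_block
    blocks_per_group = u32(sb, 0x20)                             # s_blocks_per_group
    # Descriptors are 32 bytes, or s_desc_size (at least 64) with 64bit.
    desc_size = u16(sb, 0xFE) if is_64bit else 32                # s_desc_size
    if desc_size < (64 if is_64bit else 32):
        raise ValueError(f"implausible descriptor size {desc_size}")
    groups = (blocks - first_data_block + blocks_per_group - 1) // blocks_per_group
    desc_per_block = block_size // desc_size
    return {
        "features": sorted(name for bit, name in INCOMPAT_FEATURES.items() if incompat & bit),
        "blocks": blocks,
        "block_size": block_size,
        "groups": groups,
        # Number of group descriptor blocks (what the kernel calls s_gdb_count).
        "gdt_blocks": (groups + desc_per_block - 1) // desc_per_block,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read-only usage, as root: python3 sb_features.py /dev/storage/array1
    with open(sys.argv[1], "rb") as dev:
        dev.seek(SB_OFFSET)
        print(parse_superblock(dev.read(SB_SIZE)))
```

If “meta_bg” shows up in the feature list, that would fit the false-positive explanation; if not, the cause may lie elsewhere.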

Searching for this message led me to a Reddit post describing a situation similar to mine. One of the later comments says (regarding Ubuntu versions):

I had this on 16.04 LTS and then upgraded to 18.04LTS and have this now. It was a kernel bug with the ext4 update code last time. Try to boot -43 and it will boot fine (for me…)

Now trying to track down this alleged ext4 kernel bug.

It looks like this has happened a couple of times before, for example here at SUSE, where it was fixed in a kernel release (4.4.155):

  • ext4: fix false negatives and false positives in ext4_check_descriptors() (bsc#1103445)

This article looks interesting. It suggests that mounting read-only and then remounting read-write bypasses the issue. The reply is dated July 2018 and our kernel was built in August 2021, but the versions are close (4.9.110-2 vs 4.9.113-45-osmc), so the fix may not have been merged yet.

I see the Debian bug has a patch attached, but I also found this kernel commit from around the same time. Its description matches my error message about block group 0, and it also mentions mounting read-only and then remounting read-write.

I’m not sure which patch is the one needed here.
Edit: on further inspection, the Debian attachment seems more likely.

It doesn’t help that both patches were made by the same person, LoL

I’m not a hero when it comes to kernel patching, but I agree that the Debian attachment seems more applicable in your situation.

Personally, I would check whether the mount/remount approach works and use that until a fixed kernel arrives 🙂

Indeed, the remount workaround worked!

root@vero4k:/# mount -t ext4 -o ro /dev/storage/array1 /mnt/storage
root@vero4k:/# mount -t ext4 -o rw,remount /dev/storage/array1 /mnt/storage
root@vero4k:/# ls /mnt/storage
 backup   Downloads   lost+found   Miscellaneous   Movies   Music  'TV Shows'
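
Until a fixed kernel arrives, the two-step mount has to be repeated after every boot, since an fstab entry performs only a single mount with one set of options. A minimal Python sketch that wraps the two commands from the session above, assuming it runs as root with mount(8) on the PATH; the function name is hypothetical. The command runner is a parameter so the sequence can be checked without touching a real device.

```python
import subprocess
from typing import Callable, List


# Hypothetical helper: reproduces the ro-then-rw workaround from this thread.
def mount_ro_then_rw(
    device: str,
    target: str,
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    steps = [
        ["mount", "-t", "ext4", "-o", "ro", device, target],
        ["mount", "-t", "ext4", "-o", "rw,remount", device, target],
    ]
    for cmd in steps:
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so the read-write remount is never attempted after a failed first mount.
        run(cmd, check=True)
    return steps


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        # e.g. python3 ro_rw_mount.py /dev/storage/array1 /mnt/storage
        mount_ro_then_rw(sys.argv[1], sys.argv[2])
```

Stopping on the first failure mirrors doing it by hand: if the read-only mount is rejected, the remount would fail anyway.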

Thanks much for the guidance!

Tagging @sam_nazarko for awareness.

Most excellent, I’ll have a drink to that 🙂

Excellent work, guys.

Just to make Sam’s life a little easier, here’s a link to the Debian patch referred to above:

https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=903838;filename=ext4-fix-false-negatives-and-false-positives-in-ext4.patch;msg=26

I’ll check this out shortly. There are a couple of other things that I want to get in for the next kernel.

This will be fixed in the next kernel; I’ve cherry-picked that commit.

Many thanks

Sam
