

   linux.debian.kernel      Debian kernel discussions      2,884 messages   


   Message 1,660 of 2,884   
   Yu Kuai to All   
   Bug#1121006: raid10 and component device   
   23 Nov 25 04:20:02   
   
   XPost: linux.debian.bugs.dist   
   From: yukuai@fnnas.com   
      
   Hi,   
      
   On 2025/11/21 19:01, Filippo Giunchedi wrote:
   > Hello linux-raid,   
   > I'm seeking assistance with the following bug: recent versions of mpt3sas
   > started announcing a drive optimal_io_size of 0xFFF000, and when such
   > drives are part of an mdraid raid10 the array's optimal_io_size comes out
   > as 0xFFF00000.
   >   
   > When an LVM PV is created on the array, its metadata area is by default
   > aligned to its optimal_io_size, resulting in an abnormally large size of
   > ~4GB. During GRUB's LVM detection an allocation is made based on the
   > metadata area size, which results in an unbootable system. This problem
   > shows up only for newly-created PVs, and thus systems with existing PVs
   > are not affected in my testing.
   >   
   > I was able to reproduce the problem on qemu using scsi-hd devices as shown
   > below and on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121006. The
   > bug is present both on Debian's stable kernel and Linux 6.18, though I
   > haven't yet determined when the change was introduced in mpt3sas.
   >   
   > I'm wondering where the problem is in this case and what could be done to fix   
   > it?   
      
   You can take a look at the following thread.   
      
   Re: [PATCH 1/2] block: ignore underlying non-stack devices io_opt - Yu Kuai   
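   In short: when queue limits are stacked, the block layer combines io_opt
   values with lcm(). Assuming raid10's own io_opt here is chunk * data
   stripes = 512K * 2 = 1 MiB (default chunk, 4 devices, near-2 layout), a
   member hint of 0xFFF000 blows up like this:

      # member io_opt is 4095 * 4096; the array's own is 2^20;
      # lcm(a,b) = a*b/gcd(a,b), and gcd(16773120, 1048576) = 4096
      $ echo $((16773120 * 1048576 / 4096))
      4293918720

   4293918720 is 0xFFF00000, i.e. 4095 MiB, which matches the ~4GB metadata
   alignment reported above.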
      
      
   > thank you,   
   > Filippo   
   >   
   > On Thu, Nov 20, 2025 at 02:43:24PM +0000, Filippo Giunchedi wrote:   
   >> Hello Salvatore,   
   >> Thank you for the quick reply.   
   >>   
   >> On Wed, Nov 19, 2025 at 05:59:48PM +0100, Salvatore Bonaccorso wrote:   
   >> [...]   
   >>>>          Capabilities: [348] Vendor Specific Information: ID=0001 Rev=1 Len=038
   >>>>          Capabilities: [380] Data Link Feature    
   >>>>          Kernel driver in use: mpt3sas   
   >>> This sounds like quite an interesting finding but probably hard to
   >>> reproduce without the hardware if it comes to be specific to the
   >>> controller type and driver.
   >> That's a great point re: reproducibility, and it got me curious about
   >> something I hadn't thought of testing, namely whether there's another
   >> angle to this: does any block device with the same block I/O hints
   >> exhibit the same problem? The answer is actually "yes".
   >>   
   >> I used qemu's 'scsi-hd' device to set the same values so I could test
   >> locally. On an already-installed VM I added the following to present
   >> four new devices:
   >>   
   >> -device virtio-scsi-pci,id=scsi0   
   >>   
   >> -drive file=./workdir/disks/disk3.qcow2,format=qcow2,if=none,id=drive3   
   >> -device scsi-hd,bus=scsi0.0,drive=drive3,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
   >>   
   >> -drive file=./workdir/disks/disk4.qcow2,format=qcow2,if=none,id=drive4   
   >> -device scsi-hd,bus=scsi0.0,drive=drive4,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
   >>   
   >> -drive file=./workdir/disks/disk5.qcow2,format=qcow2,if=none,id=drive5   
   >> -device scsi-hd,bus=scsi0.0,drive=drive5,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
   >>   
   >> -drive file=./workdir/disks/disk6.qcow2,format=qcow2,if=none,id=drive6   
   >> -device scsi-hd,bus=scsi0.0,drive=drive6,physical_block_size=4096,logical_block_size=512,min_io_size=4096,opt_io_size=16773120
   >>   
   >> I used 10G files with 'qemu-img create -f qcow2  10G' though size
   >> doesn't affect anything in my testing.
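   (An aside: the filename argument appears to have been eaten in transit;
   presumably each file was created along the lines of the following, with
   disk4/5/6 analogous.)

      $ qemu-img create -f qcow2 ./workdir/disks/disk3.qcow2 10G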
   >>   
   >> Then in the VM:   
   >>   
   >> # cat /sys/block/sd[cdef]/queue/optimal_io_size   
   >> 16773120   
   >> 16773120   
   >> 16773120   
   >> 16773120   
   >> # mdadm --create /dev/md1 --level 10 --bitmap none --raid-devices 4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
   >> mdadm: Defaulting to version 1.2 metadata   
   >> mdadm: array /dev/md1 started.   
   >> # cat /sys/block/md1/queue/optimal_io_size   
   >> 4293918720   
   >>   
   >> I was able to reproduce the problem with src:linux 6.18~rc6-1~exp1 as
   >> well as 6.12.57-1.
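   (Note that 4293918720 is exactly 0xFFF00000:

      $ printf '%#x\n' 4293918720
      0xfff00000

   i.e. the member hint 0xFFF000 scaled by 256 through the lcm stacking
   sketched earlier.)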
   >>   
   >> Since it is easy to test this way I tried a few different opt_io_size
   >> values and was able to reproduce only with 16773120 (i.e. 0xFFF000).
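   That is consistent with lcm() being the culprit: 16773120 = 4095 * 4096,
   and the odd factor 4095 survives into the lcm with the array's 1 MiB
   io_opt, giving 4095 MiB. A power-of-two hint such as 16777216 stays
   harmless:

      # lcm(16 MiB, 1 MiB) is just 16 MiB; gcd(16777216, 1048576) = 1048576
      $ echo $((16777216 * 1048576 / 1048576))
      16777216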
   >>   
   >>> I would like to ask: do you have the possibility to make an OS
   >>> installation such that you can freely experiment with various kernels
   >>> and then assemble the arrays under them? If so, it would be great if
   >>> you could start bisecting to find where the behaviour changed.
   >>>   
   >>> I.e. install the OS independently of the controller, then manually
   >>> bisect the Debian kernel versions between bookworm and trixie
   >>> (6.1.y -> 6.12.y) to narrow down the upstream range.
   >> Yes, I'm able to perform testing on this host; in fact I worked around
   >> the problem for now by disabling LVM's md alignment auto-detection, and
   >> thus we have an installed system.
   >> For reference that's "devices { data_alignment_detection = 0 }" in lvm's
   >> config.
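   For anyone else hitting this, the workaround as a minimal lvm.conf
   fragment, plus a quick check of the resulting data offset (pe_start
   should drop back to the usual ~1 MiB on a freshly created PV):

      # /etc/lvm/lvm.conf
      devices {
          data_alignment_detection = 0
      }

      # pvcreate /dev/md1
      # pvs -o pv_name,pe_start /dev/md1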
   >>   
   >>> Then bisect the upstream changes to find the offending commits. Let me
   >>> know if you need more specific instructions on the idea.
   >> Pointers on the recommended way to build Debian kernels would be of
   >> great help, thank you!
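   One low-friction route, sketched here with the usual upstream bisect flow
   (v6.12 bad / v6.1 good, per the range above), is to build plain upstream
   kernels as Debian packages straight from the kernel tree:

      $ git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
      $ cd linux && git bisect start v6.12 v6.1
      $ cp /boot/config-$(uname -r) .config && make olddefconfig
      $ make -j$(nproc) bindeb-pkg
      # dpkg -i ../linux-image-*.deb

   then mark each kernel with 'git bisect good' or 'git bisect bad' and
   repeat until the offending commit falls out.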
   >>   
   >>> Additionally it would be interesting to know whether the issue persists
   >>> in 6.17.8 or even 6.18~rc6-1~exp1, to be able to clearly indicate
   >>> upstream that the issue persists in newer kernels.
   >>>
   >>> Ideally this actually goes upstream asap once we are more confident
   >>> about which subsystem to report the issue to. If we are reasonably
   >>> confident it is mpt3sas-specific already then I would say to go
   >>> straight to:
   >> Given the qemu-based reproducer above, maybe this issue is actually two
   >> bugs:
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


