

   linux.debian.kernel      Debian kernel discussions      2,884 messages   


   Message 2,079 of 2,884   
   Salvatore Bonaccorso to Haruka Ma   
   Bug#1122521: linux-image-6.17.11+deb14-a   
   29 Dec 25 10:40:01   
   
   XPost: linux.debian.bugs.dist   
   From: carnil@debian.org   
      
   Control: tags -1 + moreinfo   
      
   On Thu, Dec 11, 2025 at 01:48:35PM +0900, Haruka Ma wrote:   
   >   
   > Package: src:linux   
   > Version: 6.17.11-1   
   > Severity: normal   
   > X-Debbugs-Cc: none   
   >   
   >   
   > I'm using ZFS on my storage server and mounting 2 zvols on a remote machine   
   > via nvmet_rdma. I can't pinpoint the version, but ever since I rebooted the
   > server into a newer kernel, a NULL pointer dereference, apparently caused by
   > nvmet_rdma, occurs after the server has been running for a while (see the
   > kernel log below). It's very hard to recover from this situation: the
   > transport is supposed to be reliable, so any packet loss causes kernel
   > lock-ups on the initiator side.
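For readers unfamiliar with this kind of setup: exporting a zvol through the in-kernel NVMe-oF target (nvmet) over RDMA is normally done through configfs under /sys/kernel/config/nvmet. The sketch below is a hypothetical illustration only. The report does not say how the target was configured, the NQN, zvol path, port number and address are made up, and it assumes the nvmet and nvmet_rdma modules are loaded. Passing a scratch directory as `root` lets it run without touching a real target.

```python
from pathlib import Path


def export_zvol_over_rdma(root: Path, nqn: str, nsid: int, device: str,
                          port_id: int, traddr: str, trsvcid: str = "4420") -> None:
    """Export `device` as namespace `nsid` of subsystem `nqn` on an RDMA port.

    Hypothetical sketch: on a real host `root` would be
    /sys/kernel/config/nvmet; here it can be any directory.
    """
    subsys = root / "subsystems" / nqn
    namespace = subsys / "namespaces" / str(nsid)
    port = root / "ports" / str(port_id)

    # On configfs, mkdir creates the object (and its attribute files).
    namespace.mkdir(parents=True, exist_ok=True)
    port.mkdir(parents=True, exist_ok=True)

    # Subsystem and namespace: allow any host (illustration only), point the
    # namespace at the block device, then enable it.
    (subsys / "attr_allow_any_host").write_text("1")
    (namespace / "device_path").write_text(device)
    (namespace / "enable").write_text("1")

    # Port: RDMA transport over IPv4, conventional NVMe-oF service id 4420.
    (port / "addr_trtype").write_text("rdma")
    (port / "addr_adrfam").write_text("ipv4")
    (port / "addr_traddr").write_text(traddr)
    (port / "addr_trsvcid").write_text(trsvcid)

    # Linking the subsystem into the port is what starts serving it.
    link = port / "subsystems" / nqn
    link.parent.mkdir(exist_ok=True)
    if not link.is_symlink():
        link.symlink_to(subsys)
```

Writing `enable` and creating the port symlink are the steps that actually start serving I/O; tearing an export down means undoing them in reverse order (remove the link, then write 0 to `enable`).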
   >   
   > It seems to me that this is likely a kernel issue. I haven't tested whether
   > disabling cgroup io / blkio would avoid it. As the issue only appears after
   > hours or days of using the block device, it's currently hard for me to
   > bisect kernel versions: I need to reboot at least one machine to recover,
   > which is rather disruptive.
   >   
   > The taints are due to loading ZFS. Unfortunately, because of how the storage
   > system is designed, I can't reproduce this without kernel taints.
   >   
   > reportbug output below (network information redacted):
   >   
   > -- Package-specific info:   
   > ** Version:   
   > Linux version 6.17.11+deb14-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-15 (Debian 15.2.0-10) 15.2.0, GNU ld (GNU Binutils for Debian) 2.45.50.20251201) #1 SMP PREEMPT_DYNAMIC Debian 6.17.11-1 (2025-12-07)
   >   
   > ** Command line:   
   > BOOT_IMAGE=/boot@/vmlinuz-6.17.11+deb14-amd64 root=ZFS=root/root ro root=ZFS=root/root mitigations=off
   >   
   > ** Tainted: PDWOE (12929)   
   >  * proprietary module was loaded   
   >  * kernel died recently, i.e. there was an OOPS or BUG   
   >  * kernel issued warning   
   >  * externally-built ("out-of-tree") module was loaded   
   >  * unsigned module was loaded   
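The number in parentheses on the `** Tainted:` line is the kernel's taint bitmask (the value of /proc/sys/kernel/tainted), and each letter corresponds to one bit. A small decoder, assuming the bit assignments from the kernel's tainted-kernels admin guide (only bits 0 to 18 are tabulated here; newer kernels may define more), reproduces the letters reportbug printed:

```python
# (bit, letter, meaning), following the kernel's tainted-kernels admin guide.
# Only bits 0-18 are listed; newer kernels may define additional bits.
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (2, "S", "kernel running on an out of specification system"),
    (3, "R", "module was force unloaded"),
    (4, "M", "processor reported a Machine Check Exception (MCE)"),
    (5, "B", "bad page referenced or some unexpected page flags"),
    (6, "U", "taint requested by userspace application"),
    (7, "D", "kernel died recently, i.e. there was an OOPS or BUG"),
    (8, "A", "an ACPI table was overridden by user"),
    (9, "W", "kernel issued warning"),
    (10, "C", "staging driver was loaded"),
    (11, "I", "workaround for bug in platform firmware applied"),
    (12, "O", "externally-built (\"out-of-tree\") module was loaded"),
    (13, "E", "unsigned module was loaded"),
    (14, "L", "soft lockup occurred"),
    (15, "K", "kernel has been live patched"),
    (16, "X", "auxiliary taint, defined for and used by distros"),
    (17, "T", "kernel was built with the struct randomization plugin"),
    (18, "N", "an in-kernel test has been run"),
]


def decode_taint(value: int) -> list[tuple[str, str]]:
    """Return (letter, meaning) for every taint bit set in `value`, in bit order."""
    return [(letter, meaning) for bit, letter, meaning in TAINT_FLAGS
            if (value >> bit) & 1]


def taint_letters(value: int) -> str:
    """Compact letter string in bit order, as reportbug shows it."""
    return "".join(letter for letter, _ in decode_taint(value))


if __name__ == "__main__":
    for letter, meaning in decode_taint(12929):
        print(f"{letter}: {meaning}")
```

Here `taint_letters(12929)` gives "PDWOE", matching the report. The oops header further down shows only P, O and E (value 12289), which fits the D flag having been set by that oops itself.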
   >   
   > ** Kernel log:   
   > [  558.014398] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
   > [  558.018382] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
   > [  558.022377] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
   > [  558.030373] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does not match SCSI CDB Length: 255 for SAM Opcode: 0x12
   > [ 8592.007814] perf: interrupt took too long (2523 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
   > [ 9319.094555] usb 3-12.1: USB disconnect, device number 4
   > [10455.213639] perf: interrupt took too long (3185 > 3153), lowering kernel.perf_event_max_sample_rate to 62750
   > [11960.618904] perf: interrupt took too long (3998 > 3981), lowering kernel.perf_event_max_sample_rate to 50000
   > [13141.438762] perf: interrupt took too long (4998 > 4997), lowering kernel.perf_event_max_sample_rate to 40000
   > [21012.774195] perf: interrupt took too long (6261 > 6247), lowering kernel.perf_event_max_sample_rate to 31750
   > [80158.748360] usb 3-6: USB disconnect, device number 2
   > [87562.612495] perf: interrupt took too long (7831 > 7826), lowering kernel.perf_event_max_sample_rate to 25500
   > [166023.147960] BUG: kernel NULL pointer dereference, address: 0000000000000028
   > [166023.149093] #PF: supervisor read access in kernel mode
   > [166023.150016] #PF: error_code(0x0000) - not-present page
   > [166023.150923] PGD 0 P4D 0
   > [166023.151829] Oops: Oops: 0000 [#1] SMP NOPTI
   > [166023.152736] CPU: 8 UID: 0 PID: 586 Comm: kworker/8:1H Tainted: P           OE       6.17.11+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.17.11-1
   > [166023.153661] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
   > [166023.154629] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.11.0 12/23/2019
   > [166023.155561] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
   > [166023.156581] RIP: 0010:blk_cgroup_bio_start+0x10/0x230
   > [166023.157531] Code: 00 00 00 00 45 31 c0 eb da 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 8b 57 10 48 8b 47 48 <4c> 8b 40 28 89 d0 83 e0 01 80 fa 03 ba 02 00 00 00 48 0f 44 c2 0f
   > [166023.159494] RSP: 0018:ffffcd998ef87c68 EFLAGS: 00010282
   > [166023.160495] RAX: 0000000000000000 RBX: ffff8c00ae800168 RCX: 0000000000000000
   > [166023.161536] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c00ae800168
   > [166023.162554] RBP: ffffcd998ef87cb0 R08: 0000000000001000 R09: 0000000000000028
   > [166023.163578] R10: 0000001400000000 R11: 0000000000000000 R12: 0000000000000000
   > [166023.164594] R13: 0000000000000000 R14: 0000000000000000 R15: 00000002a54c93a8
   > [166023.165614] FS:  0000000000000000(0000) GS:ffff8c1fb8d08000(0000) knlGS:0000000000000000
   > [166023.166633] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   > [166023.167656] CR2: 0000000000000028 CR3: 0000002cc1c2c006 CR4: 00000000003726f0
   > [166023.168684] Call Trace:   
   > [166023.169691]  <TASK>
   > [166023.170680]  submit_bio_noacct_nocheck+0x30/0x350
   > [166023.171685]  ? bio_associate_blkg+0x3d/0x80
   > [166023.172672]  nvmet_bdev_execute_rw+0x29a/0x3d0 [nvmet]
   > [166023.173676]  nvmet_rdma_execute_command+0x52/0x120 [nvmet_rdma]
   > [166023.174649]  nvmet_rdma_handle_command+0xf7/0x2c0 [nvmet_rdma]
   > [166023.175608]  __ib_process_cq+0x7f/0x180 [ib_core]
   > [166023.176603]  ib_cq_poll_work+0x2a/0x80 [ib_core]
   > [166023.177588]  process_one_work+0x18f/0x350
   > [166023.178519]  worker_thread+0x25a/0x3a0
   > [166023.179431]  ? __pfx_worker_thread+0x10/0x10
   > [166023.180339]  kthread+0xf9/0x240
   > [166023.181231]  ? __pfx_kthread+0x10/0x10
   > [166023.182109]  ? __pfx_kthread+0x10/0x10
   > [166023.182977]  ret_from_fork+0x194/0x1c0
   > [166023.183837]  ? __pfx_kthread+0x10/0x10
   > [166023.184676]  ret_from_fork_asm+0x1a/0x30
   > [166023.185518]  </TASK>
   > [166023.186333] Modules linked in: nvmet_rdma rdma_cm iw_cm ib_umad   
      
   [continued in next message]   
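The `Code:` line and register dump in the oops are enough to see where the NULL came from. The byte marked `<4c>` starts the faulting instruction, `4c 8b 40 28`, which encodes `mov r8, [rax+0x28]`; the instruction before it, `48 8b 47 48`, encodes `mov rax, [rdi+0x48]`. With RAX = 0, the load address is 0 + 0x28, exactly the reported CR2 and "address: 0000000000000028". So a pointer read from offset 0x48 of the object in RDI (the first argument of blk_cgroup_bio_start(), a struct bio pointer) was NULL; which struct member sits at that offset depends on this build's layout and is not established by the log alone. The decoder below is a deliberately tiny sketch that only understands this one encoding (REX.W prefix, opcode 0x8B, `[base + disp8]` operand); it is not a general x86 disassembler.

```python
# x86-64 general-purpose registers in encoding order (0-15).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes of an oops `Code:` line from the <xx>-marked byte on."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_load(code: bytes) -> tuple[str, str, int]:
    """Decode `mov r64, [base + disp8]` (REX.W, opcode 0x8B, ModRM mod=01).

    Returns (destination, base, displacement). Any other encoding raises
    ValueError; this is not a general x86 decoder.
    """
    if len(code) < 4:
        raise ValueError("need at least 4 bytes")
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08):
        raise ValueError("expected a REX.W prefix")
    if opcode != 0x8B:
        raise ValueError("expected MOV r64, r/m64 (opcode 0x8B)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    reg |= ((rex >> 2) & 1) << 3  # REX.R extends the destination register
    rm |= (rex & 1) << 3          # REX.B extends the base register
    if disp >= 0x80:              # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


# Excerpt of the Code: line and two register values from the oops above.
CODE = "8b 57 10 48 8b 47 48 <4c> 8b 40 28 89 d0 83 e0 01"
REGS = {"rax": 0x0, "rdi": 0xFFFF8C00AE800168}

if __name__ == "__main__":
    dest, base, disp = decode_mov_load(faulting_bytes(CODE))
    print(f"mov {dest}, [{base}+{disp:#x}] -> loads from {REGS[base] + disp:#x}")
```

Run as a script, it prints `mov r8, [rax+0x28] -> loads from 0x28`, matching CR2.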
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca