|    linux.debian.kernel    |    Debian kernel discussions    |    2,884 messages    |
|    Message 1,910 of 2,884    |
|    Haruka Ma to All    |
|    Bug#1122521: linux-image-6.17.11+deb14-a    |
|    11 Dec 25 06:00:01    |
   
   XPost: linux.debian.bugs.dist   
   From: mrx@hcc.im   
      
   Package: src:linux   
   Version: 6.17.11-1   
   Severity: normal   
   X-Debbugs-Cc: none   
      
      
   I'm using ZFS on my storage server and mounting 2 zvols on a remote
   machine via nvmet_rdma. I can't pinpoint the exact version, but ever
   since I rebooted the server into a newer kernel, a null pointer
   dereference occurs after the server has been running for a while,
   apparently caused by nvmet_rdma (see kernel log below). It's very hard
   to recover from this situation, as the transport is supposed to be
   reliable and any packet loss would cause kernel lock-ups on the
   initiator side.
      
   It seems to me that this is likely a kernel issue. I haven't tested
   whether disabling cgroup io / blkio would avoid it. As the issue only
   appears after hours or days of using the block device, it's currently
   hard for me to bisect kernel versions: I need to reboot at least one
   machine to recover, which is quite disruptive.
      
   The taints come from loading ZFS. Unfortunately, due to how the
   storage system is designed, it's impossible for me to reproduce this
   without kernel taints.
      
   reportbug output below (redacted network information):   
      
   -- Package-specific info:   
   ** Version:   
   Linux version 6.17.11+deb14-amd64 (debian-kernel@lists.debian.org)   
   (x86_64-linux-gnu-gcc-15 (Debian 15.2.0-10) 15.2.0, GNU ld (GNU Binutils   
   for Debian) 2.45.50.20251201) #1 SMP PREEMPT_DYNAMIC Debian 6.17.11-1   
   (2025-12-07)   
      
   ** Command line:   
   BOOT_IMAGE=/boot@/vmlinuz-6.17.11+deb14-amd64 root=ZFS=root/root ro   
   root=ZFS=root/root mitigations=off   
      
   ** Tainted: PDWOE (12929)   
    * proprietary module was loaded   
    * kernel died recently, i.e. there was an OOPS or BUG   
    * kernel issued warning   
    * externally-built ("out-of-tree") module was loaded   
    * unsigned module was loaded   
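
For readers checking the taint line, here is a minimal Python sketch that decodes the numeric value 12929 into the letters shown above. The bit positions are taken from the kernel's tainted-kernels documentation as I understand it. Only the five flags that appear in this report are listed, and the names `TAINT_BITS` and `decode_taint` are made up for this illustration.

```python
# Subset of taint flags (bit position -> letter, meaning).
# Assumption: bit numbers follow the kernel's tainted-kernels documentation;
# only the flags that appear in this report are included.
TAINT_BITS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently, i.e. there was an OOPS or BUG"),
    9: ("W", "kernel issued warning"),
    12: ("O", 'externally-built ("out-of-tree") module was loaded'),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return (bit, letter, meaning) for every bit set in the taint value."""
    flags = []
    for bit in range(value.bit_length()):
        if (value >> bit) & 1:
            letter, meaning = TAINT_BITS.get(bit, ("?", "not in this subset"))
            flags.append((bit, letter, meaning))
    return flags


if __name__ == "__main__":
    for bit, letter, meaning in decode_taint(12929):
        print(f"bit {bit:2d}  {letter}  {meaning}")
```

The value splits as 1 + 128 + 512 + 4096 + 8192, i.e. bits 0, 7, 9, 12 and 13, which matches the PDWOE string reported by reportbug.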
      
   ** Kernel log:   
   [ 558.014398] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does   
   not match SCSI CDB Length: 255 for SAM Opcode: 0x12   
   [ 558.018382] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does   
   not match SCSI CDB Length: 255 for SAM Opcode: 0x12   
   [ 558.022377] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does   
   not match SCSI CDB Length: 255 for SAM Opcode: 0x12   
   [ 558.030373] TARGET_CORE[iSCSI]: Expected Transfer Length: 4096 does   
   not match SCSI CDB Length: 255 for SAM Opcode: 0x12   
   [ 8592.007814] perf: interrupt took too long (2523 > 2500), lowering   
   kernel.perf_event_max_sample_rate to 79250   
   [ 9319.094555] usb 3-12.1: USB disconnect, device number 4   
   [10455.213639] perf: interrupt took too long (3185 > 3153), lowering   
   kernel.perf_event_max_sample_rate to 62750   
   [11960.618904] perf: interrupt took too long (3998 > 3981), lowering   
   kernel.perf_event_max_sample_rate to 50000   
   [13141.438762] perf: interrupt took too long (4998 > 4997), lowering   
   kernel.perf_event_max_sample_rate to 40000   
   [21012.774195] perf: interrupt took too long (6261 > 6247), lowering   
   kernel.perf_event_max_sample_rate to 31750   
   [80158.748360] usb 3-6: USB disconnect, device number 2   
   [87562.612495] perf: interrupt took too long (7831 > 7826), lowering   
   kernel.perf_event_max_sample_rate to 25500   
   [166023.147960] BUG: kernel NULL pointer dereference, address:   
   0000000000000028   
   [166023.149093] #PF: supervisor read access in kernel mode   
   [166023.150016] #PF: error_code(0x0000) - not-present page   
   PGD 0 P4D 0
   [166023.151829] Oops: Oops: 0000 [#1] SMP NOPTI
   [166023.152736] CPU: 8 UID: 0 PID: 586 Comm: kworker/8:1H Tainted: P   
    OE 6.17.11+deb14-amd64 #1 PREEMPT(lazy) Debian 6.17.11-1   
   [166023.153661] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE,   
   [E]=UNSIGNED_MODULE   
   [166023.154629] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS   
   2.11.0 12/23/2019   
   [166023.155561] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]   
   [166023.156581] RIP: 0010:blk_cgroup_bio_start+0x10/0x230   
   [166023.157531] Code: 00 00 00 00 45 31 c0 eb da 90 90 90 90 90 90 90 90   
   90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 8b 57 10 48 8b 47   
   48 <4c> 8b 40 28 89 d0 83 e0 01 80 fa 03 ba 02 00 00 00 48 0f 44 c2 0f   
   [166023.159494] RSP: 0018:ffffcd998ef87c68 EFLAGS: 00010282   
   [166023.160495] RAX: 0000000000000000 RBX: ffff8c00ae800168 RCX:   
   0000000000000000   
   [166023.161536] RDX: 0000000000000000 RSI: 0000000000000000 RDI:   
   ffff8c00ae800168   
   [166023.162554] RBP: ffffcd998ef87cb0 R08: 0000000000001000 R09:   
   0000000000000028   
   [166023.163578] R10: 0000001400000000 R11: 0000000000000000 R12:   
   0000000000000000   
   [166023.164594] R13: 0000000000000000 R14: 0000000000000000 R15:   
   00000002a54c93a8   
   [166023.165614] FS: 0000000000000000(0000) GS:ffff8c1fb8d08000(0000)   
   knlGS:0000000000000000   
   [166023.166633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033   
   [166023.167656] CR2: 0000000000000028 CR3: 0000002cc1c2c006 CR4:   
   00000000003726f0   
   [166023.168684] Call Trace:   
   [166023.169691]
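
As a rough cross-check of the register dump above, the sketch below decodes the instruction marked with `<..>` in the "Code:" line and compares the resulting address with CR2. It is a hand-rolled decoder for one x86-64 encoding only (REX.W `8b` MOV with an 8-bit displacement), not a general disassembler. The helper names are hypothetical, and the byte string is an excerpt of the full "Code:" line.

```python
# Excerpt of the "Code:" line; <..> marks the byte at the faulting RIP.
CODE = "8b 57 10 48 8b 47 48 <4c> 8b 40 28 89 d0"
# Register values copied from the oops above.
REGS = {"rax": 0x0, "rdi": 0xFFFF8C00AE800168}
CR2 = 0x28

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(raw):
    """Decode 'REX.W 8B /r' with mod=01 (base + disp8) into (dst, base, disp)."""
    rex, opcode, modrm, disp = raw[:4]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the base + disp8 form without SIB is handled")
    dst = REG64[reg | ((rex & 4) << 1)]   # REX.R extends the reg field
    base = REG64[rm | ((rex & 1) << 3)]   # REX.B extends the r/m field
    if disp >= 0x80:                      # disp8 is sign-extended
        disp -= 0x100
    return dst, base, disp


tokens = CODE.split()
fault_idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
raw = [int(t.strip("<>"), 16) for t in tokens]

faulting = decode_mov_load(raw[fault_idx:])
previous = decode_mov_load(raw[fault_idx - 4:fault_idx])
fault_addr = REGS[faulting[1]] + faulting[2]

if __name__ == "__main__":
    print("previous:", "mov %s, [%s+%#x]" % previous)
    print("faulting:", "mov %s, [%s+%#x]" % faulting)
    print("address: %#x, CR2: %#x" % (fault_addr, CR2))
```

Under these assumptions the faulting instruction reads from RAX + 0x28 with RAX = 0, which agrees with CR2, and RAX itself was loaded just before from RDI + 0x48. That is consistent with a NULL pointer stored in the structure passed in RDI, but mapping those offsets to specific struct fields would need the matching debug symbols.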