
   linux.debian.kernel      Debian kernel discussions      2,884 messages   


   Message 2,412 of 2,884   
   DUVERGIER Claude to All   
   Re: Bug#1111027: NMI: IOCK error (debug    
   22 Jan 26 19:40:01   
   
   From: debian.ml@claude.duvergier.fr   
      
    > On Mon, Sep 22 2025 at 19:33, Thorsten Sperber wrote:   
    >>   
    >> thanks for your help. It's been four days now, I'd say above average   
    >> (last was five days) - and no crash yet. I'm going to wait at least   
    >> until the weekend before naming a winner, but that's already looking   
    >> pretty good.   
    >   
    > Thanks for trying.   
    >   
    > I suggested to try intel_idle.max_cstate=2 because these unknown NMI   
    > backtraces all originated from a MWAIT(C3).   
    >   
    > Can you reboot into the working 6.1.y kernel at some point and check   
    > which idle driver is used there?   
    >   
    >     cat /sys/devices/system/cpu/cpuidle/current_driver   
    >   
    > and which states are advertised:   
    >   
    >     ls /sys/devices/system/cpu/cpu0/cpuidle/state   
    >   
    > Thanks,   
    >   
    >         tglx   
      
      
   TL;DR: First-time mailing-list user. I'm hitting a similar issue on a
   Debian-based OS, tested several kernel versions and checked the cpuidle
   values, and want to share them.
      
   Hello,   
      
   I'm having the same issue on an HPE MicroServer Gen8 (with an Intel
   E3-1220L V2 CPU) since I upgraded to kernel v6.12.15 (from v6.6.44).
      
   Note: I am using TrueNAS SCALE, which is Debian-based but has its own
   kernel flavor.
      
   Applying `intel_iommu=off` and/or `modprobe.blacklist=hpwdt` and/or
   `intel_idle.max_cstate=x` (where x=0-2) didn't change anything.
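      
   To confirm that such a parameter actually reached the running kernel,
   its value can be read back from `/proc/cmdline`. A minimal sketch:
   `cmdline_param` is a hypothetical helper name (not an existing tool),
   and the optional second argument is only there so the function can be
   pointed at a saved copy of a command line.
      
   ```shell
   # Print the value of a kernel command-line parameter, or nothing if it
   # is absent. If the parameter appears several times, the last value is
   # printed. cmdline_param is a hypothetical helper for illustration.
   cmdline_param() {
       name="$1"
       file="${2:-/proc/cmdline}"
       # Split the command line into one token per line, then keep the
       # tokens that start with "<name>=".
       tr ' ' '\n' < "$file" | awk -v key="$name" '
           index($0, key "=") == 1 { val = substr($0, length(key) + 2); found = 1 }
           END { if (found) print val }'
   }

   # Example: cmdline_param intel_idle.max_cstate
   ```
      
   An empty result means the parameter was not passed at boot.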
      
   On the working setup (90+ days of uptime), running TrueNAS v24.10   
   (kernel 6.6.44), I have:   
      
   ```   
   $ cat /proc/cmdline   
   BOOT_IMAGE=/ROOT/24.10.2.4@/boot/vmlinuz-6.6.44-production+truenas   
   root=ZFS=boot-pool/ROOT/24.10.2.4 ro libata.allow_tpm=1 amd_iommu=on   
   iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1   
   nvme_core.multipath=N modprobe.blacklist=hpwdt intel_idle.max_cstate=0   
      
   $ cat /sys/devices/system/cpu/cpuidle/current_driver   
   acpi_idle   
      
   $ ls -l /sys/devices/system/cpu/cpu0/cpuidle/   
   total 0   
   drwxr-xr-x 2 root root 0 Jan 14 09:32 state0   
   drwxr-xr-x 2 root root 0 Jan 14 09:32 state1   
   drwxr-xr-x 3 root root 0 Jan 14 09:32 state2   
      
   $ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name   
   POLL   
   C1   
   C2   
   ```   
      
   On other TrueNAS/kernel versions the server crashes in about 2 days or less.   
      
   On TrueNAS v25.04 (kernel 6.12.15), I have:   
      
   ```   
   $ cat /proc/cmdline   
   BOOT_IMAGE=/ROOT/25.04.2.6@/boot/vmlinuz-6.12.15-production+truenas   
   root=ZFS=boot-pool/ROOT/25.04.2.6 ro libata.allow_tpm=1 amd_iommu=on   
   iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1   
   nvme_core.multipath=N   
      
   $ cat /sys/devices/system/cpu/cpuidle/current_driver   
   intel_idle   
      
   $ ls -l /sys/devices/system/cpu/cpu0/cpuidle/   
   total 0   
   drwxr-xr-x 2 root root 0 Jan 20 16:43 state0   
   drwxr-xr-x 3 root root 0 Jan 20 16:43 state1   
   drwxr-xr-x 3 root root 0 Jan 20 16:43 state2   
   drwxr-xr-x 3 root root 0 Jan 20 16:43 state3   
   drwxr-xr-x 3 root root 0 Jan 20 16:43 state4   
      
   $ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name   
   POLL   
   C1   
   C1E   
   C3   
   C6   
   ```   
      
   On TrueNAS v25.10 (kernel 6.12.33), I have:   
      
   ```   
   $ cat /proc/cmdline   
   BOOT_IMAGE=/ROOT/25.10.1@/boot/vmlinuz-6.12.33-production+truenas   
   root=ZFS=boot-pool/ROOT/25.10.1 ro libata.allow_tpm=1 amd_iommu=on   
   iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1   
   nvme_core.multipath=N   
      
   $ cat /sys/devices/system/cpu/cpuidle/current_driver   
   intel_idle   
      
   $ ls -l /sys/devices/system/cpu/cpu0/cpuidle/   
   total 0   
   drwxr-xr-x 2 root root 0 Jan 20 21:57 state0   
   drwxr-xr-x 3 root root 0 Jan 20 21:57 state1   
   drwxr-xr-x 3 root root 0 Jan 20 21:57 state2   
   drwxr-xr-x 3 root root 0 Jan 20 21:57 state3   
   drwxr-xr-x 3 root root 0 Jan 20 21:57 state4   
      
   $ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name   
   POLL   
   C1   
   C1E   
   C3   
   C6   
   ```   
      
   On TrueNAS v25.10 (kernel 6.12.33) with `intel_idle.max_cstate=0`, I have:   
      
      
   ```   
   $ cat /proc/cmdline   
   BOOT_IMAGE=/ROOT/25.10.1@/boot/vmlinuz-6.12.33-production+truenas   
   root=ZFS=boot-pool/ROOT/25.10.1 ro libata.allow_tpm=1 amd_iommu=on   
   iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1   
   nvme_core.multipath=N modprobe.blacklist=hpwdt intel_idle.max_cstate=0   
      
   $ cat /sys/devices/system/cpu/cpuidle/current_driver   
   acpi_idle   
      
   $ ls -l /sys/devices/system/cpu/cpu0/cpuidle/   
   total 0   
   drwxr-xr-x 2 root root 0 Jan 20 21:47 state0   
   drwxr-xr-x 2 root root 0 Jan 20 21:47 state1   
   drwxr-xr-x 3 root root 0 Jan 20 21:47 state2   
      
   $ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name   
   POLL   
   C1   
   C2   
   ```   
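      
   The checks above can be gathered in one step. A minimal sketch,
   assuming the cpuidle sysfs layout shown in the outputs above:
   `cpuidle_summary` is a hypothetical helper name, and its optional
   arguments (sysfs root and CPU directory) only exist so it can be run
   against a copied tree.
      
   ```shell
   # Print the active cpuidle driver and the idle state names advertised
   # for one CPU. cpuidle_summary is a hypothetical helper for illustration.
   cpuidle_summary() {
       root="${1:-/sys/devices/system/cpu}"
       cpu="${2:-cpu0}"

       if [ -r "$root/cpuidle/current_driver" ]; then
           printf 'driver: %s\n' "$(cat "$root/cpuidle/current_driver")"
       else
           printf 'driver: unknown\n'
       fi

       # One line per stateN directory; skip anything without a name file
       # (this also covers the case where the glob matches nothing).
       for dir in "$root/$cpu"/cpuidle/state*; do
           [ -r "$dir/name" ] || continue
           printf '%s: %s\n' "$(basename "$dir")" "$(cat "$dir/name")"
       done
   }

   # Example: cpuidle_summary
   ```
      
   Saving this listing on each kernel makes the working and crashing
   setups easy to compare with `diff`.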
      
   I hope this information helps with debugging this issue.
      
      
      
   Here is a partial dmesg from TrueNAS v25.04 (kernel v6.12.15):   
      
   ```   
   $ sudo dmesg   
   [    0.000000] Linux version 6.12.15-production+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Wed Oct 29 14:40:06 UTC 2025
   [    0.000000] Command line: BOOT_IMAGE=/ROOT/25.04.2.6@/boot/vmlinuz-6.12.15-production+truenas root=ZFS=boot-pool/ROOT/25.04.2.6 ro libata.allow_tpm=1 amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1 nvme_core.multipath=N
   [    0.000000] BIOS-provided physical RAM map:
   [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
   [    0.000000] BIOS-e820: [mem 0x0000000000098400-0x0000000000099bff] reserved
   [    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
   [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
   [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000f1de3fff] usable
   [    0.000000] BIOS-e820: [mem 0x00000000f1de4000-0x00000000f1dedfff] ACPI data
   [    0.000000] BIOS-e820: [mem 0x00000000f1dee000-0x00000000f7ffffff] reserved
   [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fee0ffff] reserved
   [    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
   [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000040bffefff] usable
   [    0.000000] NX (Execute Disable) protection: active
      
   [continued in next message]   
      



(c) 1994,  bbs@darkrealms.ca