|    linux.debian.kernel    |    Debian kernel discussions    |    2,884 messages    |
|    Message 2,412 of 2,884    |
|    DUVERGIER Claude to All    |
|    Re: Bug#1111027: NMI: IOCK error (debug     |
|    22 Jan 26 19:40:01    |
From: debian.ml@claude.duvergier.fr

> On Mon, Sep 22 2025 at 19:33, Thorsten Sperber wrote:
>>
>> thanks for your help. It's been four days now, I'd say above average
>> (last was five days) - and no crash yet. I'm going to wait at least
>> until the weekend before naming a winner, but that's already looking
>> pretty good.
>
> Thanks for trying.
>
> I suggested to try intel_idle.max_cstate=2 because these unknown NMI
> backtraces all originated from a MWAIT(C3).
>
> Can you reboot into the working 6.1.y kernel at some point and check
> which idle driver is used there?
>
> cat /sys/devices/system/cpu/cpuidle/current_driver
>
> and which states are advertised:
>
> ls /sys/devices/system/cpu/cpu0/cpuidle/state
>
> Thanks,
>
> tglx

TL;DR: First-time mailing-list user. I'm having a similar issue on a
Debian-based OS, tested different kernel versions and checked the cpuidle
values, and want to share them.

Hello,

I've been having the same issue on an HPE MicroServer Gen8 (with an Intel
E3-1220L V2 CPU) since I upgraded to kernel v6.12.15 (from v6.6.44).

Note: I am using TrueNAS SCALE, which is Debian-based but has its own
kernel flavor.

Applying `intel_iommu=off` and/or `modprobe.blacklist=hpwdt` and/or
`intel_idle.max_cstate=x` (where x=0-2) didn't change anything.

On the working setup (90+ days of uptime), running TrueNAS v24.10
(kernel 6.6.44), I have:

```
$ cat /proc/cmdline
BOOT_IMAGE=/ROOT/24.10.2.4@/boot/vmlinuz-6.6.44-production+truenas
root=ZFS=boot-pool/ROOT/24.10.2.4 ro libata.allow_tpm=1 amd_iommu=on
iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1
nvme_core.multipath=N modprobe.blacklist=hpwdt intel_idle.max_cstate=0

$ cat /sys/devices/system/cpu/cpuidle/current_driver
acpi_idle

$ ls -l /sys/devices/system/cpu/cpu0/cpuidle/
total 0
drwxr-xr-x 2 root root 0 Jan 14 09:32 state0
drwxr-xr-x 2 root root 0 Jan 14 09:32 state1
drwxr-xr-x 3 root root 0 Jan 14 09:32 state2

$ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
C1
C2
```

On other TrueNAS/kernel versions the server crashes in about 2 days or less.

On TrueNAS v25.04 (kernel 6.12.15), I have:

```
$ cat /proc/cmdline
BOOT_IMAGE=/ROOT/25.04.2.6@/boot/vmlinuz-6.12.15-production+truenas
root=ZFS=boot-pool/ROOT/25.04.2.6 ro libata.allow_tpm=1 amd_iommu=on
iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1
nvme_core.multipath=N

$ cat /sys/devices/system/cpu/cpuidle/current_driver
intel_idle

$ ls -l /sys/devices/system/cpu/cpu0/cpuidle/
total 0
drwxr-xr-x 2 root root 0 Jan 20 16:43 state0
drwxr-xr-x 3 root root 0 Jan 20 16:43 state1
drwxr-xr-x 3 root root 0 Jan 20 16:43 state2
drwxr-xr-x 3 root root 0 Jan 20 16:43 state3
drwxr-xr-x 3 root root 0 Jan 20 16:43 state4

$ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
C1
C1E
C3
C6
```

On TrueNAS v25.10 (kernel 6.12.33), I have:

```
$ cat /proc/cmdline
BOOT_IMAGE=/ROOT/25.10.1@/boot/vmlinuz-6.12.33-production+truenas
root=ZFS=boot-pool/ROOT/25.10.1 ro libata.allow_tpm=1 amd_iommu=on
iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1
nvme_core.multipath=N

$ cat /sys/devices/system/cpu/cpuidle/current_driver
intel_idle

$ ls -l /sys/devices/system/cpu/cpu0/cpuidle/
total 0
drwxr-xr-x 2 root root 0 Jan 20 21:57 state0
drwxr-xr-x 3 root root 0 Jan 20 21:57 state1
drwxr-xr-x 3 root root 0 Jan 20 21:57 state2
drwxr-xr-x 3 root root 0 Jan 20 21:57 state3
drwxr-xr-x 3 root root 0 Jan 20 21:57 state4

$ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
C1
C1E
C3
C6
```

On TrueNAS v25.10 (kernel 6.12.33) with `intel_idle.max_cstate=0`, I have:

```
$ cat /proc/cmdline
BOOT_IMAGE=/ROOT/25.10.1@/boot/vmlinuz-6.12.33-production+truenas
root=ZFS=boot-pool/ROOT/25.10.1 ro libata.allow_tpm=1 amd_iommu=on
iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1
nvme_core.multipath=N modprobe.blacklist=hpwdt intel_idle.max_cstate=0

$ cat /sys/devices/system/cpu/cpuidle/current_driver
acpi_idle

$ ls -l /sys/devices/system/cpu/cpu0/cpuidle/
total 0
drwxr-xr-x 2 root root 0 Jan 20 21:47 state0
drwxr-xr-x 2 root root 0 Jan 20 21:47 state1
drwxr-xr-x 3 root root 0 Jan 20 21:47 state2

$ cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
C1
C2
```

I hope this information helps with debugging this issue.

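The driver and state-name checks repeated above for each kernel could be gathered in one pass. The following is only a minimal sketch: the function name `cpuidle_report` and its two optional arguments (a sysfs CPU root and a CPU name) are hypothetical, introduced here for illustration. The sysfs paths are the ones quoted in this message.

```shell
#!/bin/sh
# Print the active cpuidle driver and the idle-state names of one CPU.
# Arguments (both hypothetical, for illustration only):
#   $1  root of the CPU sysfs tree (default: /sys/devices/system/cpu)
#   $2  CPU directory name         (default: cpu0)
cpuidle_report() {
    base="${1:-/sys/devices/system/cpu}"
    cpu="${2:-cpu0}"

    # Falls back to "unknown" when no cpuidle driver file is present.
    driver=$(cat "$base/cpuidle/current_driver" 2>/dev/null || echo unknown)
    printf 'driver: %s\n' "$driver"

    # The glob sorts lexicographically, so state10 would sort before
    # state2; that does not matter for the five states or fewer seen here.
    for dir in "$base/$cpu"/cpuidle/state*; do
        [ -r "$dir/name" ] || continue
        printf '%s: %s\n' "${dir##*/}" "$(cat "$dir/name")"
    done
}
```

Calling `cpuidle_report` with no arguments reads the live system. Saving its output once per kernel and comparing the files with `diff` would show at a glance whether `intel_idle` or `acpi_idle` is active and which C-states each one advertises.
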
Here is a partial dmesg from TrueNAS v25.04 (kernel v6.12.15):

```
$ sudo dmesg
[ 0.000000] Linux version 6.12.15-production+truenas
(root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 12.2.0-14) 12.2.0, GNU
ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Wed Oct 29
14:40:06 UTC 2025
[ 0.000000] Command line:
BOOT_IMAGE=/ROOT/25.04.2.6@/boot/vmlinuz-6.12.15-production+truenas
root=ZFS=boot-pool/ROOT/25.04.2.6 ro libata.allow_tpm=1 amd_iommu=on
iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=off zfsforce=1
nvme_core.multipath=N
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000098400-0x0000000000099bff]
reserved
[ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000f1de3fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000f1de4000-0x00000000f1dedfff]
ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000f1dee000-0x00000000f7ffffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fee0ffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000040bffefff] usable
[ 0.000000] NX (Execute Disable) protection: active

[continued in next message]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca