Forums before death by AOL, social media and spammers... "We can't have nice things"
|    alt.os.linux.ubuntu    |    I preferred Xubuntu, seemed a bit faster    |    134,474 messages    |
|    Message 133,948 of 134,474    |
|    Paul to All    |
|    Re: performance patterns not comparable     |
|    05 Jul 24 06:06:32    |
   
   From: nospam@needed.invalid   
      
   On 7/4/2024 1:05 PM, Christian Dürrhauer wrote:   
   > Hi,   
   >   
   > I would like to get a fresh view on my topic.
   >   
   > The awkward behavior on an Ivy Bridge-based machine (stable operation, no
   > question) is that a few of the hardware components perform differently across
   > reboots. It does not happen reliably or in any pattern, at least not one I was
   > able to find, but in roughly 5 out of 15 reboots. The machine is a digital
   > video recorder and certainly does not need to be replaced.
   >   
   > There is an NVMe 1.3 SSD (Samsung 990 Pro) on a PCIe 4.0 card in a PCIe 3.0 x4
   > slot.
   > There is a Realtek 8125B 2.5GbE network card (PCIe 2.0 x1) in a PCIe 2.0 x1
   > slot.
   > Ubuntu 22.04.4, current (kernel 5.15.101 plus the Realtek driver package, r8169
   > driver blacklisted), booting from a SATA drive.
   >   
   > When the issue occurs, the SSD delivers 1.9GB/s and the network card 169MB/s.
   > In normal cases, the SSD delivers 3.5GB/s and the network card 275MB/s (so the
   > difference is significant, but still functionally OK).
   >   
   > Like I said, I fail to see a pattern. The system log files are huge, but I
   > tried to compare them anyway and am fairly confident I did not find anything
   > striking.
   >   
   > I have swapped the power supply, mainboard, SSD, RAM, CPU and keyboard/mouse.
   > Booting other Ubuntu installs (Clonezilla images) looks similar.
   >   
   > I tried googling it but found nothing useful; Google is too smart and "knows"
   > what I was looking for (the results are polluted with the same search terms
   > in totally different contexts).
   >   
   > Does anyone have an idea what is happening here?
   >   
      
   The numbers suggest improper PCI Express negotiation.
   275MB/sec is close to the roughly 280MB/sec a 2.5GbE LAN can
   deliver. This means the PCI Express link is running at the
   expected rate, as in the case where the SSD gets 3.5GB/sec.
   The 1.9GB/sec SSD figure is about half of 3.5GB/sec. If the
   PCIe link on the Realtek is also running at half the rate, then
   the network output will be "clipped" accordingly. Peripheral
   cards, for the most part, survive this starvation gracefully
   and still function.
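To put numbers on the "half the rate" idea, here is a small Python sketch (not from the thread). It computes each link's ceiling from the PCIe transfer rate and the line encoding: 8b/10b up to PCIe 2.0, and 128b/130b from PCIe 3.0 onward. Packet overhead is ignored, so the results are upper bounds. The generation and lane cases are simply the ones mentioned above, and `GEN_RATES` and `link_mb_s` are illustrative names.

```python
# Rough PCIe link ceilings, limited only by line encoding (8b/10b for
# PCIe 1.x/2.0, 128b/130b for PCIe 3.0/4.0). Packet overhead is ignored,
# so real transfers land somewhat lower than these figures.
GEN_RATES = {
    # generation: (transfer rate in GT/s per lane, encoding efficiency)
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}

# 2.5GbE line rate in MB/s (1 MB = 10**6 bytes), before Ethernet framing.
ETHERNET_2G5_MB_S = 2.5e9 / 8 / 1e6


def link_mb_s(gen: int, lanes: int) -> float:
    """Encoding-limited bandwidth of a PCIe link, in MB/s per direction."""
    gt_per_s, efficiency = GEN_RATES[gen]
    return gt_per_s * 1e9 * efficiency * lanes / 8 / 1e6


if __name__ == "__main__":
    cases = [
        ("SSD, PCIe 3.0 x4 (expected)", 3, 4),
        ("SSD, PCIe 2.0 x4 (one step down)", 2, 4),
        ("NIC, PCIe 2.0 x1 (expected)", 2, 1),
        ("NIC, PCIe 1.1 x1 (one step down)", 1, 1),
    ]
    for label, gen, lanes in cases:
        print(f"{label:34s} {link_mb_s(gen, lanes):7.1f} MB/s")
    print(f"{'2.5GbE line rate':34s} {ETHERNET_2G5_MB_S:7.1f} MB/s")
```

On these assumptions, 3.5GB/s fits under the PCIe 3.0 x4 ceiling (about 3938MB/s), and 1.9GB/s fits just under the PCIe 2.0 x4 ceiling (2000MB/s). Likewise, 169MB/s fits under the 250MB/s PCIe 1.1 x1 ceiling, which is itself below the 312.5MB/s 2.5GbE line rate. That is consistent with a one-step rate drop. However, a narrower link (x2 instead of x4 at PCIe 3.0, about 1969MB/s) would produce a similar SSD figure.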
      
   An NVMe drive can be connected to the processor directly, or it
   can use the PCH (Southbridge) x4 interface, which usually runs
   one PCIe revision lower than the CPU's. There can be two sled
   (M.2) connectors on the motherboard. The one nearest the CPU
   then runs at 2x the per-lane speed of the one nearest the
   Southbridge heatsink.
      
    CPU --- PCIe Rev4 --- NVMe connector
     |
    DMI Rev3
     |
    PCH --- PCIe Rev3 --- NVMe connector   <=== PCIe can rate-reduce down
                                                to version 1.1 by itself,
                                                as part of its startup
                                                procedure. Some modern
                                                video cards have done
                                                this without telling you.
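If a rate drop is happening, Linux should be able to show it on the running system. `lspci -vv` prints LnkCap and LnkSta lines for each device. The kernel also exposes the same information in sysfs as `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width`.

The sketch below is a hypothetical helper, not a standard tool. It compares each endpoint's current link against the slower of its own maximum and its upstream port's maximum, because a PCIe 4.0 SSD in a PCIe 3.0 slot is expected to run at 8 GT/s. The idea is to run it after a "good" boot and a "bad" boot and compare the output.

```python
"""Flag PCIe links that trained slower or narrower than both ends allow.

Linux-only sketch. It reads the sysfs attributes current_link_speed,
current_link_width, max_link_speed and max_link_width that the kernel
exposes for PCIe devices, and compares each device with the port above it.
"""
import os
import re

SYSFS_PCI = "/sys/bus/pci/devices"
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def parse_speed(text):
    """GT/s value from strings like '8.0 GT/s PCIe' or '8 GT/s'; None if unknown."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", text or "")
    return float(match.group(1)) if match else None


def is_downtrained(cur_speed, cur_width, dev_max, dev_width, up_max, up_width):
    """A link trains no faster/wider than its weaker end; flag anything below that."""
    maxima = [s for s in (parse_speed(dev_max), parse_speed(up_max)) if s is not None]
    current = parse_speed(cur_speed)
    speed_low = bool(maxima) and current is not None and current < min(maxima)
    width_low = int(cur_width) < min(int(dev_width), int(up_width))
    return speed_low or width_low


def read_attr(path, name):
    try:
        with open(os.path.join(path, name)) as f:
            return f.read().strip()
    except OSError:
        return None  # attribute missing: not a PCIe device, or not readable


def scan(root=SYSFS_PCI):
    """Yield (address, current speed, current width, flagged) for endpoint links."""
    if not os.path.isdir(root):
        return
    names = ("current_link_speed", "current_link_width",
             "max_link_speed", "max_link_width")
    for addr in sorted(os.listdir(root)):
        dev = os.path.realpath(os.path.join(root, addr))
        port = os.path.dirname(dev)  # sysfs nests a device under its upstream port
        if not PCI_ADDR.match(os.path.basename(port)):
            continue  # sits directly on the root complex; no port to compare with
        if (read_attr(dev, "class") or "").startswith("0x0604"):
            continue  # skip bridges/ports; this sketch only checks endpoint links
        dev_vals = [read_attr(dev, n) for n in names]
        up_vals = [read_attr(port, n) for n in names[2:]]
        if None in dev_vals or None in up_vals:
            continue
        try:
            flagged = is_downtrained(*dev_vals, *up_vals)
        except ValueError:
            continue  # unexpected attribute text; skip rather than guess
        yield addr, dev_vals[0], dev_vals[1], flagged


if __name__ == "__main__":
    for addr, speed, width, flagged in scan():
        note = "  <-- below what both ends support" if flagged else ""
        print(f"{addr}  {speed} x{width}{note}")
```

The speed strings differ between kernel versions ("8 GT/s" versus "8.0 GT/s PCIe"), which is why only the leading number is parsed.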
      
   The Southbridge (PCH) is usually over-subscribed, which means
   that if all the "peripherals" on the Southbridge become busy at
   the same time, the DMI link from CPU to PCH does not have the
   bandwidth for all of them. But I don't think that is happening
   here, and in any case it does not affect PHY (link rate)
   negotiation. Even when forced into a pathological test case,
   the DMI bus keeps running, and most of the time the user might
   not even be aware there is an issue.
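As a rough illustration of over-subscription, the sketch below adds up the link ceilings of a few devices behind a PCH and divides the total by the uplink. It follows the diagram's DMI Rev3 label and treats DMI 3.0 as a PCIe 3.0 x4 link. An Ivy Bridge board would actually have the older DMI 2.0, which is about half of that. The device list is hypothetical and not taken from any particular board.

```python
# Hypothetical oversubscription check for a PCH (Southbridge). DMI 3.0 is
# modelled as a PCIe 3.0 x4 link; the device list below is illustrative only.


def mb_s(line_rate_g: float, efficiency: float, lanes: int = 1) -> float:
    """Encoding-limited bandwidth in MB/s for a serial link.

    line_rate_g is the per-lane line rate in GT/s (PCIe) or Gbaud (SATA, USB).
    """
    return line_rate_g * 1e9 * efficiency * lanes / 8 / 1e6


DMI3_UPLINK = mb_s(8.0, 128 / 130, lanes=4)  # about 3938 MB/s each way

HYPOTHETICAL_PCH_DEVICES = {
    "NVMe SSD, PCIe 3.0 x4": mb_s(8.0, 128 / 130, lanes=4),
    "2.5GbE NIC, PCIe 2.0 x1": mb_s(5.0, 8 / 10),
    "SATA 6Gb/s SSD": mb_s(6.0, 8 / 10),       # SATA also uses 8b/10b
    "USB 3.0 port (5Gb/s)": mb_s(5.0, 8 / 10),  # so does USB 3.0
}


def oversubscription(downstream: dict, uplink: float) -> float:
    """Ratio above 1.0: the devices together can ask for more than the uplink carries."""
    return sum(downstream.values()) / uplink


if __name__ == "__main__":
    total = sum(HYPOTHETICAL_PCH_DEVICES.values())
    print(f"downstream total: {total:7.0f} MB/s")
    print(f"DMI 3.0 uplink:   {DMI3_UPLINK:7.0f} MB/s")
    print(f"ratio:            {oversubscription(HYPOTHETICAL_PCH_DEVICES, DMI3_UPLINK):.2f}x")
```

With these made-up devices, the ratio comes out at about 1.4, yet each device alone fits easily within the uplink. Only simultaneous heavy use would hit the DMI limit.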
      
   Like a vehicle stuck in the wrong gear, for some reason a PCIe
   link is running one standards version too slow. There is
   probably a way to "jam" the rate in software. For example, a few
   video cards shipped with a different videoBIOS that forced their
   bus interface to the PCIe 1.1 rate, for stability reasons (8800
   era). That made the card not quite as fast as it could have been,
   but it also ensured the card always worked, which is pretty
   important. No more black screens.
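One commonly described way to "jam" the rate on Linux uses `setpci` in two steps. First, write the Target Link Speed field (bits 3:0 of the Link Control 2 register, offset 0x30 in the PCI Express capability) on the port above the device. Then set the Retrain Link bit (bit 5 of Link Control, offset 0x10).

The sketch below only computes the new register values and builds the command strings; it runs nothing. Treat it as an assumption-based starting point. The port address in any example is hypothetical, and writing config space on a live system can hang the device or the machine.

```python
# Register layout (offsets relative to the PCI Express capability structure):
LNKCTL = 0x10             # Link Control
LNKCTL_RETRAIN = 0x0020   # bit 5: Retrain Link (set on the downstream/root port)
LNKCTL2 = 0x30            # Link Control 2
LNKCTL2_TLS = 0x000F      # bits 3:0: Target Link Speed


def with_target_speed(lnkctl2: int, gen: int) -> int:
    """Return Link Control 2 with only the Target Link Speed field replaced.

    Encodings: 1 = 2.5 GT/s, 2 = 5.0 GT/s, 3 = 8.0 GT/s, 4 = 16.0 GT/s.
    """
    if gen not in (1, 2, 3, 4):
        raise ValueError("Target Link Speed encodings 1-4 cover 2.5 to 16 GT/s")
    return (lnkctl2 & ~LNKCTL2_TLS & 0xFFFF) | gen


def with_retrain(lnkctl: int) -> int:
    """Return Link Control with the Retrain Link bit set, other bits unchanged."""
    return (lnkctl | LNKCTL_RETRAIN) & 0xFFFF


def setpci_commands(port: str, lnkctl2: int, lnkctl: int, gen: int) -> list[str]:
    """Build (but do not run) setpci commands that cap `port`'s link at `gen`.

    `lnkctl2` and `lnkctl` are the current register values, e.g. as read with
    `setpci -s <port> CAP_EXP+30.w` and `setpci -s <port> CAP_EXP+10.w`.
    """
    return [
        f"setpci -s {port} CAP_EXP+{LNKCTL2:x}.w={with_target_speed(lnkctl2, gen):04x}",
        f"setpci -s {port} CAP_EXP+{LNKCTL:x}.w={with_retrain(lnkctl):04x}",
    ]
```

For example, `setpci_commands("00:1c.0", 0x0003, 0x0040, 1)` uses a hypothetical port and hypothetical register values. It would produce commands that cap that port at 2.5 GT/s and then request retraining. The functions read the current values first and change only the field of interest, so the other Link Control bits are preserved.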
      
   I don't know whether "dmesg" has log entries for PCIe link speed
   or not. The hardware itself negotiates for the highest rate both
   ends support, but there is probably more than one mechanism that
   can interfere with that. I'm a bit worried this is a BIOS code
   issue (SMI/SMM code runs multiple times a second).
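On the dmesg question: as far as I know, the PCI core in recent kernels logs a line containing "available PCIe bandwidth, limited by ..." when a device's link trains below what the device is capable of. The parser below assumes that wording, which may differ between kernel versions. It accepts both the "5 GT/s" and "5.0 GT/s PCIe" speed spellings.

Note that a PCIe 4.0 SSD in a PCIe 3.0 slot would be reported as limited on every boot. What matters is the difference between a good boot and a bad boot.

```python
import re
import subprocess


def _speed(name):
    """Regex for a link speed such as '5 GT/s' or '5.0 GT/s PCIe'."""
    return rf"(?P<{name}>\d+(?:\.\d+)?) GT/s(?: PCIe)?"


# Assumed wording of the kernel's "limited by" bandwidth message.
LIMITED = re.compile(
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<avail>\d+\.\d+) Gb/s available PCIe bandwidth, limited by "
    + _speed("speed") + r" x(?P<width>\d+) link at (?P<port>\S+) "
    + r"\(capable of (?P<capable>\d+\.\d+) Gb/s with "
    + _speed("cap_speed") + r" x(?P<cap_width>\d+) link\)"
)


def find_limited_links(log_text):
    """Return one dict per 'limited by' message found in kernel log text."""
    hits = []
    for line in log_text.splitlines():
        m = LIMITED.search(line)
        if m:
            hits.append({
                "device": m["device"],
                "speed_gt_s": float(m["speed"]),
                "width": int(m["width"]),
                "port": m["port"],
                "capable_speed_gt_s": float(m["cap_speed"]),
                "capable_width": int(m["cap_width"]),
            })
    return hits


def read_kernel_log():
    """Fetch the kernel log via `dmesg` (may need root if dmesg_restrict=1)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

Something like `find_limited_links(read_kernel_log())` returns one dict per message. If the journal is persistent, `journalctl -k -b -1` can supply the previous boot's kernel log for comparison.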
      
   One thing I have discovered, to my horror, is that the BIOS is
   pretty autonomous and not above mischief. My processor was
   crashing, but this was no ordinary crash. It was not an MCE
   (Machine Check Exception) like on a legacy CPU. Instead, it
   would appear the BIOS "parked" my processor and turned off both
   the keyboard +5V and the mouse +5V (PS/2 and USB). None of the
   USB ports worked. The mains power draw (measured by a meter that
   is always connected) showed 54W, versus 36W at idle. I believe
   the BIOS did this in an SMI service routine, but I cannot find
   any documentation, nor any means of monitoring the BIOS while
   the OS is running.
      
   When I place the CPU in another motherboard, it runs normally.
   The BIOS will eventually "tune something" to the point of ruin,
   but it might take a long time before one of these "crashes"
   comes back. And it's not really a crash; it is a kind of Safe
   Mode for Hardware. There is no documentation. Other people have
   noted that something is wrong with C-state control, and
   switching off C-states (the CPU runs warmer) apparently also
   eliminates this BIOS issue.
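The C-state switch described above was presumably in BIOS setup. On Linux, the kernel's cpuidle sysfs interface (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/`, with `name`, `latency` and `disable` files) shows which idle states exist. As root, you can disable individual states by writing 1 to `disable`.

The sketch below only reads these files. `deeper_than` is a hypothetical helper that lists the states above a chosen exit-latency threshold. Those are the candidates to disable when testing whether deep C-states are involved.

```python
"""List the CPU idle states (C-states) Linux exposes for one CPU.

Linux-only sketch using the cpuidle sysfs interface:
/sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,latency,disable}.
It only reads; disabling a state means writing "1" to its `disable`
file as root, which is separate from any BIOS C-state setting.
"""
import os
import re


def parse_state(name, latency, disable):
    """Turn raw sysfs text into (name, exit latency in microseconds, disabled?)."""
    return name.strip(), int(latency), disable.strip() == "1"


def read_states(cpu=0, root="/sys/devices/system/cpu"):
    """Return parse_state() tuples for every idle state of one CPU, in index order."""
    base = os.path.join(root, f"cpu{cpu}", "cpuidle")
    if not os.path.isdir(base):
        return []  # no cpuidle driver loaded, or not Linux
    dirs = sorted((d for d in os.listdir(base) if re.fullmatch(r"state\d+", d)),
                  key=lambda d: int(d[len("state"):]))
    states = []
    for d in dirs:
        values = []
        for attr in ("name", "latency", "disable"):
            with open(os.path.join(base, d, attr)) as f:
                values.append(f.read())
        states.append(parse_state(*values))
    return states


def deeper_than(states, max_latency_us):
    """Names of states whose exit latency exceeds max_latency_us."""
    return [name for name, latency, _ in states if latency > max_latency_us]


if __name__ == "__main__":
    for name, latency, disabled in read_states():
        status = "disabled" if disabled else "enabled"
        print(f"{name:10s} exit latency {latency:5d} us  {status}")
```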
      
   When a BIOS ("UEFI") screws around like this, it destroys the
   "trust" we had in the Legacy BIOS era. UEFI can be programmed
   from the OS. UEFI can even agree to flash itself (automatic
   updates from the motherboard maker). There is a huge attack
   surface for mischief on these newer motherboards. Such a bad,
   bad idea. Apparently we have learned nothing over the years
   about defensive design.
      
    Paul   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
(c) 1994, bbs@darkrealms.ca