Just a sample of the Echomail archive
|  Message 7622  |
|  Mike Miller to Benny Pedersen  |
|  xeon vs epyc  |
|  06 Sep 22 08:16:41  |
MSGID: 1:154/30 63174c30
REPLY: 2:230/0 6310a171
TID: SBBSecho 3.15-Linux ensemble/e7dfe4318 Sep 1 2022 GCC 4.8.5
BBSID: ENSEMBLE
CHRS: CP850 2
TZUTC: -0500
CHRS: ASCII 1

Hello Benny!

01 Sep 22 12:08, you wrote to all:

 BP> Linux mx 5.15.63-gentoo-dist #1 SMP Thu Aug 25 12:40:44 -00 2022
 BP> x86_64 Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz GenuineIntel
 BP> GNU/Linux Linux localhost 5.19.6-gentoo-dist #1 SMP PREEMPT_DYNAMIC
 BP> Wed Aug 31 18:48:13 -00 2022 x86_64 AMD EPYC 7642 48-Core Processor
 BP> AuthenticAMD GNU/Linux

We've been switching over to Epyc boxes at work, and it's been a bit of a nightmare, although mostly that's been due to software limitations. We started out with dual-socket boxes with 64-core Epyc CPUs, and ran into limitations with applications that couldn't deal with 256 processors. We had to manually pin each thread/application to a core / set of cores.

We eventually switched over to purchasing single-CPU 64-core EPYC boxes, which resolved our issues with CPU pinning for the most part. However, every single EPYC box we're running has to have IOMMU disabled in BIOS. Otherwise, after about 3 months of running, the servers will start spewing "AMD-Vi: Completion wait loop timed out" errors. This will cause the PCIe devices to rapidly disable/re-enable, which knocks out networking.

We've yet to nail down the actual cause of the issue, and it doesn't seem to matter what kernel version we're running. The odd thing is that it will happen in short bursts: groups of servers (usually assigned to the same application) will start blowing up every hour or two, one after another. Not a fun thing when I'm on call, because I swear it starts happening overnight every time. :D

Mike

... Dancers do it with rhythm.
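
For anyone wondering what pinning a thread/application to a set of cores can look like in practice, here is a rough sketch, not the setup described in the message. It assumes Linux and Python's os.sched_setaffinity; the helper names plan_cpu_sets and pin_process are made up for illustration. It also splits CPU ids into contiguous groups, which is only an assumption: contiguous ids do not necessarily share a NUMA node or cache, so check the real topology before relying on it.

```python
import os


def plan_cpu_sets(cpus, workers):
    """Split CPU ids into `workers` contiguous, near-equal groups.

    Hypothetical helper: grouping by contiguous id is an assumption and
    says nothing about NUMA nodes or shared caches.
    """
    cpus = sorted(cpus)
    if workers < 1 or workers > len(cpus):
        raise ValueError("need between 1 and len(cpus) workers")
    base, extra = divmod(len(cpus), workers)
    groups, start = [], 0
    for i in range(workers):
        # The first `extra` groups take one additional CPU each.
        size = base + (1 if i < extra else 0)
        groups.append(set(cpus[start:start + size]))
        start += size
    return groups


def pin_process(pid, cpu_set):
    """Restrict `pid` (0 means the calling process) to `cpu_set`. Linux only."""
    os.sched_setaffinity(pid, cpu_set)
    return os.sched_getaffinity(pid)
```

One possible use: call plan_cpu_sets(os.sched_getaffinity(0), n) once, then have each of the n worker processes call pin_process(0, groups[i]) at startup. Starting from the currently allowed set avoids asking for CPUs the process is not permitted to use.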
=== GoldED+/LNX 1.1.5-b20220504
--- SBBSecho 3.15-Linux
 * Origin: War Ensemble - warensemble.com - Appleton, WI (1:154/30)
SEEN-BY: 1/19 123 15/0 16/0 19/37 90/1 103/705 105/81 106/201 114/709
SEEN-BY: 120/340 616 123/10 130 131 124/5016 129/305 331 153/757 7001
SEEN-BY: 153/7715 154/0 10 30 40 50 70 700 203/0 218/700 220/90 221/0
SEEN-BY: 221/6 226/18 227/114 201 229/111 112 113 206 275 310 317
SEEN-BY: 229/400 424 426 428 452 470 550 616 664 700 240/77 420 5411
SEEN-BY: 240/5824 5832 5853 266/512 280/464 5003 5006 282/1038 292/854
SEEN-BY: 292/8125 301/1 310/31 317/3 320/119 219 319 322/0 757 326/101
SEEN-BY: 341/66 234 342/200 396/45 423/120 460/58 633/280 712/848
SEEN-BY: 770/1 2320/105 2432/390 2452/250 2454/119 3634/12 5020/545
PATH: 154/30 10 280/464 240/5832 320/219 229/426
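
Since the message describes the "AMD-Vi: Completion wait loop timed out" errors arriving in bursts across groups of servers, a small script that counts occurrences per host in collected logs might help spot which group is affected. This is only a sketch: the error string is taken from the message, but the log layout (traditional syslog lines of the form "Mon DD HH:MM:SS host kernel: ...") and the function name count_timeouts_by_host are assumptions, not anything the author described.

```python
import re
from collections import Counter

# Error text as quoted in the message above.
AMD_VI_TIMEOUT = "AMD-Vi: Completion wait loop timed out"

# Assumed layout: traditional syslog, e.g. "Sep  6 03:12:01 host kernel: ..."
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(?P<host>\S+)\s+kernel:"
)


def count_timeouts_by_host(lines):
    """Count lines containing the AMD-Vi timeout message, keyed by hostname.

    Lines that contain the message but do not match the assumed syslog
    layout are counted under the key "unknown".
    """
    counts = Counter()
    for line in lines:
        if AMD_VI_TIMEOUT not in line:
            continue
        match = LINE_RE.match(line)
        host = match.group("host") if match else "unknown"
        counts[host] += 1
    return counts
```

Passing an open file object works, since the function accepts any iterable of lines; counts.most_common() then lists the noisiest hosts first.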