

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,729 of 131,241   
   Thomas Koenig to Waldek Hebisch   
   Re: Zen 5 FP latencies / throughput   
   20 Sep 25 13:32:34   
   
   From: tkoenig@netcologne.de   
      
   Waldek Hebisch  wrote:
   > Thomas Koenig  wrote:   
   >> I just looked at the latency / throughput for Zen 5 (the link I   
   >> followed is https://docs.amd.com/v/u/en-US/58455_1.00 if anybody   
   >> wants to see for themselves), and I found the performance quite   
   >> impressive.   
   >>   
   >> They can execute two 512-bit AVX-512 FP adds in parallel (on either
   >> 64- or 32-bit elements), plus two 512-bit AVX-512 FMA instructions on top.
   >>   
   >> Latency for the floating-point add is two (!) cycles, and for the FMA
   >> it is four cycles, which is not a lot when running at a boost
   >> frequency of 5.7 GHz.  The ratio is also interesting; they must
   >> have optimized the floating-point adder quite well.
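A small sketch converting the quoted latencies into wall-clock time at 5.7 GHz, assuming the core actually holds that boost clock. It also applies the common rule of thumb that keeping pipelined units busy needs roughly latency × number-of-pipes independent operations in flight (for example, separate accumulators in a reduction). The pipe counts come from the post, not from AMD's tables; the constant and function names are illustrative.

```python
# Convert the quoted Zen 5 latencies (cycles) into nanoseconds at the quoted
# boost clock, and estimate how many independent operations must be in
# flight so that no pipe stalls waiting for a result (latency x pipes).

BOOST_HZ = 5.7e9      # boost frequency quoted in the post (assumed sustained)
FADD_LATENCY = 2      # cycles, as quoted
FMA_LATENCY = 4       # cycles, as quoted
PIPES = 2             # two FADD pipes and two FMA pipes, as quoted


def latency_ns(cycles: int, hz: float = BOOST_HZ) -> float:
    """Latency in nanoseconds for a given cycle count at clock rate hz."""
    return cycles / hz * 1e9


def ops_in_flight(latency_cycles: int, pipes: int = PIPES) -> int:
    """Independent operations needed to keep all pipes of one kind busy."""
    return latency_cycles * pipes


for name, lat in (("fadd", FADD_LATENCY), ("fma", FMA_LATENCY)):
    print(f"{name}: {latency_ns(lat):.2f} ns, "
          f"{ops_in_flight(lat)} independent ops in flight")
```

So a dot product written with a single accumulator would be limited by the 4-cycle FMA latency; under these assumptions it needs about 8 independent accumulators (per vector register width) to approach the FMA throughput.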
   >>   
   >> Let's see... the peak FP performance with 64-bit reals, with 16   
   >> cores (to get an upper limit on FP performance) would be   
   >>   
   >> 16 cores * (2 * 2 for FMA + 1 * 2 for fadd) * 8 FP numbers * 5.7e9/s   
   >> which is approximately 4.3 TFlops per CPU.   
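The peak-FLOPS arithmetic above, spelled out as a sketch. It counts an FMA as two floating-point operations and an add as one, and assumes all 16 cores sustain 5.7 GHz, which the reply below disputes.

```python
# Peak FP64 throughput for a 16-core Zen 5 part, using the figures quoted
# in the post. Assumes every core runs at boost clock simultaneously.

CORES = 16
FMA_PER_CYCLE = 2          # 512-bit FMAs per core per cycle (as quoted)
FADD_PER_CYCLE = 2         # 512-bit FADDs per core per cycle (as quoted)
LANES_FP64 = 512 // 64     # 8 doubles per 512-bit vector
BOOST_HZ = 5.7e9

FLOPS_PER_FMA = 2          # one multiply + one add
FLOPS_PER_FADD = 1

flops_per_lane_per_cycle = (FMA_PER_CYCLE * FLOPS_PER_FMA
                            + FADD_PER_CYCLE * FLOPS_PER_FADD)
peak = CORES * flops_per_lane_per_cycle * LANES_FP64 * BOOST_HZ

print(f"{peak / 1e12:.2f} TFLOPS")
```

This gives 768 FP64 operations per cycle across the chip, or about 4.38 TFLOPS, which the post rounds down to 4.3.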
   >   
   > I do not think you can run 16 cores at boost frequency for any
   > reasonable period of time.  And all processors that I looked at
   > slowed down the clock when AVX FMA instructions were present.
      
   It slows down somewhat, but the behavior is still impressive.   
   If you want to know the details, an analysis is at   
   https://chipsandcheese.com/p/zen-5s-avx-512-frequency-behavior .   
   Unfortunately, they didn't run two FMAs + two adds, but only
   two FMAs + one add in parallel.
      
   > And I doubt this
   > "on top" claim: 2 FMAs + 2 fadds need 10 source operands.
   > If the chip can supply that many operands in a single cycle,
   > it can probably do so only for some special combination of
   > sources.
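The operand count behind this objection can be checked directly. A minimal sketch, assuming an FMA reads three source registers (a*b + c) and an add reads two; it ignores destination writes and any operand forwarding, so it is a count of register-file reads only.

```python
# Count register source operands needed to issue a given mix of FP
# instructions in one cycle. FMA computes a*b + c (3 inputs); FADD
# computes a + b (2 inputs).

OPERANDS = {"fma": 3, "fadd": 2}


def operands_per_cycle(mix: dict[str, int]) -> int:
    """Total source operands needed to issue `mix` in a single cycle."""
    return sum(OPERANDS[op] * count for op, count in mix.items())


print(operands_per_cycle({"fma": 2, "fadd": 2}))  # mix claimed in the post
print(operands_per_cycle({"fma": 2, "fadd": 1}))  # mix tested by Chips and Cheese
```

The claimed mix needs 10 source operands per cycle; the two-FMA-plus-one-add mix measured in the linked article needs 8.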
      
   Register to register.
   >   
   > And note that your mix is 2 multiplies and 4 adds per cycle.   
   > Normal FP mix is closer to 50% multiplies.   
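A short sketch of the multiply/add accounting in this remark, counting each FMA as one multiply plus one add. The optional `fmul` parameter is a hypothetical addition for comparing other mixes; it is not part of anything in the thread.

```python
from fractions import Fraction


def op_mix(fma: int, fadd: int, fmul: int = 0) -> tuple[int, int, Fraction]:
    """Return (multiplies, adds, fraction of multiplies) for an issue mix.

    Each FMA counts as one multiply plus one add.
    """
    muls = fma + fmul
    adds = fma + fadd
    return muls, adds, Fraction(muls, muls + adds)


print(op_mix(fma=2, fadd=2))  # (2, 4, Fraction(1, 3))
```

The peak mix is one-third multiplies, versus the roughly 50% quoted for typical FP code, so real code would leave some of the add capacity idle.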
      
   I wrote about "peak performance", which is the speed that is
   guaranteed not to be exceeded :-)  It's like the
   160 MFlops for the Cray-1, which people also could not realistically
   achieve.
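For scale, a sketch comparing the two peak figures. It assumes the commonly cited Cray-1 parameters (80 MHz clock, one floating add and one floating multiply per cycle) and reuses the Zen 5 estimate from earlier in the thread, with the same all-cores-at-boost assumption.

```python
# Compare the Cray-1 peak with the 16-core Zen 5 peak estimate.

CRAY1_HZ = 80e6             # commonly cited Cray-1 clock (12.5 ns cycle)
CRAY1_FLOPS_PER_CYCLE = 2   # one FP add + one FP multiply per cycle
cray1_peak = CRAY1_HZ * CRAY1_FLOPS_PER_CYCLE

# 16 cores * (2 FMA * 2 flops + 2 FADD * 1 flop) * 8 lanes * 5.7 GHz
zen5_peak = 16 * (2 * 2 + 2 * 1) * 8 * 5.7e9

print(f"Cray-1 peak: {cray1_peak / 1e6:.0f} MFLOPS")
print(f"Zen 5 / Cray-1 peak ratio: {zen5_peak / cray1_peak:,.0f}x")
```

On paper the ratio is roughly 27,000x, with the same caveat in both cases: neither peak is realistically sustained.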
   --   
   This USENET posting was made without artificial intelligence,   
   artificial impertinence, artificial arrogance, artificial stupidity,   
   artificial flavorings or artificial colorants.   
      



(c) 1994,  bbs@darkrealms.ca