From: antispam@fricas.org   
      
   Thomas Koenig wrote:   
   > I just looked at the latency / throughput for Zen 5 (the link I   
   > followed is https://docs.amd.com/v/u/en-US/58455_1.00 if anybody   
   > wants to see for themselves), and I found the performance quite   
   > impressive.   
   >   
   > They can execute two 512-bit AVX512 fp adds in parallel (either 64
   > or 32 bits), plus two 512-bit AVX512 FMA instructions on top.
   >   
   > Latency for the floating point add is two (!) cycles, for the FMA
   > it is four cycles, which is not a lot when running with a boost
   > frequency of 5.7 GHz. The ratio is also interesting, they must
   > have optimized the floating-point adder quite well.
   >   
   > Let's see... the peak FP performance with 64-bit reals, with 16   
   > cores (to get an upper limit on FP performance) would be   
   >   
   > 16 cores * (2 * 2 for FMA + 1 * 2 for fadd) * 8 FP numbers * 5.7e9/s   
   > which is approximately 4.3 TFlops per CPU.   
      
   I do not think you can run 16 cores at boost frequency for any
   reasonable period of time. And all processors that I have looked at
   lowered the clock when AVX FMA instructions were running. And I
   doubt this "on top" claim: 2 FMAs + 2 fadds need 10 source
   arguments (3 per FMA, 2 per fadd). If the chip can provide that
   many arguments in a single cycle, it is probably only possible for
   some special combination of sources.
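
To make the operand count concrete, here is a small Python sketch. It
only does the arithmetic: the per-instruction operand counts (3 for an
FMA, 2 for an add) are the usual ones for these operations, and the
names OPERANDS and operands_per_cycle are made up for illustration.
It says nothing about how many register reads the chip really has.

```python
# Source operands read by each instruction type.
# FMA computes a*b + c (3 sources); fadd computes a + b (2 sources).
OPERANDS = {"fma": 3, "fadd": 2}

def operands_per_cycle(issue):
    """Total source operands needed in one cycle for an issue mix.

    `issue` maps an instruction type to how many are issued per cycle.
    """
    return sum(OPERANDS[op] * count for op, count in issue.items())

# The claimed mix: two FMAs plus two fadds in the same cycle.
claimed = {"fma": 2, "fadd": 2}
print(operands_per_cycle(claimed))  # 2*3 + 2*2 = 10
```

With 512-bit operands that is 10 * 64 bytes of source data per cycle,
unless some sources come from forwarding or are shared between
instructions.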
      
   And note that your mix is 2 multiplies and 4 adds per cycle (each
   FMA counts as one multiply and one add). A normal FP mix is closer
   to 50% multiplies.
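
The peak figure and the mix can be checked with a few lines of Python.
This is just the quoted formula written out, using the numbers from
the post (16 cores, 8 doubles per 512-bit vector, 5.7 GHz); the
function names are hypothetical, and sustained all-core boost is
assumed, which I doubt above.

```python
def peak_flops(cores, fma_per_cycle, fadd_per_cycle, lanes, hz):
    """Upper bound on FLOP/s: an FMA counts as 2 flops, an fadd as 1."""
    flops_per_lane = 2 * fma_per_cycle + fadd_per_cycle
    return cores * flops_per_lane * lanes * hz

def multiply_fraction(fma_per_cycle, fadd_per_cycle):
    """Fraction of the flops per cycle that are multiplies."""
    multiplies = fma_per_cycle                 # one multiply per FMA
    adds = fma_per_cycle + fadd_per_cycle      # one add per FMA, one per fadd
    return multiplies / (multiplies + adds)

# Claimed mix: 2 FMA + 2 fadd per cycle, roughly 4.4e12 FLOP/s.
with_fadd = peak_flops(16, 2, 2, 8, 5.7e9)

# FMA pipes only (a 50% multiply mix): roughly 2.9e12 FLOP/s.
fma_only = peak_flops(16, 2, 0, 8, 5.7e9)

print(f"{with_fadd:.3e} {fma_only:.3e}")
print(multiply_fraction(2, 2))  # 2 multiplies out of 6 flops
```

So code with an even multiply/add mix can use at most the FMA part of
that peak; the extra third only helps add-heavy code.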
      
   --   
    Waldek Hebisch   
      