From: user5857@newsgrouper.org.invalid   
      
   Thomas Koenig posted:   
      
   > I just looked at the latency / throughput for Zen 5 (the link I   
   > followed is https://docs.amd.com/v/u/en-US/58455_1.00 if anybody   
   > wants to see for themselves), and I found the performance quite   
   > impressive.   
   >   
   > They can execute two 512-bit AVX512 fp adds in parallel (either 64   
   > or 32 bits), plus two 512-bit AVX512 FMA instructions on top.   
   >   
   > Latency for the floating point add is two (!) cycles, for the FMA   
   > it is four cycles, which is not a lot when running with a boost   
   > frequency of 5.7 GHz. The ratio is also interesting; they must   
   > have optimized the floating-point adder quite well.   
      
   MIPS R2000 did FADD in 2 cycles. The first thing you have to recognize
   is that when the operands' exponents differ by more than ±1, no big
   normalization shift is needed (at most 1 bit). So you build 2 FADDs.
   One specializes in the case where the exponents are within ±1 of each
   other, which means pre-alignment is not needed (>>1, ×1, <<1): you can
   start the fraction add immediately and use the second cycle for
   normalization. The second FADD aligns before the addition but does not
   need to normalize (beyond that 1-bit fix-up).
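A toy model of the two-path idea above, as a sketch under stated assumptions: exact integer arithmetic with no rounding, a 24-bit significand (binary32-like), and hypothetical names (`Operand`, `classify_add`, `check_far_path_bound`) made up for illustration. It only measures how far each path has to shift; it does not model the actual MIPS hardware.

```python
import random
from typing import NamedTuple

P = 24  # significand width in bits (binary32-like); illustrative assumption


class Operand(NamedTuple):
    sign: int  # 0 for +, 1 for -
    exp: int   # unbiased exponent
    frac: int  # normalized significand, 2**(P-1) <= frac < 2**P


def classify_add(a: Operand, b: Operand) -> tuple[str, int, int]:
    """Return (path, align_shift, norm_shift) for a + b.

    norm_shift > 0: result must be shifted right (carry-out).
    norm_shift < 0: result must be shifted left (leading bits cancelled).
    Exact integer arithmetic, no rounding: a toy model, not real hardware.
    """
    if (a.exp, a.frac) < (b.exp, b.frac):
        a, b = b, a                        # ensure |a| >= |b|
    d = a.exp - b.exp                      # alignment distance
    path = "near" if d <= 1 else "far"     # near: exponents within 1 of each other
    big = a.frac << d                      # align exactly by widening the larger operand
    r = big + b.frac if a.sign == b.sign else big - b.frac
    if r == 0:
        return (path, d, 0)                # exact zero: a special case in real adders
    return (path, d, r.bit_length() - big.bit_length())


def check_far_path_bound(trials: int = 20_000, seed: int = 1) -> None:
    """Randomly confirm that the far path never needs more than a
    1-bit normalization shift in either direction."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = (Operand(rng.randint(0, 1), rng.randint(-30, 30),
                        rng.randint(2 ** (P - 1), 2 ** P - 1)) for _ in range(2))
        path, d, n = classify_add(a, b)
        if path == "far":
            assert -1 <= n <= 1, (a, b, n)


check_far_path_bound()
# Near path, massive cancellation: a 23-bit left shift.
print(classify_add(Operand(0, 0, 2**23 + 1), Operand(1, 0, 2**23)))  # ('near', 0, -23)
# Far path, subtraction: only a 1-bit left shift despite a 5-bit alignment.
print(classify_add(Operand(0, 5, 2**23), Operand(1, 0, 2**24 - 1)))  # ('far', 5, -1)
```

The near path pays for a possibly large normalization but needs at most a 1-bit alignment; the far path pays for alignment but its result is always within one bit of normalized, which is why each can fit in fewer gate delays than a single general adder.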
      
   This is all 1983 stuff.   
      
   MIPS did get FMUL into 3 cycles, but modern wire delay is pushing it
   to 4 cycles. A 4-cycle FMUL is a fairly easy target to hit with a
   "throw Verilog over the wall" design style.
      
   > Let's see... the peak FP performance with 64-bit reals, with 16   
   > cores (to get an upper limit on FP performance) would be   
   >   
   > 16 cores * (2 * 2 for FMA + 1 * 2 for fadd) * 8 FP numbers * 5.7e9/s   
   > which is approximately 4.3 TFlops per CPU.   
   >   
   > An interesting question: When (approximately) did the total   
   > installed floating-point performance of all computers worldwide   
   > surpass that of a single 16-core Zen5 CPU? My guess would be   
   > somewhere in the late 1970s/early 1980s, before the PC and the   
   > 8087 took off.   
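The quoted peak figure can be re-derived in a few lines of Python. The unit counts (2 FMA plus 2 FADD pipes per core, 512-bit vectors) and the 5.7 GHz clock are taken from the quote; treating 5.7 GHz as sustained on all 16 cores is the quote's own upper-bound assumption.

```python
# Upper-bound peak FP64 throughput for the 16-core Zen 5 case in the quote.
CORES = 16
FMA_UNITS = 2           # 512-bit FMA pipes per core, per the quote (2 FLOPs each)
FADD_UNITS = 2          # 512-bit FADD pipes per core, per the quote (1 FLOP each)
LANES_FP64 = 512 // 64  # 8 doubles per 512-bit vector
CLOCK_HZ = 5.7e9        # boost clock; assumes every core holds it

flops_per_cycle_per_core = (FMA_UNITS * 2 + FADD_UNITS * 1) * LANES_FP64
peak = CORES * flops_per_cycle_per_core * CLOCK_HZ
print(f"{flops_per_cycle_per_core} FLOPs/cycle/core, peak = {peak / 1e12:.2f} TFLOPS")
```

It prints 48 FLOPs/cycle/core and a peak of 4.38 TFLOPS. For 32-bit floats the lane count, and hence the peak, doubles.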
      
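   CoPilot says they sold 600,000 VAX 11/780s, and at 1 FLOP × 5 MHz each
   we get* 600,000 × 5 MFLOPs = 3 TFLOPs. On paper that is within about
   1.5× of one Zen 5; with the discount in the footnote, the whole VAX
   fleet still falls roughly 6-9× short, so it was definitely post RISC
   generation 1 when there were that many FLOPs worldwide.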
      
   (*) a very generous number, probably overstating VAX capabilities by 4-6×
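The VAX-fleet estimate, worked the same way. The 600,000 unit count is the CoPilot figure cited above (not independently checked), 1 FLOP per cycle at 5 MHz is the post's deliberately generous assumption, and the 4-6× discount comes from the footnote.

```python
VAX_UNITS = 600_000        # CoPilot's figure, as cited in the post (unverified)
VAX_FLOPS_EACH = 1 * 5e6   # 1 FLOP per cycle at 5 MHz: the post's generous assumption
ZEN5_PEAK = 16 * (2 * 2 + 2 * 1) * 8 * 5.7e9  # the quoted 16-core Zen 5 upper bound

fleet = VAX_UNITS * VAX_FLOPS_EACH
print(f"VAX fleet: {fleet / 1e12:.1f} TFLOPS, Zen 5 / fleet = {ZEN5_PEAK / fleet:.2f}x")
for overstatement in (4, 6):  # the footnote's 4-6x discount
    realistic = fleet / overstatement
    print(f"  with {overstatement}x discount: Zen 5 / fleet = {ZEN5_PEAK / realistic:.1f}x")
```

With these inputs the undiscounted fleet comes to 3×10^12 FLOPS, about 1.46× below one Zen 5, and roughly 5.8× to 8.8× below it after the discount.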
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   