
   comp.lang.forth      Forth programmers eat a lot of Bratwurst      117,927 messages   


   Message 117,473 of 117,927   
   peter to Anton Ertl   
   Re: Parsing timestamps?   
   17 Jul 25 22:48:25   
   
   From: peter.noreply@tin.it   
      
   On Thu, 17 Jul 2025 12:54:29 GMT   
   anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:   
      
   > peter  writes:   
   > >Ryzen 9950X   
   > >   
   > >        lxf64   
   > >     5,010,566,495     NAI cycles:u   
   > >     2,011,359,782     UNR cycles:u   
   > >       646,926,001     REC cycles:u   
   > >     3,589,863,082     SR  cycles:u   
   > >   
   > >        lxf64
   > >     7,019,247,519     NAI instructions:u
   > >     4,128,689,843     UNR instructions:u
   > >     4,643,499,656     REC instructions:u
   > >    25,019,182,759     SR  instructions:u
   > >   
   > >   
   > >        gforth-fast 20250219   
   > >     2,048,316,578      NAI cycles:u   
   > >     7,157,520,448      UNR cycles:u   
   > >     3,589,638,677      REC cycles:u   
   > >    17,199,889,916      SR  cycles:u   
   > >   
   > >        gforth-fast 20250219   
   > >    13,107,999,739      NAI instructions:u
   > >     6,789,041,049      UNR instructions:u
   > >     9,348,969,966      REC instructions:u
   > >    50,108,032,223      SR  instructions:u
   > >   
   > >        lxf   
   > >     6,005,617,374      NAI cycles:u   
   > >     6,004,157,635      UNR cycles:u   
   > >     1,303,627,835      REC cycles:u   
   > >     9,187,422,499      SR  cycles:u   
   > >   
   > >        lxf   
   > >     9,010,888,196      NAI instructions:u   
   > >     4,237,679,129      UNR instructions:u
   > >     4,955,258,040      REC instructions:u
   > >    26,018,680,499      SR  instructions:u   
   >   
   > >lxf uses the x87 builtin fp stack, lxf64 uses sse4 and a large fp stack
   >   
   > Apparently the latency of ADDSD (SSE2) is down to 2 cycles on Zen5   
   > (visible in lxf64 UNR and gforth-fast NAI) while the latency of FADD   
   > (387) is still 6 cycles (lxf NAI and UNR).  I have no explanation why   
   > on lxf64 NAI performs so much worse than UNR, and in gforth-fast UNR   
   > so much worse than NAI.   
   >   
   > For REC the latency should not play a role.  There lxf64 performs at   
   > 7.2IPC and 1.55 F+/cycle, whereas lxf performs only at 3.8IPC and 0.77   
   > F+/cycle.  My guess is that FADD can only be performed by one FPU, and   
   > that's connected to one dispatch port, and other instructions also   
   > need or are at least assigned to this dispatch port.   
   >   
   > - anton   
      
   I did a test, coding sum128 as a code word with AVX-512 instructions,
   and got the following results:
      
          285,584,376      cycles:u   
          941,856,077      instructions:u   
      
   The timing was:
   timer-reset ' recursive-sum bench .elapsed 51 ms elapsed
      
   So about half the time of the original recursive version.
   With 32 zmm registers I could also have done a sum256.
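
For readers who want to check the ratios, here is a small C sketch of the arithmetic behind the IPC and F+/cycle figures. The counter values are copied from the perf output in this thread. The total of 1e9 floating-point additions is an assumption, inferred from the quoted 1.55 F+/cycle for lxf64 REC rather than stated anywhere, and the function names are made up for illustration.

```c
#include <stdio.h>

/* ASSUMPTION: the benchmark performs 1e9 floating-point additions in total.
   This is inferred from the quoted "1.55 F+/cycle" figure, not stated. */
#define ASSUMED_FADDS 1e9

/* Instructions retired per cycle. */
double ipc(double instructions, double cycles)
{
    return instructions / cycles;
}

/* Floating-point additions per cycle under the assumption above. */
double adds_per_cycle(double cycles)
{
    return ASSUMED_FADDS / cycles;
}

/* Print one line per run, using counters copied from the thread. */
void report(void)
{
    static const struct {
        const char *name;
        double cycles;
        double instructions;
    } runs[] = {
        { "lxf64 REC",      646926001.0, 4643499656.0 },
        { "lxf   REC",     1303627835.0, 4955258040.0 },
        { "lxf64 asum128",  285584376.0,  941856077.0 },
    };

    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-14s IPC %.2f  F+/cycle %.2f\n",
               runs[i].name,
               ipc(runs[i].instructions, runs[i].cycles),
               adds_per_cycle(runs[i].cycles));

    /* Cycle ratio between the original REC run and the AVX-512 code word. */
    printf("REC cycles / asum128 cycles: %.2f\n",
           runs[0].cycles / runs[2].cycles);
}
```

Calling `report()` from a `main` gives the two REC rows that match the quoted 7.2 IPC / 1.55 F+/cycle and 3.8 IPC / 0.77 F+/cycle. If the same workload assumption holds for the AVX-512 version, it comes out at about 3.3 IPC and 3.5 F+/cycle, with roughly 2.3 times fewer cycles than lxf64 REC.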
      
   The code is below for reference.
   r13 is the FP stack pointer,
   rbx is the top of stack,
   xmm0 is the top of the FP stack.
      
   code asum128   
      
   movsd [r13-0x8], xmm0   
   lea r13, [r13-0x8]   
      
   vmovapd   zmm0,  [rbx]   
   vmovapd   zmm1,  [rbx+64]   
   vmovapd   zmm2,  [rbx+128]   
   vmovapd   zmm3,  [rbx+192]   
   vmovapd   zmm4,  [rbx+256]   
   vmovapd   zmm5,  [rbx+320]   
   vmovapd   zmm6,  [rbx+384]   
   vmovapd   zmm7,  [rbx+448]   
   vmovapd   zmm8,  [rbx+512]   
   vmovapd   zmm9,  [rbx+576]   
   vmovapd   zmm10,  [rbx+640]   
   vmovapd   zmm11,  [rbx+704]   
   vmovapd   zmm12,  [rbx+768]   
   vmovapd   zmm13,  [rbx+832]   
   vmovapd   zmm14,  [rbx+896]   
   vmovapd   zmm15,  [rbx+960]   
      
   vaddpd  zmm0, zmm0, zmm1   
   vaddpd  zmm2, zmm2, zmm3   
   vaddpd  zmm4, zmm4, zmm5   
   vaddpd  zmm6, zmm6, zmm7   
   vaddpd  zmm8, zmm8, zmm9   
   vaddpd  zmm10, zmm10, zmm11   
   vaddpd  zmm12, zmm12, zmm13   
   vaddpd  zmm14, zmm14, zmm15   
      
   vaddpd  zmm0, zmm0, zmm2   
   vaddpd  zmm4, zmm4, zmm6   
   vaddpd  zmm8, zmm8, zmm10   
   vaddpd  zmm12, zmm12, zmm14   
      
   vaddpd  zmm0, zmm0, zmm4   
   vaddpd  zmm8, zmm8, zmm12   
      
   vaddpd  zmm0, zmm0, zmm8   
      
   ; Horizontal sum of zmm0
      
   vextractf64x4 ymm1, zmm0, 1   
   vaddpd ymm2, ymm1, ymm0   
      
   vextractf64x2 xmm3, ymm2, 1   
   vaddpd ymm4, ymm3, ymm2   
      
   vhaddpd xmm0, xmm4, xmm4   
      
   ret   
   end-code   
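
As a reading aid, here is a scalar C model of what the listing above appears to compute. It is a sketch, not the lxf64 implementation: the name `sum128_model` and the array-of-lanes layout are invented for illustration. It follows the same association order as the assembly: 16 vectors of 8 doubles, a four-level pairwise tree of additions, then a lane fold 8 -> 4 -> 2 -> 1.

```c
#include <stddef.h>

#define LANES 8    /* doubles per 512-bit zmm register */
#define NREGS 16   /* zmm0..zmm15 in the listing */

/* Hypothetical scalar model of asum128: sums a[0..127]. */
double sum128_model(const double *a)
{
    double v[NREGS][LANES];

    /* Models: vmovapd zmmN, [rbx + 64*N] */
    for (int r = 0; r < NREGS; r++)
        for (int l = 0; l < LANES; l++)
            v[r][l] = a[r * LANES + l];

    /* Models the vaddpd tree:
       stride 1: (0,1) (2,3) ... (14,15)
       stride 2: (0,2) (4,6) (8,10) (12,14)
       stride 4: (0,4) (8,12)
       stride 8: (0,8) */
    for (int stride = 1; stride < NREGS; stride *= 2)
        for (int r = 0; r < NREGS; r += 2 * stride)
            for (int l = 0; l < LANES; l++)
                v[r][l] += v[r + stride][l];

    /* Models the horizontal sum: fold the upper half of the lanes onto
       the lower half (vextractf64x4, vextractf64x2, vhaddpd). */
    for (int width = LANES / 2; width >= 1; width /= 2)
        for (int l = 0; l < width; l++)
            v[0][l] += v[0][l + width];

    return v[0][0];
}
```

One difference to keep in mind: vmovapd with zmm operands needs the array to be 64-byte aligned, while this model accepts any `double` pointer. Because the additions are grouped the same way, the model should round like the assembly does, which can differ slightly from a plain left-to-right loop.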
      
   lxf64 uses a modified fasm as the backend assembler,
   so it has full support for all instructions.
      
   BR   
   Peter   
      


