   comp.lang.forth      Forth programmers eat a lot of Bratwurst      117,927 messages   

   Message 117,480 of 117,927   
   Anton Ertl to peter   
   Re: Vector sum (was: Parsing timestamps?)   
   19 Jul 25 14:39:42   
   
   From: anton@mips.complang.tuwien.ac.at   
      
   peter writes:   
   >On Sat, 19 Jul 2025 10:18:15 GMT   
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:   
   [sum32][   
   >> vmovapd   zmm0,  [rbx]   
   >> vaddpd  zmm0, zmm0, [rbx+64]   
   >> vmovapd   zmm1,  [rbx+128]   
   >> vaddpd  zmm1, zmm1, [rbx+192]   
   >> vaddpd  zmm0, zmm0, zmm1   
   >> ; and then the Horizontal sum   
   >>   
   >> >; Horizontal sum of zmm0   
   >> >   
   >> >vextractf64x4 ymm1, zmm0, 1   
   >> >vaddpd ymm2, ymm1, ymm0   
   >> >   
   >> >vextractf64x2 xmm3, ymm2, 1   
   >> >vaddpd ymm4, ymm3, ymm2   
   >> >   
   >> >vhaddpd xmm0, xmm4, xmm4   
   >   
   >The SIMD instructions also take a memory operand.   
   >I can do sum128 as   
   >   
   >code asum128b   
   >   
   >movsd [r13-0x8], xmm0   
   >lea r13, [r13-0x8]   
   >   
   >vmovapd zmm0,  [rbx]   
   >vaddpd  zmm0, zmm0,  [rbx+64]   
   >vaddpd  zmm0, zmm0,  [rbx+128]   
   >vaddpd  zmm0, zmm0,  [rbx+192]   
   >vaddpd  zmm0, zmm0,  [rbx+256]   
   >vaddpd  zmm0, zmm0,  [rbx+320]   
   >vaddpd  zmm0, zmm0,  [rbx+384]   
   >vaddpd  zmm0, zmm0,  [rbx+448]   
   >vaddpd  zmm0, zmm0,  [rbx+512]   
   >vaddpd  zmm0, zmm0,  [rbx+576]   
   >vaddpd  zmm0, zmm0,  [rbx+640]   
   >vaddpd  zmm0, zmm0,  [rbx+704]   
   >vaddpd  zmm0, zmm0,  [rbx+768]   
   >vaddpd  zmm0, zmm0,  [rbx+832]   
   >vaddpd  zmm0, zmm0,  [rbx+896]   
   >vaddpd  zmm0, zmm0,  [rbx+960]   
      
   Yes, but that's not pairwise addition, so for these 16 adds you get   
   worse average accuracy; if the CPU has limited OoO buffering (maybe   
   one of the Xeon Phis, but not anything modern that has AVX or   
   AVX-512), you may also see some of the addition latency.  You still   
   get pairwise addition and its accuracy benefit for the horizontal sum   
   and the recursive parts.   
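
To illustrate the accuracy point, here is a minimal sketch in Python (not the Forth or assembly code discussed in this thread). It compares one-accumulator addition, as in the chain of vaddpd into zmm0, with pairwise addition, and with a mix of the two where short runs are added sequentially and the partial sums are then combined pairwise. The function names and the `block` parameter are hypothetical and chosen only for this example; `math.fsum` serves as the correctly rounded reference.

```python
import math

def sequential_sum(xs):
    """Add left to right into a single accumulator."""
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs, block=1):
    """Recursively split the input in halves and add the two half sums.

    Runs of at most `block` elements are added sequentially, which
    mimics an unrolled chain of adds feeding a pairwise reduction.
    """
    n = len(xs)
    if n <= block or n <= 1:
        return sequential_sum(xs)
    mid = n // 2
    return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)

def compare(xs, block=16):
    """Return the absolute error of each method against math.fsum."""
    exact = math.fsum(xs)
    return {
        "sequential": abs(sequential_sum(xs) - exact),
        "pairwise": abs(pairwise_sum(xs) - exact),
        "blocked": abs(pairwise_sum(xs, block) - exact),
    }

if __name__ == "__main__":
    xs = [0.1] * 1024
    for name, err in compare(xs).items():
        print(f"{name:10s} abs error = {err:.3e}")
```

With 1024 copies of 0.1, every addition in the pure pairwise version adds two equal values, so each step is an exact doubling and the result matches the reference. The sequential version rounds on most of its adds, and the blocked version inherits whatever rounding happens inside each sequential run.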
      
   - anton   
   --   
   M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
        New standard: https://forth-standard.org/   
   EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html   
      


(c) 1994,  bbs@darkrealms.ca