From: anton@mips.complang.tuwien.ac.at   
      
   dxf writes:   
   >The catch with SSE is there's nothing like FCHS or FABS   
   >so depending on how one implements them, results vary across implementations.   
      
   You can see in Gforth how to implement FNEGATE and FABS with SSE2:   
      
   see fnegate   
   Code fnegate   
    0x000055e6a78a8274: add $0x8,%rbx   
    0x000055e6a78a8278: xorpd 0x24d8f(%rip),%xmm15 # 0x55e6a78cd010   
    0x000055e6a78a8281: mov %r15,%r9   
    0x000055e6a78a8284: mov (%rbx),%rax   
    0x000055e6a78a8287: jmp *%rax   
   end-code   
    ok   
   0x55e6a78cd010 16 dump   
   55E6A78CD010: 00 00 00 00 00 00 00 80 - 00 00 00 00 00 00 00 00   
    ok   
   see fabs   
   Code fabs   
    0x000055e6a78a84fe: add $0x8,%rbx   
    0x000055e6a78a8502: andpd 0x24b15(%rip),%xmm15 # 0x55e6a78cd020   
    0x000055e6a78a850b: mov %r15,%r9   
    0x000055e6a78a850e: mov (%rbx),%rax   
    0x000055e6a78a8511: jmp *%rax   
   end-code   
    ok   
   0x55e6a78cd020 16 dump   
   55E6A78CD020: FF FF FF FF FF FF FF 7F - 00 00 00 00 00 00 00 00   
      
   The actual implementation is the xorpd instruction for FNEGATE and
   the andpd instruction for FABS. The memory locations contain masks:
   for FNEGATE only the sign bit is set, for FABS everything but the sign   
   bit is set.   
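
   For readers who do not want to decode the dumps, here is a minimal
   portable C sketch of the same two masks. It works on the 64-bit
   pattern of a double in an integer register rather than in an XMM
   register, so it only illustrates what xorpd and andpd do to the low
   64 bits; it is not Gforth's source. The names SIGN_MASK, ABS_MASK,
   to_bits, from_bits, mask_fnegate, mask_fabs and mask_self_check are
   made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Low 8 bytes of the two 16-byte dumps above, read as little-endian. */
#define SIGN_MASK 0x8000000000000000ULL /* only the sign bit set */
#define ABS_MASK  0x7FFFFFFFFFFFFFFFULL /* everything but the sign bit */

/* Reinterpret a double as its 64-bit pattern and back (no conversion). */
static uint64_t to_bits(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

static double from_bits(uint64_t b)
{
    double x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* What xorpd with the sign mask does: flip the sign bit, nothing else. */
static double mask_fnegate(double x)
{
    return from_bits(to_bits(x) ^ SIGN_MASK);
}

/* What andpd with the inverted mask does: clear the sign bit. */
static double mask_fabs(double x)
{
    return from_bits(to_bits(x) & ABS_MASK);
}

/* Hypothetical self-check; call it from your own main(). */
static void mask_self_check(void)
{
    assert(mask_fnegate(1.5) == -1.5);
    assert(mask_fabs(-2.25) == 2.25);
    /* Zero keeps its magnitude, only the sign bit changes. */
    assert(to_bits(mask_fnegate(0.0)) == SIGN_MASK);
    assert(to_bits(mask_fabs(-0.0)) == 0);
}
```

   Because only the sign bit is touched, zeros, infinities and NaNs are
   handled without any special cases or branches.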
      
   Sure, you can implement FNEGATE and FABS in more complicated ways, but
   you can also implement them in more complicated ways if you use the
   387 instruction set. Here are examples of more complicated
   implementations:
      
   see fnegate   
   FNEGATE   
   ( 004C4010 4833C0 ) XOR RAX, RAX   
   ( 004C4013 F34D0F7EC8 ) MOVQ XMM9, XMM8   
   ( 004C4018 664C0F6EC0 ) MOVQ XMM8, RAX   
   ( 004C401D F2450F5CC1 ) SUBSD XMM8, XMM9   
   ( 004C4022 C3 ) RET/NEXT   
   ( 19 bytes, 5 instructions )   
    ok   
   see fabs   
   FABS   
   ( 004C40B0 E8FBEFFFFF ) CALL 004C30B0 FS@   
   ( 004C40B5 4885DB ) TEST RBX, RBX   
   ( 004C40B8 488B5D00 ) MOV RBX, [RBP]   
   ( 004C40BC 488D6D08 ) LEA RBP, [RBP+08]   
   ( 004C40C0 0F8D05000000 ) JNL/GE 004C40CB   
   ( 004C40C6 E845FFFFFF ) CALL 004C4010 FNEGATE   
   ( 004C40CB C3 ) RET/NEXT   
   ( 28 bytes, 7 instructions )   
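
   Read as C, the second pair of listings amounts to roughly the
   following sketch. It assumes that the FNEGATE listing computes 0.0
   minus the operand (zero RAX, move it into XMM8, SUBSD) and that the
   value tested in FABS has the same sign as the FP top of stack; I have
   not checked what FS@ actually delivers on that system. The names
   bits_of, sub_fnegate and branch_fabs are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern (no conversion). */
static uint64_t bits_of(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* FNEGATE by subtraction: load 0.0, then compute 0.0 - x. */
static double sub_fnegate(double x)
{
    return 0.0 - x;
}

/* FABS by test and branch: negate only if the sign bit is set.
   Assumption: the tested value mirrors the sign of the operand. */
static double branch_fabs(double x)
{
    if ((int64_t)bits_of(x) < 0)
        return sub_fnegate(x);
    return x;
}
```

   In the default rounding mode 0.0 - 0.0 is +0.0, so sub_fnegate(0.0)
   returns +0.0, where flipping the sign bit would give -0.0. That is
   the kind of difference you get from how the word is implemented, not
   from which instruction set is used.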
      
   - anton   
   --   
   M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
    New standard: https://forth-standard.org/   
   EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/   
   EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/   
      