From: dxforth@gmail.com   
      
   On 10/07/2025 6:35 pm, Anton Ertl wrote:   
   > dxf writes:   
   >> The catch with SSE is there's nothing like FCHS or FABS   
   >> so depending on how one implements them, results vary across
   >> implementations.
   >   
   > You can see in Gforth how to implement FNEGATE and FABS with SSE2:   
   >   
   > see fnegate   
   > Code fnegate   
   > 0x000055e6a78a8274: add $0x8,%rbx   
   > 0x000055e6a78a8278: xorpd 0x24d8f(%rip),%xmm15 # 0x55e6a78cd010   
   > 0x000055e6a78a8281: mov %r15,%r9   
   > 0x000055e6a78a8284: mov (%rbx),%rax   
   > 0x000055e6a78a8287: jmp *%rax   
   > end-code   
   > ok   
   > 0x55e6a78cd010 16 dump   
   > 55E6A78CD010: 00 00 00 00 00 00 00 80 - 00 00 00 00 00 00 00 00   
   > ok   
   > see fabs   
   > Code fabs   
   > 0x000055e6a78a84fe: add $0x8,%rbx   
   > 0x000055e6a78a8502: andpd 0x24b15(%rip),%xmm15 # 0x55e6a78cd020   
   > 0x000055e6a78a850b: mov %r15,%r9   
   > 0x000055e6a78a850e: mov (%rbx),%rax   
   > 0x000055e6a78a8511: jmp *%rax   
   > end-code   
   > ok   
   > 0x55e6a78cd020 16 dump   
   > 55E6A78CD020: FF FF FF FF FF FF FF 7F - 00 00 00 00 00 00 00 00   
   >   
   > The actual implementation is the xorpd instruction for FNEGATE, and
   > the andpd instruction for FABS. The memory locations contain masks:   
   > for FNEGATE only the sign bit is set, for FABS everything but the sign   
   > bit is set.   
   >   
   > Sure you can implement FNEGATE and FABS in more complicated ways, but   
   > you can also implement them in more complicated ways if you use the   
   > 387 instruction set. Here's an example of more complicated   
   > implementations:   
   >   
   > see fnegate   
   > FNEGATE   
   > ( 004C4010 4833C0 ) XOR RAX, RAX   
   > ( 004C4013 F34D0F7EC8 ) MOVQ XMM9, XMM8   
   > ( 004C4018 664C0F6EC0 ) MOVQ XMM8, RAX   
   > ( 004C401D F2450F5CC1 ) SUBSD XMM8, XMM9   
   > ( 004C4022 C3 ) RET/NEXT   
   > ( 19 bytes, 5 instructions )   
   > ok   
   > see fabs   
   > FABS   
   > ( 004C40B0 E8FBEFFFFF ) CALL 004C30B0 FS@   
   > ( 004C40B5 4885DB ) TEST RBX, RBX   
   > ( 004C40B8 488B5D00 ) MOV RBX, [RBP]   
   > ( 004C40BC 488D6D08 ) LEA RBP, [RBP+08]   
   > ( 004C40C0 0F8D05000000 ) JNL/GE 004C40CB   
   > ( 004C40C6 E845FFFFFF ) CALL 004C4010 FNEGATE   
   > ( 004C40CB C3 ) RET/NEXT   
   > ( 28 bytes, 7 instructions )   
      
   The latter were basically what existed in the implementation. As they
   don't handle negative zero (or NaNs) I swapped them out for the former
   ones you mention.
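
To illustrate the difference, here is a minimal C sketch. It is not taken from Gforth or any other Forth system; the function names are made up for illustration. The masks are the ones shown in the two dumps above, read as little-endian 64-bit values. It assumes IEEE 754 binary64 doubles and the default round-to-nearest mode.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Sign bit of an IEEE 754 binary64 value (the FNEGATE mask in the dump). */
#define SIGN_MASK UINT64_C(0x8000000000000000)

/* Flip the sign bit, as xorpd with the mask does. */
static inline double fnegate_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    bits ^= SIGN_MASK;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Clear the sign bit, as andpd with 0x7FFFFFFFFFFFFFFF does. */
static inline double fabs_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~SIGN_MASK;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Negate by subtracting from zero, as the SUBSD-based version does. */
static inline double fnegate_sub(double x)
{
    return 0.0 - x;
}
```

Under those assumptions, fnegate_mask(0.0) returns -0.0, whereas fnegate_sub(0.0) returns +0.0 because 0.0 - 0.0 rounds to positive zero. The mask versions also act on the sign bit of a NaN, which an arithmetic subtraction is not guaranteed to do.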
      