

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 242,192 of 243,242   
   Michael S to David Brown   
   Re: _BitInt(N)   
   28 Nov 25 00:15:07   
   
   From: already5chosen@yahoo.com   
      
   On Thu, 27 Nov 2025 21:15:53 +0100   
   David Brown  wrote:   
      
   > On 27/11/2025 15:02, Michael S wrote:   
   > > On Thu, 27 Nov 2025 14:02:38 +0100   
   > > David Brown  wrote:   
   > >   
   >   
   > >   
   > > MSVC compilers compile your code and produce a correct result, but
   > > the code looks less nice:
   > > 0000000000000000 :   
   > >     0:   f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)   
   > >     6:   48 8b 44 24 08          mov    0x8(%rsp),%rax   
   > >     b:   48 c1 e8 34             shr    $0x34,%rax   
   > >     f:   25 ff 07 00 00          and    $0x7ff,%eax   
   > >    14:   c3                      ret   
   > >   
   > > Although on old AMD processors it is likely faster than the nicer
   > > code generated by gcc and clang. On newer processors the gcc code
   > > is likely a bit better, but the difference is unlikely to be
   > > detected by simple measurements.
   >   
   > I think it is unlikely that this version - moving from xmm0 to rax   
   > via memory instead of directly - is faster on any processor.  But I   
   > fully agree that it is unlikely to be a measurable difference in   
   > practice.   
      
   I wonder, how do you have the nerve "to think" about things that you
   have absolutely no idea about?
      
   Instead of "thinking" you could just as well open the Optimization
   Reference manuals for the AMD Bulldozer family or for Bobcat, or read
   Agner Fog's instruction tables. A move from XMM to GPR on these
   processors is very slow: 8 clocks on BD, 7 on BbC.
      
   BTW, AMD K8 has the opposite problem. A move from XMM to GPR is
   reasonably fast, but a move from GPR to XMM is painfully slow.
      
   On the other hand, moves "via memory" are reasonably fast on these
   CPUs (except, maybe, Bobcat? I am not sure about it), because the data
   does not really travel through memory or through cache. Load-store
   forwarding picks the data directly from the store queue.
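
   For reference, the listing quoted above shifts right by 52 and masks
   with 0x7ff, which is the biased exponent field of an IEEE-754 double.
   The source that produced it is not quoted in this message, so what
   follows is only a sketch of the kind of function that could compile to
   such code. The name biased_exponent is made up for illustration, and
   the code assumes that double is the 64-bit IEEE-754 binary64 format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the thread.
 * Assumes double is IEEE-754 binary64 (1 sign bit, 11 exponent bits,
 * 52 fraction bits). Returns the raw (biased) exponent field. */
unsigned biased_exponent(double x)
{
    uint64_t bits;

    /* memcpy is the portable way to reinterpret the object
     * representation; compilers normally turn it into a register move
     * or a store/load pair rather than a real library call. */
    memcpy(&bits, &x, sizeof bits);

    /* Drop the 52 fraction bits, then mask off the sign bit. */
    return (unsigned)((bits >> 52) & 0x7ff);
}
```

   Whether the compiler lowers the memcpy to a direct movq %xmm0,%rax or
   to a store followed by a load is its own choice; both forms compute
   the same value, e.g. 1023 for 1.0.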
      


