From: terje.mathisen@tmsw.no   
      
   Michael S wrote:   
   > On Tue, 5 Aug 2025 22:17:00 +0200   
   > Terje Mathisen wrote:   
   >   
   >> Michael S wrote:   
   >>> On Tue, 5 Aug 2025 17:31:34 +0200   
   >>> Terje Mathisen wrote:   
   >>> In this case 'adc edx,edx' is just a slightly shorter encoding
   >>> of 'adc edx,0'. The EDX register is zeroed a few lines above.
   >>   
   >> OK, nice.   
   >   
   > BTW, it seems that in your code fragment above you forgot to zeroize EDX   
   > at the beginning of iteration. Or am I missing something?
      
   No, you are not. I skipped pretty much all the setup code. :-)   
   >   
   >>>   
   >>>> Anyway, the three main ADD RAX,... operations still define the   
   >>>> minimum possible latency, right?   
   >>>>   
   >>>   
   >>> I don't think so.   
   >>> It seems to me that there is only one chain of data dependencies
   >>> between iterations of the loop - a trivial dependency through RCX.
   >>> Some modern processors are already capable of eliminating this sort
   >>> of dependency in the renamer. Probably not yet when it is coded as
   >>> 'inc', but when coded as 'add' or 'lea'.
   >>>   
   >>> The dependency through RDX/RBX does not form a chain. The next value   
   >>> of [rdi+rcx*8] does depend on value of rbx from previous iteration,   
   >>> but the next value of rbx depends only on [rsi+rcx*8], [r8+rcx*8]   
   >>> and [r9+rcx*8]. It does not depend on the previous value of rbx,   
   >>> except for control dependency that hopefully would be speculated   
   >>> around.   
   >>   
   >> I believe we are doing a bigint three-way add, so each result word
   >> depends on the three corresponding input words, plus any carries from   
   >> the previous round.   
   >>   
   >> This is the carry chain that I don't see any obvious way to break...   
   >>   
   >   
   > You break the chain by *predicting* that
   > carry[i] = CARRY(a[i]+b[i]+c[i]+carry[i-1]) is equal to
   > CARRY(a[i]+b[i]+c[i]). If the prediction turns out wrong then you pay
   > the heavy price of a branch misprediction. But outside of specially
   > crafted inputs that is extremely rare.
      
   Aha!   
      
   That's _very_ nice.   
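
   To check that I have understood the idea, here is a minimal C sketch
   of the data flow, not the asm from the thread. The name add3 and the
   64-bit limb layout are my own assumptions. The carry out of each limb
   is computed from the three inputs alone; the incoming carry only
   affects it through a rarely taken branch, so the loop-carried link
   becomes a control dependency rather than a data dependency.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b + c over n 64-bit limbs,
 * least significant limb first. Returns the final carry (0..2). */
uint64_t add3(uint64_t *r, const uint64_t *a, const uint64_t *b,
              const uint64_t *c, size_t n)
{
    uint64_t carry_in = 0;

    for (size_t i = 0; i < n; i++) {
        /* This part depends only on the inputs of limb i. */
        uint64_t s = a[i] + b[i];
        uint64_t carry_out = (s < a[i]);   /* wrapped on a + b? */
        s += c[i];
        carry_out += (s < c[i]);           /* wrapped on + c?   */

        /* The result word needs the incoming carry (0..2). */
        uint64_t t = s + carry_in;
        r[i] = t;

        /* "Predicted" case: adding carry_in does not wrap, so
         * carry_out is already correct. The wrap happens only when
         * s is within 2 of UINT64_MAX, which is rare on ordinary
         * data. */
        if (t < s)
            carry_out++;

        carry_in = carry_out;
    }
    return carry_in;
}
```

   A compiler is free to turn that 'if' into branchless code (setc/adc),
   which would bring the data dependency back, so getting this behaviour
   in practice probably requires asm or a close look at the generated
   code.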
      
   Terje   
      
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      