From: already5chosen@yahoo.com   
      
   On Tue, 5 Aug 2025 22:17:00 +0200   
   Terje Mathisen wrote:   
      
   > Michael S wrote:   
   > > On Tue, 5 Aug 2025 17:31:34 +0200   
   > > Terje Mathisen wrote:   
   > > In this case 'adc edx,edx' is just a slightly shorter encoding
   > > of 'adc edx,0'. The EDX register is zeroed a few lines above.
   >   
   > OK, nice.   
      
   BTW, it seems that in your code fragment above you forgot to zero EDX
   at the beginning of each iteration. Or am I missing something?
      
   > >   
   > >> Anyway, the three main ADD RAX,... operations still define the   
   > >> minimum possible latency, right?   
   > >>   
   > >   
   > > I don't think so.   
   > > It seems to me that there is only one chain of data dependencies
   > > between iterations of the loop - a trivial dependency through RCX.
   > > Some modern processors are already capable of eliminating this sort
   > > of dependency in the renamer. Probably not yet when it is coded as
   > > 'inc', but when it is coded as 'add' or 'lea'.
   > >   
   > > The dependency through RDX/RBX does not form a chain. The next value   
   > > of [rdi+rcx*8] does depend on value of rbx from previous iteration,   
   > > but the next value of rbx depends only on [rsi+rcx*8], [r8+rcx*8]   
   > > and [r9+rcx*8]. It does not depend on the previous value of rbx,   
   > > except for control dependency that hopefully would be speculated   
   > > around.   
   >   
   > I believe we are doing a bigint three-way add, so each result word
   > depends on the three corresponding input words, plus any carries from   
   > the previous round.   
   >   
   > This is the carry chain that I don't see any obvious way to break...   
   >   
   > Terje   
   >   
   >   
      
   You break the chain by *predicting* that
   carry[i] = CARRY(a[i]+b[i]+c[i]+carry[i-1]) is equal to
   CARRY(a[i]+b[i]+c[i]). If the prediction turns out wrong then you pay
   the heavy price of a branch misprediction. But outside of specially
   crafted inputs that is extremely rare.
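
   To make the idea concrete, here is a rough C sketch of such a loop. It
   is not the assembly discussed above; the function name add3_predicted
   and the variable names are made up for illustration. The predicted
   carry k is computed from a[i], b[i] and c[i] alone, and the incoming
   carry only corrects it on the rarely taken branch. A compiler is free
   to turn that branch into branchless code, so this shows the dependency
   structure rather than guaranteed machine behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* dst = a + b + c over n 64-bit words, least significant word first.
 * Returns the carry out of the top word (0, 1 or 2).
 * Hypothetical helper, written only to illustrate the prediction. */
unsigned add3_predicted(uint64_t *dst, const uint64_t *a,
                        const uint64_t *b, const uint64_t *c, size_t n)
{
    unsigned carry = 0;              /* carry into the current word: 0..2 */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];
        unsigned k = s < a[i];       /* carry out of a[i] + b[i] */
        s += c[i];
        k += s < c[i];               /* k = CARRY(a[i]+b[i]+c[i]) */

        /* k depends only on this word's inputs, so it is the "predicted"
         * carry out. The incoming carry changes it only if s + carry
         * wraps, i.e. only when s is within 2 of UINT64_MAX. */
        uint64_t r = s + carry;
        if (r < s)                   /* rare case: prediction was wrong */
            k++;

        dst[i] = r;
        carry = k;
    }
    return carry;
}
```

   With random inputs the branch is taken with a probability of about
   2^-63 per word, so the predictor sees it as never taken and the only
   dependency that really serialises the iterations is the loop counter.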
      