

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   


   Message 3,831 of 4,675   
   Terje Mathisen to Robert Prins   
   Re: Use additional (cached) read or add    
   23 Mar 19 10:43:34   
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Robert Prins wrote:   
   > Given this code:   
   >   
   >    // if (lift_ptr^._spl   < _spl) or   
   >    //   ((lift_ptr^._spl   = _spl) and   
   >    //    (lift_ptr^.dtv.km > km)) then   
   >   
   >    mov     eax, [ebx + offset lift_list._spl]   
   >    cmp     eax, edx   
   >    jl      @04   
   >    jne     @05   
   >   
   >    mov     eax, [ebx + offset lift_list.dtv.km]   
   >    cmp     eax, ecx   
   >    jle     @05   
   >   
   > @04:   
   >    // _spl:= lift_ptr^._spl;   
   >    // km  := lift_ptr^.dtv.km;   
   >    // lptr:= lift_ptr;   
   >   
   >    mov     edx, [ebx + offset lift_list._spl]   
   >    mov     ecx, [ebx + offset lift_list.dtv.km]   
   >    mov     edi, ebx   
   >   
   > @05:   
      
   The best you can do here is probably to evaluate all three tests in   
   parallel, then AND/OR them together and do a single branch, but this is   
   almost certainly slower than your current code.   
      
   What you could do is avoid the reloads of ._spl and .dtv.km, since
   you've already moved those values into registers. (Unless you have
   severe register pressure here?)
      
   You can also make it totally branchless with SETcc for the tests, one
   AND and one OR to combine them, then CMOVs to save the new best.
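
   As a rough illustration of that branchless shape, here is a sketch in
   C rather than assembly, so it can be compiled and checked. The struct
   layout and the names lift_t, best_t, update_branchless and
   find_best_branchless are hypothetical stand-ins, not taken from the
   original program. Whether a compiler actually emits SETcc and CMOV
   for this depends on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one lift_list entry: only the two
   compared fields are modelled here. */
typedef struct {
    int32_t spl;
    int32_t km;
} lift_t;

/* Running best: smallest spl, and largest km among equal spl. */
typedef struct {
    int32_t spl;
    int32_t km;
    const lift_t *ptr;
} best_t;

static void update_branchless(best_t *best, const lift_t *p)
{
    /* Load each field once and reuse the values below. */
    int32_t spl = p->spl;
    int32_t km  = p->km;

    /* Three independent tests (candidates for SETcc). */
    int lt = spl <  best->spl;
    int eq = spl == best->spl;
    int gt = km  >  best->km;

    /* Bitwise AND/OR rather than &&/|| so that nothing short-circuits. */
    int take = lt | (eq & gt);

    /* Conditional selects (candidates for CMOV). */
    best->spl = take ? spl : best->spl;
    best->km  = take ? km  : best->km;
    best->ptr = take ? p   : best->ptr;
}

/* Assumption: seeding with INT32_MAX / INT32_MIN is acceptable, i.e.
   any real entry compares better than the seed. */
static best_t find_best_branchless(const lift_t *list, size_t n)
{
    best_t best = { INT32_MAX, INT32_MIN, NULL };
    for (size_t i = 0; i < n; i++)
        update_branchless(&best, &list[i]);
    return best;
}
```

   Note that this version always reads .km, even when the ._spl test
   alone decides the outcome, which is part of why it may not beat the
   branchy code on predictable data.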
      
   The problem is that this kind of search for extremal values tends to
   need only about log(n) actual stores (given randomized inputs), so for
   big n branch prediction will be close to perfect.
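
   To put a number on "log(n) stores": for distinct keys arriving in
   uniformly random order, the expected number of times the running best
   is replaced is the harmonic number H(n) = 1 + 1/2 + ... + 1/n, which
   is roughly ln(n) + 0.58. A small C sketch that tallies the updates
   for a given list follows; lift_t and count_updates are hypothetical
   names, and the first entry is counted as an update so that the tally
   lines up with H(n).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one lift_list entry. */
typedef struct {
    int32_t spl;
    int32_t km;
} lift_t;

/* Counts how often the "new best" block would execute, using the same
   ordering as the quoted code: smaller spl wins, larger km breaks ties. */
size_t count_updates(const lift_t *list, size_t n)
{
    if (n == 0)
        return 0;

    int32_t spl = list[0].spl;
    int32_t km  = list[0].km;
    size_t updates = 1;            /* the first entry always becomes the best */

    for (size_t i = 1; i < n; i++) {
        if (list[i].spl < spl ||
            (list[i].spl == spl && list[i].km > km)) {
            spl = list[i].spl;
            km  = list[i].km;
            updates++;
        }
    }
    return updates;
}
```

   The worst case is input that keeps improving (n updates); already
   sorted input with the best entry first gives a single update. In both
   extremes the branches are also trivially predictable.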
   >   
   > And the fact that the three jumps will be pretty well predicted, I know   
   > my data (yes, that data again), how much would I save by adding   
   > CMOVxx'es before the "jl" and "jle" instructions, to eliminate the   
   > reading of the already cached entries again in @04?   
   >   
   > My guess is not a great deal?   
      
   Nope.   
   :-)   
      
   Terje   
   >   
   > Thanks,   
   >   
   > Robert   
   >   
   > PS:  And yes, upon request I could add some counters to actually tally   
   > the taken/not taken (after taken/not taken?) counts.   
      
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      


