   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

   Message 3,828 of 4,675   
   Robert Prins to Robert Wessel   
   Re: Use additional (cached) read or add    
   22 Mar 19 23:15:11   
   
   From: robert@nospicedham.prino.org   
      
   On 2019-03-22 21:24, Robert Wessel wrote:   
   > On Fri, 22 Mar 2019 18:58:55 +0000, Robert Prins wrote:   
   >   
   >> Given this code:   
   >>   
   >>    // if (lift_ptr^._spl   < _spl) or   
   >>    //   ((lift_ptr^._spl   = _spl) and   
   >>    //    (lift_ptr^.dtv.km > km)) then   
   >>   
   >>    mov     eax, [ebx + offset lift_list._spl]   
   >>    cmp     eax, edx   
   >>    jl      @04   
   >>    jne     @05   
   >>   
   >>    mov     eax, [ebx + offset lift_list.dtv.km]   
   >>    cmp     eax, ecx   
   >>    jle     @05   
   >>   
   >> @04:   
   >>    // _spl:= lift_ptr^._spl;   
   >>    // km  := lift_ptr^.dtv.km;   
   >>    // lptr:= lift_ptr;   
   >>   
   >>    mov     edx, [ebx + offset lift_list._spl]   
   >>    mov     ecx, [ebx + offset lift_list.dtv.km]   
   >>    mov     edi, ebx   
   >>   
   >> @05:   
   >>   
   >> And the fact that the three jumps will be pretty well predicted, I know my   
   >> data (yes, that data again), how much would I save by adding CMOVxx'es   
   >> before the "jl" and "jle" instructions, to eliminate the reading of the   
   >> already cached entries again in @04?   
   >>   
   >> My guess is not a great deal?   
   >>   
   >> Thanks,   
   >>   
   >> Robert   
   >>   
   >> PS:  And yes, upon request I could add some counters to actually tally the   
   >> taken/not taken (after taken/not taken?) counts.   
   >   
   > CMOVs are usually not that fast.   
      
   According to Agner Fog, they are now one-cycle on current AMD and Intel CPUs.   
      
   > If the condition is well predicted, they'll be much slower than the   
   > conditional jumps.  Why not just load the two values into two different   
   > registers, instead of both into eax?  If you get to @04, either move them from   
   > there to edx/ecx, or use them directly from the revised locations.   
      
   If I had more available free registers, I would have done so, but in 32-bit   
   code you only really have six of them. :(   
      
   Robert   
   --   
   Robert AH Prins   
   robert(a)prino(d)org   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
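
For reference, the two-register variant suggested in the quoted reply might look roughly like the sketch below, written in the same inline-assembler style as the original code. It assumes a scratch register is free; esi is a hypothetical choice that does not come from the thread, and the poster states that no such register is available in his code. Both fields are loaded once before the first compare, so @04 only needs register-to-register moves.

```asm
   // Load both fields once, up front.
   mov     eax, [ebx + offset lift_list._spl]
   mov     esi, [ebx + offset lift_list.dtv.km]   // esi: assumed free (hypothetical)
   cmp     eax, edx
   jl      @04          // lift_ptr^._spl < _spl: take the new entry
   jne     @05          // lift_ptr^._spl > _spl: keep the old one

   cmp     esi, ecx     // _spl equal: decide on km
   jle     @05

@04:
   mov     edx, eax     // _spl := lift_ptr^._spl
   mov     ecx, esi     // km   := lift_ptr^.dtv.km
   mov     edi, ebx     // lptr := lift_ptr

@05:
```

The trade-off is that km is now read on every pass, including the jne path where the original never touches it. Also note that in the original, when @04 is reached by falling through, eax already holds km and edx already equals lift_ptr^._spl, while on the jl path eax holds _spl. Avoiding the reloads without a spare register would therefore mean giving @04 two separate entry points.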


(c) 1994,  bbs@darkrealms.ca