Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 3,831 of 4,675    |
|    Terje Mathisen to Robert Prins    |
|    Re: Use additional (cached) read or add     |
|    23 Mar 19 10:43:34    |
From: terje.mathisen@nospicedham.tmsw.no

Robert Prins wrote:
> Given this code:
>
> // if (lift_ptr^._spl < _spl) or
> // ((lift_ptr^._spl = _spl) and
> // (lift_ptr^.dtv.km > km)) then
>
> mov eax, [ebx + offset lift_list._spl]
> cmp eax, edx
> jl @04
> jne @05
>
> mov eax, [ebx + offset lift_list.dtv.km]
> cmp eax, ecx
> jle @05
>
> @04:
> // _spl:= lift_ptr^._spl;
> // km := lift_ptr^.dtv.km;
> // lptr:= lift_ptr;
>
> mov edx, [ebx + offset lift_list._spl]
> mov ecx, [ebx + offset lift_list.dtv.km]
> mov edi, ebx
>
> @05:

The best you can do here is probably to evaluate all three tests in
parallel, then AND/OR them together and do a single branch, but this is
almost certainly slower than your current code.

What you could do is to avoid the reloads of ._spl and .dtv.km, since
you've already moved those values into registers. (Unless you have
severe register pressure here?)

You can also make it totally branchless with SETcc for the tests, one
AND and one OR to combine them, then CMOVs to save the new best.

The problem is that this kind of search for extremal values tends to
need only about log(n) actual stores (given randomized inputs), so for
big n branch prediction will be close to perfect.

>
> And the fact that the three jumps will be pretty well predicted, I know
> my data (yes, that data again), how much would I save by adding
> CMOVxx'es before the "jl" and "jle" instructions, to eliminate the
> reading of the already cached entries again in @04?
>
> My guess is not a great deal?

Nope. :-)

Terje

>
> Thanks,
>
> Robert
>
> PS: And yes, upon request I could add some counters to actually tally
> the taken/not taken (after taken/not taken?) counts.

--
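For illustration only, here is a minimal C sketch of the two variants discussed above: the original branchy test, and the branchless formulation (three tests, one AND, one OR, then conditional selects). It is not code from the thread. The lift_t layout, the function names and the choice of C instead of assembly are all assumptions, and whether a compiler actually turns the selects into SETcc/CMOV depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: only the two compared fields are modelled. */
typedef struct {
    int32_t spl;
    int32_t km;
} lift_t;

/* Branchy version, mirroring the Pascal condition in the quoted code:
   take the entry if its spl is smaller, or spl is equal and km is larger. */
size_t best_branchy(const lift_t *a, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (a[i].spl < a[best].spl ||
            (a[i].spl == a[best].spl && a[i].km > a[best].km))
            best = i;
    }
    return best;
}

/* Branchless formulation: each field is loaded once into a local,
   the three tests are evaluated as 0/1 values, combined with one AND
   and one OR, and the running best is updated with selects. */
size_t best_branchless(const lift_t *a, size_t n)
{
    assert(n > 0);
    int32_t spl = a[0].spl;
    int32_t km = a[0].km;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        int32_t s = a[i].spl;      /* single load, no reload later */
        int32_t k = a[i].km;
        int lt = s < spl;          /* SETL-style test  */
        int eq = s == spl;         /* SETE-style test  */
        int gt = k > km;           /* SETG-style test  */
        int take = lt | (eq & gt); /* one AND, one OR  */
        spl = take ? s : spl;      /* CMOV candidates  */
        km = take ? k : km;
        best = take ? i : best;
    }
    return best;
}
```

Both functions return the index of the first best entry and should agree on any non-empty input. On randomly ordered data the "take" case fires only rarely (on the order of ln(n) times for n entries), which is why the branchy version predicts so well and the branchless one is unlikely to win.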
(c) 1994, bbs@darkrealms.ca