Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 2,863 of 4,675    |
|    Terje Mathisen to James Van Buskirk    |
|    Re: Process at read-in time, or use post    |
|    19 Jul 17 22:17:27    |
From: terje.mathisen@nospicedham.tmsw.no

James Van Buskirk wrote:
>> "Terje Mathisen" wrote in message
>> You can of course use binary search to reduce this to max 6
>> compares, followed by a block copy to shift the bottom part down to
>> make room in the right spot.
>
>> The key is that 40 is such a small number that on average you
>> will need very few comparisons for the naive code.
>
> I consider it unfortunate that you snipped Robert Wessel's advice to
> implement the top 40 list as a priority queue via a heap since the

Sorry, I should have discussed this directly, it _is_ a very good idea
in general.

> O.P. stated above that he already has working heapsort code, so it
> should be a breeze to implement a heap (or actually running 3 heaps
> at the same time in the context of this problem so as to only read
> each data item once).
>
> I know you know this, but am not sure why you didn't comment on the
> possibility.

OK, let's go through it:

A priority queue is perfect if you have to be prepared to spit out an
ordered list at any point in time; it carries a cost of O(log(n)) for
both insertions and deletions (with n = 40 here, so the log(n) term
will be ~6) for most operations.

Unless you also maintain a separate #40 marker, i.e. the current cutoff
point, you will carry that log(40) cost for all new records, _including_
all those that fall outside the top 40 range!

I.e. the naive version will blindly insert the new record and then
delete the current #41 to bring the total back to 40.

If the total count is large (i.e. >= 400) at least 90% of all new
records will be discarded immediately at O(1) cost as long as you have
that cutoff point available.
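[A rough C sketch of the heap-plus-cutoff idea discussed above; the names (offer, heap, TOPN) are illustrative, not from the thread. A min-heap of the 40 best keys keeps the current #40 at the root, so the cutoff test is a single compare.]

```c
#define TOPN 40

/* Min-heap of the TOPN largest keys seen so far.  heap[0] is the
   current #40, i.e. the cutoff, so a record that cannot make the
   list is rejected with exactly one compare. */
static int heap[TOPN];
static int count;

static void swap_keys(int i, int j) {
    int t = heap[i]; heap[i] = heap[j]; heap[j] = t;
}

static void sift_up(int i) {
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
        swap_keys(i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

static void sift_down(int i) {
    for (;;) {
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < count && heap[l] < heap[m]) m = l;
        if (r < count && heap[r] < heap[m]) m = r;
        if (m == i) break;
        swap_keys(i, m);
        i = m;
    }
}

void offer(int key) {
    if (count < TOPN) {            /* still filling the list */
        heap[count++] = key;
        sift_up(count - 1);
    } else if (key > heap[0]) {    /* beats the cutoff: O(log 40) */
        heap[0] = key;
        sift_down(0);
    }                              /* else: O(1) discard */
}
```

[With e.g. 400 records, at least 90% of the offer() calls fall through to the O(1) discard branch; only records that beat the cutoff pay the log(40) sift.]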
At that point the difference between an O(log(n)) and an O(n) algorithm
is a fixed factor ((40*0.5)/6 =~ 3).

A sorted list of the 40 best, with either per-comparison copy or a
final REP MOVS + insertion, will probably perform very close to the
log(n) priority queue since the constant factor is just 3, the
complexity of the code is much lower, and there will probably be fewer
branch prediction failures.

Anyway, these numbers are a bit too close to call; the actual
performance winner will depend a _lot_ on the pattern of the incoming
records! If they are in effectively random order then you can do a
sort of branchless version like this to determine the insertion point:

  mov eax,[curr.key]
  cmp eax,data[0].key   ;; Record #40 stored first
  jbe done              ;; Discard at once

  ; This record should be inserted!
  mov esi, 0
  mov edi, 39
  mov ebx, 19

  rept 6
   cmp eax,data[ebx*RECORDSIZE].key
   cmova esi,ebx
   cmovbe edi,ebx
   lea ebx,[esi+edi]
   shr ebx,1
  endm

The code above should take 6 cycles/iteration, so 36 in total. This is
the same as 2-4 branch misses so it might well beat code that uses
branching instead.

With a linear search the loop becomes much smaller:

  next:
   cmp eax,data[ebx*RECORDSIZE].key
   lea ebx,[ebx+1]
   ja next

which can run in one or two cycles/iteration for an average count of 20.

Terje

--
-
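[A C rendering of the branchless cmova/cmovbe search plus the REP MOVS shift from the post above; the names (keep, insert_key) are mine. One detail the rept 6 loop leaves to the insertion code: data[39] is never compared, so a key larger than the whole list needs a final check.]

```c
#include <string.h>

#define TOPN 40

/* Ascending list: keep[0] is the current #40 (the cutoff),
   keep[39] the current best. */
static int keep[TOPN];

void insert_key(int key) {
    if (key <= keep[0])
        return;                     /* discard at once */

    int lo = 0, hi = TOPN - 1, mid = 19;
    for (int step = 0; step < 6; step++) {  /* the rept 6 block */
        /* the cmova / cmovbe pair; compilers tend to emit cmov here */
        lo = (key > keep[mid]) ? mid : lo;
        hi = (key <= keep[mid]) ? mid : hi;
        mid = (lo + hi) >> 1;
    }
    /* After 6 halvings lo and hi are adjacent with keep[lo] < key.
       keep[39] is never compared in the loop, so check it here. */
    int pos = (key > keep[hi]) ? hi : lo;

    /* The REP MOVS: shift the bottom part down, dropping old #40. */
    memmove(&keep[0], &keep[1], (size_t)pos * sizeof keep[0]);
    keep[pos] = key;
}
```

[The memmove moves at most 39 ints, which is where the O(n) constant comes from; on average only half the list is shifted.]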
(c) 1994, bbs@darkrealms.ca