Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 2,863 of 4,675    |
|    Terje Mathisen to James Van Buskirk    |
|    Re: Process at read-in time, or use post    |
|    19 Jul 17 22:17:27    |
From: terje.mathisen@nospicedham.tmsw.no

James Van Buskirk wrote:
>> "Terje Mathisen" wrote in message
>> You can of course use binary search to reduce this to max 6
>> compares, followed by a block copy to shift the bottom part down to
>> make room in the right spot.
>
>> The key is that 40 is such a small number that on average you
>> will need very few comparisons for the naive code.
>
> I consider it unfortunate that you snipped Robert Wessel's advice to
> implement the top 40 list as a priority queue via a heap since the

Sorry, I should have discussed this directly, it _is_ a very good idea
in general.

> O.P. stated above that he already has working heapsort code, so it
> should be a breeze to implement a heap (or actually running 3 heaps
> at the same time in the context of this problem so as to only read
> each data item once).
>
> I know you know this, but am not sure why you didn't comment on the
> possibility.

OK, let's go through it:

A priority queue is perfect if you have to be prepared to spit out an
ordered list at any point in time; it carries a cost of O(log(n)) for
both insertions and deletions (with n = 40 here, so the log(n) term
will be ~6) for most operations.

Unless you also maintain a separate #40 marker, i.e. the current cutoff
point, you will carry that log(40) cost for all new records, _including_
all those that fall outside the top 40 range!

I.e. the naive version will blindly insert the new record and then
delete the current #41 to bring the total back to 40.

If the total count is large (i.e. >= 400) at least 90% of all new
records will be discarded immediately at O(1) cost as long as you have
that cutoff point available.
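[A rough C sketch of the heap-plus-cutoff idea discussed above; the names (offer, heap, TOPN) are illustrative, not from the thread. A min-heap of the 40 best keys keeps the current #40 at the root, so the cutoff test is a single compare.]

```c
#define TOPN 40

/* Min-heap of the TOPN largest keys seen so far.  heap[0] is the
   current #40, i.e. the cutoff, so a record that cannot make the
   list is rejected with exactly one compare. */
static int heap[TOPN];
static int count;

static void swap_keys(int i, int j) {
    int t = heap[i]; heap[i] = heap[j]; heap[j] = t;
}

static void sift_up(int i) {
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
        swap_keys(i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

static void sift_down(int i) {
    for (;;) {
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < count && heap[l] < heap[m]) m = l;
        if (r < count && heap[r] < heap[m]) m = r;
        if (m == i) break;
        swap_keys(i, m);
        i = m;
    }
}

void offer(int key) {
    if (count < TOPN) {            /* still filling the list */
        heap[count++] = key;
        sift_up(count - 1);
    } else if (key > heap[0]) {    /* beats the cutoff: O(log 40) */
        heap[0] = key;
        sift_down(0);
    }                              /* else: O(1) discard */
}
```

[With e.g. 400 records, at least 90% of the offer() calls fall through to the O(1) discard branch; only records that beat the cutoff pay the log(40) sift.]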
At that point the difference between an O(log(n)) and an O(n) algorithm
is a fixed factor ((40*0.5)/6 =~ 3).

A sorted list of the 40 best, with either per-comparison copy or a
final REP MOVS + insertion, will probably perform very close to the
log(n) priority queue since the constant factor is just 3, the
complexity of the code is much lower, and there will probably be fewer
branch prediction failures.

Anyway, these numbers are a bit too close to call; the actual
performance winner will depend a _lot_ on the pattern of the incoming
records! If they are in effectively random order then you can do a
sort of branchless version like this to determine the insertion point:

  mov eax,[curr.key]
  cmp eax,data[0].key   ;; Record #40 stored first
  jbe done              ;; Discard at once

  ; This record should be inserted!
  mov esi, 0
  mov edi, 39
  mov ebx, 19

  rept 6
   cmp eax,data[ebx*RECORDSIZE].key
   cmova esi,ebx
   cmovbe edi,ebx
   lea ebx,[esi+edi]
   shr ebx,1
  endm

The code above should take 6 cycles/iteration, so 36 in total. This is
the same as 2-4 branch misses so it might well beat code that uses
branching instead.

With a linear search the loop becomes much smaller:

  next:
   cmp eax,data[ebx*RECORDSIZE].key
   lea ebx,[ebx+1]
   ja next

which can run in one or two cycles/iteration for an average count of 20.

Terje

--
-
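[A C rendering of the branchless cmova/cmovbe search plus the REP MOVS shift from the post above; the names (keep, insert_key) are mine. One detail the rept 6 loop leaves to the insertion code: data[39] is never compared, so a key larger than the whole list needs a final check.]

```c
#include <string.h>

#define TOPN 40

/* Ascending list: keep[0] is the current #40 (the cutoff),
   keep[39] the current best. */
static int keep[TOPN];

void insert_key(int key) {
    if (key <= keep[0])
        return;                     /* discard at once */

    int lo = 0, hi = TOPN - 1, mid = 19;
    for (int step = 0; step < 6; step++) {  /* the rept 6 block */
        /* the cmova / cmovbe pair; compilers tend to emit cmov here */
        lo = (key > keep[mid]) ? mid : lo;
        hi = (key <= keep[mid]) ? mid : hi;
        mid = (lo + hi) >> 1;
    }
    /* After 6 halvings lo and hi are adjacent with keep[lo] < key.
       keep[39] is never compared in the loop, so check it here. */
    int pos = (key > keep[hi]) ? hi : lo;

    /* The REP MOVS: shift the bottom part down, dropping old #40. */
    memmove(&keep[0], &keep[1], (size_t)pos * sizeof keep[0]);
    keep[pos] = key;
}
```

[The memmove moves at most 39 ints, which is where the O(n) constant comes from; on average only half the list is shifted.]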
(c) 1994, bbs@darkrealms.ca