From: terje.mathisen@nospicedham.tmsw.no   
      
   aen@nospicedham.spamtrap.com wrote:   
   > On Mon, 18 Feb 2019 01:41:25 -0600, Robert Wessel   
   > wrote:   
   >>   
   >> My question is, what exactly is the point? It's trivial, but you're   
   >> stuck needing 512MB of disk and address space, plus the I/O for that   
   >> every time the program loads.   
   >>   
   > I think there isn't much I/O going on at the start. The way I   
   > understand it, the mapping function only stores the address and length   
   > of the bitarray and only loads a page when there is an access to that   
   > memory and the address space used is only that page.   
   >   
   >> I'd call that two instructions, but that's just me.   
   >>   
   > the jnc or jc is a reaction to the test.   
   >   
   >> If you don't mind making the test four instructions, you could store   
   >> only a bitmap of the odd numbers.   
   >>   
   > I thought of that, and will try it some later time.   
   >   
   >> Considering the cost of a page I/O, even to SSD, you might well be   
   >> better off just testing against the 6542 primes (which, if my   
   >> calculations are correct, is the number of primes below sqrt(2**32)).   
   >> That would only take a 13kb table to store. And if there's a fair   
   >> fraction of non-primes being tested, that will usually fail after only   
   >> the first few tests anyway.   
   >>   
   > I did some timings, and a run through all 4G numbers took ~10 seconds.   
   >   
   >   
   > The test was:   
   > mov rcx,4294967295   
   > L0: bt [rbx],rcx   
   > dec rcx   
   > jnz L0   
   >   
   > A sequential run through all the numbers is IMO the worst case test,   
   > because all pages need to be loaded.   
      
   Not at all! You are not using the results of the BT operation in order   
   to branch or do anything else, i.e. it is discarded at once.   
      
   >   
   > In cycles one test takes ~8 cycles, which seems not bad to me, and the   
      
   8 cycles for a perfectly predicted loop like this means that the BT   
   itself is taking this long.   
      
   Please try two more tests:   
      
   a) Starting at 2 and incrementing until wraparound, i.e. the same JNZ L0   
   as the last instruction. Count the number of primes found like this:   
      
    xor eax,eax   
    mov ecx,2   
   L0: bt [rbx],rcx   
    adc eax,0   
    inc rcx   
    jnz L0   
   ; EAX has prime count for verification   
      
   b) Do the same with the compressed 2:1 (odd) table   
      
    mov eax,1 ; Counting 2 as a prime   
    mov ecx,1 ; 3 SHR 1   
   L0: bt [rbx],rcx   
    adc eax,0   
    inc rcx   
    jns L0 ; Stops at 2^31   
      
      
   > test for even numbers could of course be eliminated with a test rcx,1   
   > instead of a prime test.   
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|