From: terje.mathisen@nospicedham.tmsw.no   
      
   aen@nospicedham.spamtrap.com wrote:   
   > On Sun, 26 Nov 2017 12:00:33 -0800 (PST), "Rick C. Hodgin"   
   > wrote:   
   >> ...   
   >> My question is: Don't you think things like this are far better   
   >> handled in a higher level language like C, and that assembly should   
   >> be used for only those things where it really matters?   
   >> ...   
   > For me the answer is easy. I love assembly language, and do   
   > everything in it, but YMMV (everyone is free to choose).   
   >   
   I have to side with Rick here, simply because using C makes the actual   
   algorithm easier to visualize, and that is the interesting part here, imho.   
      
   I.e. instead of trying to divide by 3 or 5, I would use a pair of   
   counters initialized to 2 and 4:   
      
    #include <stdio.h>

    int main(void)
    {
        int cnt3 = 2, cnt5 = 4, num = 0, sum = 0;
        do {
            cnt3--; cnt5--; num++;
            int mask3 = cnt3 >> 31; // -1 on every 3rd pass (32-bit int, arithmetic >>)
            int mask5 = cnt5 >> 31; // -1 on every 5th pass
            cnt3 += mask3 & 3;      // Return to 2 if it wrapped around
            cnt5 += mask5 & 5;      // Return to 4 if it wrapped around

            sum += num & (mask3 | mask5); // Add the current num if divisible
        } while (num < 1000);       // Last pass handles num == 1000
        printf("%d\n", sum);
        return 0;
    }
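A quick way to convince yourself the counter trick is right is to run it side by side with the obvious `%`-based loop. The sketch below wraps both versions in functions; the names `sum_branchless` and `sum_modulo` and the `limit` parameter are illustrative, not from the post. Like the loop above, it assumes a 32-bit `int` and an arithmetic right shift of negative values, which is what mainstream compilers do, although C leaves `>>` of a negative value implementation-defined.

```c
/* The counter loop from the post, with the hard-coded 1000 replaced by a
   hypothetical `limit`. Like the original do/while, it processes
   num = 1..limit inclusive (for limit >= 1).
   Assumes 32-bit int and arithmetic >> on negative values. */
static int sum_branchless(int limit)
{
    int cnt3 = 2, cnt5 = 4, num = 0, sum = 0;
    do {
        cnt3--; cnt5--; num++;
        int mask3 = cnt3 >> 31;   /* -1 when cnt3 went negative, else 0 */
        int mask5 = cnt5 >> 31;   /* -1 when cnt5 went negative, else 0 */
        cnt3 += mask3 & 3;        /* wrap -1 back to 2 */
        cnt5 += mask5 & 5;        /* wrap -1 back to 4 */
        sum += num & (mask3 | mask5);
    } while (num < limit);
    return sum;
}

/* Straightforward reference version using division, same range 1..limit. */
static int sum_modulo(int limit)
{
    int sum = 0;
    for (int num = 1; num <= limit; num++)
        if (num % 3 == 0 || num % 5 == 0)
            sum += num;
    return sum;
}
```

For example, `sum_branchless(10)` and `sum_modulo(10)` both give 33 (3 + 5 + 6 + 9 + 10), and with a limit of 1000 both give 234168. Because the do/while also counts 1000 itself, the sum over multiples strictly below 1000 would be 233168.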
      
   This algorithm is now (internally) branchless, so it can easily be SIMD
   vectorized, i.e. by storing the 3 and 5 counters and masks in a pair of
   16-, 32- or 64-bit MMX/SSE vector slots. The only slightly tricky part is
   the need to horizontally combine the two mask values in order to add the
   current number if at least one of them is set, but even without such
   vector ops each iteration of the loop could run in 5-6 cycles on a CPU
   with 3-way issue.
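A minimal SSE2 sketch of the vector-slot idea, under stated assumptions: it keeps `cnt3` and `cnt5` in 32-bit lanes 0 and 1 of one `__m128i` (lanes 2 and 3 stay at zero and are ignored), uses `_mm_srai_epi32` for the sign masks (a well-defined arithmetic shift, unlike C's `>>` on negative ints), and does the horizontal combine with a shuffle plus OR. The function name `sum_multiples_sse2`, the `limit` parameter and the choice of 32-bit lanes are illustrative assumptions; it needs an x86 target with SSE2, and no claim is made about its cycle count.

```c
#include <emmintrin.h>   /* SSE2 intrinsics (also pulls in _MM_SHUFFLE) */

/* Hypothetical sketch: sum of num in 1..limit divisible by 3 or 5,
   with both down-counters held in one SSE2 register. */
static int sum_multiples_sse2(int limit)
{
    /* _mm_set_epi32 takes lanes high to low: lane0 = cnt3, lane1 = cnt5 */
    __m128i cnt   = _mm_set_epi32(0, 0, 4, 2);
    __m128i one   = _mm_set_epi32(0, 0, 1, 1);
    __m128i reset = _mm_set_epi32(0, 0, 5, 3);
    int sum = 0;

    for (int num = 1; num <= limit; num++) {
        cnt = _mm_sub_epi32(cnt, one);                   /* cnt3--, cnt5-- */
        __m128i mask = _mm_srai_epi32(cnt, 31);          /* -1 where lane < 0 */
        cnt = _mm_add_epi32(cnt, _mm_and_si128(mask, reset)); /* wrap to 2 / 4 */

        /* Horizontal combine: broadcast lane 1 and OR it into lane 0,
           so lane 0 becomes mask3 | mask5. */
        __m128i either = _mm_or_si128(mask,
                             _mm_shuffle_epi32(mask, _MM_SHUFFLE(1, 1, 1, 1)));
        sum += num & _mm_cvtsi128_si32(either);          /* read lane 0 */
    }
    return sum;
}
```

Called as `sum_multiples_sse2(1000)` it covers the same range 1..1000 as the scalar loop and should return the same 234168. It only shows the lane layout and the shuffle-based combine; a faster version would more likely process several values of `num` per register.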
      
   Terje   
      
   --   
   "almost all programming can be viewed as an exercise in caching"   