

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   


   Message 3,095 of 4,675   
   Terje Mathisen to aen@nospicedham.spamtrap.com   
   Re: Easy message box   
   27 Nov 17 08:11:37   
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   aen@nospicedham.spamtrap.com wrote:   
   > On Sun, 26 Nov 2017 12:00:33 -0800 (PST), "Rick C. Hodgin" wrote:   
   >> ...   
   >> My question is:  Don't you think things like this are far better   
   >> handled in a higher level language like C, and that assembly should   
   >> be used for only those things where it really matters?   
   >> ...   
   > For me the answer is easy.  I love assembly language, and do   
   > everything in it, but YMMV (everyone is free to choose).   
   >   
   I have to side with Rick here, simply because using C makes the actual   
   algorithm easier to visualize, and that is the interesting part here, imho.   
      
   I.e. instead of trying to divide by 3 or 5, I would use a pair of   
   counters initialized to 2 and 4:   
      
      int cnt3 = 2, cnt5 = 4, num = 0, sum = 0;   
      do {   
        cnt3--; cnt5--; num++;   
        int mask3 = cnt3 >> 31; // Will be -1 after 3 loops   
        int mask5 = cnt5 >> 31; // Will be -1 after 5 loops   
        cnt3 += mask3 & 3;      // Return to 2 if it wrapped around   
        cnt5 += mask5 & 5;      // Return to 4 if it wrapped around   
      
        sum += num & (mask3 | mask5); // Add the current num if divisible   
      } while (num < 1000);   
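   
   To check the loop above against the obvious modulo version, here is a
   minimal sketch that wraps it in a function. The names sum_counters and
   sum_modulo and the limit parameter are hypothetical (the post hard-codes
   1000), and the sketch assumes that >> on a negative int32_t is an
   arithmetic shift, which C leaves implementation-defined but which
   mainstream x86 compilers provide:   
   
```c
#include <stdint.h>

/* Counter-based loop from the post, wrapped in a function.
   `limit` is a hypothetical parameter; the post hard-codes 1000. */
static int32_t sum_counters(int32_t limit)
{
    int32_t cnt3 = 2, cnt5 = 4, num = 0, sum = 0;
    do {
        cnt3--; cnt5--; num++;
        /* Assumption: arithmetic right shift, so a negative counter
           yields an all-ones mask (-1) and a non-negative one yields 0. */
        int32_t mask3 = cnt3 >> 31;
        int32_t mask5 = cnt5 >> 31;
        cnt3 += mask3 & 3;              /* back to 2 after wrapping to -1 */
        cnt5 += mask5 & 5;              /* back to 4 after wrapping to -1 */
        sum += num & (mask3 | mask5);   /* adds num or 0, no branch */
    } while (num < limit);
    return sum;
}

/* Reference version using division, for comparison only. */
static int32_t sum_modulo(int32_t limit)
{
    int32_t sum = 0;
    for (int32_t num = 1; num <= limit; num++)
        if (num % 3 == 0 || num % 5 == 0)
            sum += num;
    return sum;
}
```
   
   Both functions should agree for any limit >= 1. Note that, as written,
   the do/while also processes num == 1000 itself, so sum_counters(1000)
   gives 234168, while sum_counters(999) gives 233168.   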
      
   This algorithm is now (internally) branchless, so it can easily be SIMD   
   vectorized, i.e. by storing the 3 and 5 counters and masks in a pair of   
   16-, 32- or 64-bit MMX/SSE vector slots. The only slightly tricky part is   
   the need to horizontally combine the two mask values in order to add the   
   current number if at least one of them is set, but even without such   
   vector ops the full loop could run in 5-6 cycles on a CPU with 3-way issue.   
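   
   The lane layout described above can be sketched in portable C by
   treating a two-element array as the vector register. This is only a
   scalar emulation under stated assumptions, not real MMX/SSE code: the
   names sum_lanes, LANES and period are hypothetical, and >> on a negative
   int32_t is assumed to be an arithmetic shift:   
   
```c
#include <stdint.h>

enum { LANES = 2 };   /* lane 0 tracks multiples of 3, lane 1 multiples of 5 */

static int32_t sum_lanes(int32_t limit)
{
    int32_t cnt[LANES] = { 2, 4 };                  /* per-lane counters */
    static const int32_t period[LANES] = { 3, 5 };  /* per-lane reload values */
    int32_t mask[LANES];
    int32_t sum = 0;

    for (int32_t num = 1; num <= limit; num++) {
        /* "Vertical" step: identical operations on every lane, which is
           what a single packed subtract/shift/and/add would do. */
        for (int i = 0; i < LANES; i++) {
            cnt[i]--;
            mask[i] = cnt[i] >> 31;         /* -1 if the lane wrapped, else 0 */
            cnt[i] += mask[i] & period[i];  /* reload only the wrapped lane */
        }
        /* "Horizontal" step: combine the masks across lanes. */
        int32_t any = mask[0] | mask[1];
        sum += num & any;                   /* adds num or 0, no branch */
    }
    return sum;
}
```
   
   The inner loop is the part a packed instruction would replace; the OR
   across mask[0] and mask[1] is the horizontal combine, which needs the
   two lanes brought together before the final add.   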
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      


