|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 3,107 of 4,675    |
|    Terje Mathisen to John    |
|    Re: Easy message box    |
|    28 Nov 17 07:53:22    |
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Kerr-Mudd,John wrote:   
   > aen@nospicedham.spamtrap.com wrote in   
   > news:5a1bc894.38348562@NNTP.AIOE.ORG:   
   >   
   >> On Mon, 27 Nov 2017 08:11:37 +0100, Terje Mathisen   
   >>> I.e. instead of trying to divide by 3 or 5, I would use a pair of   
   >>> counters initialized to 2 and 4:   
   >>>   
   >>> int cnt3 = 2, cnt5 = 4, num = 0, sum = 0;   
   >>> do {   
   >>> cnt3--; cnt5--; num++;   
   >>> int mask3 = cnt3 >> 31; // Will be -1 after 3 loops   
   >>> int mask5 = cnt5 >> 31; // Will be -1 after 5 loops   
   >>> cnt3 += mask3 & 3; // Return to 2 if it wrapped around   
   >>> cnt5 += mask5 & 5; // Return to 4 if it wrapped around   
   >>>   
   >>> sum += num & (mask3 | mask5); // Add the current num if divisible   
   >>> } while (num < 1000);   
   >>> ...   
   >   
   > We did this a while back; here's my eventual smallest -   
      
   I did of course remember this one; the main difference was that that
   challenge printed words instead of adding in the current counter.
      
   It would be interesting to write an SSE version that was optimized for   
   speed, i.e. an inner loop like this:   
      
   next:   
    psubw xmm0,xmm7 ; Subtract 1 from each 3/5 counter   
    inc edx ; Increment the main counter   
      
    movdqa xmm1,xmm0 ; Save the current counter values
    psraw xmm0,15 ; 0/-1 masks   
      
    movd ecx,xmm0 ; Get the two 16-bit masks
    pand xmm0,xmm6 ; Mask in 3 / 5
      
    mov ebx,ecx   
    shl ecx,16   
    paddw xmm0,xmm1 ; Add in the saved counters   
      
    or ebx,ecx ; Combined mask: top bit set if divisible by 3 or 5

    sar ebx,31 ; Extend to 32 bits
    and ebx,edx ; Masked copy of current num
      
    add eax,ebx   
      
    dec esi   
    jnz next   
      
      
   Looking at the code I suspect that it could be faster with integer code   
   only...   
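
For comparison, here is a minimal sketch of what an integer-only version could look like in C, following the counter idea from the quoted loop. The function names (`sum_counters`, `sum_modulo`, `check_counters`) are made up for illustration, and the loop bound is passed in as an inclusive `limit` rather than hard-coded. One deliberate change: the masks are built with `-(cnt < 0)` instead of `cnt >> 31`, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of all num in 1..limit divisible by 3 or 5, using two
   down-counters and all-ones/zero masks instead of division. */
uint32_t sum_counters(uint32_t limit)
{
    int32_t cnt3 = 2, cnt5 = 4;
    uint32_t sum = 0;

    for (uint32_t num = 1; num <= limit; num++) {
        cnt3--;
        cnt5--;
        /* -1 (all ones) when the counter has gone negative, else 0 */
        int32_t mask3 = -(int32_t)(cnt3 < 0);
        int32_t mask5 = -(int32_t)(cnt5 < 0);
        cnt3 += mask3 & 3;   /* back to 2 on every third pass */
        cnt5 += mask5 & 5;   /* back to 4 on every fifth pass */
        sum += num & (uint32_t)(mask3 | mask5);
    }
    return sum;
}

/* Straightforward reference version using the modulo operator. */
uint32_t sum_modulo(uint32_t limit)
{
    uint32_t sum = 0;
    for (uint32_t num = 1; num <= limit; num++)
        if (num % 3 == 0 || num % 5 == 0)
            sum += num;
    return sum;
}

/* Aborts if the two versions disagree for the given limit. */
void check_counters(uint32_t limit)
{
    assert(sum_counters(limit) == sum_modulo(limit));
}
```

Calling `check_counters(1000)` compares the two versions over the same range as the quoted `do ... while (num < 1000)` loop, which includes 1000 itself. Whether this actually beats the SSE loop would have to be measured.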
      
   It is tempting to use a rotating 15-bit mask to pick the numbers to add,   
   i.e. a 16-bit mask that skips #16.   
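
The rotating-mask idea works because divisibility by 3 or 5 repeats with period 15. A rough sketch in C, under the assumption that the pattern is kept in the low 15 bits of a 32-bit variable and rotated by hand (a 16-bit register rotate that skips one position would be the assembly equivalent). The names `build_pattern15`, `sum_rot15` and `check_pattern15` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bit i (0..14) is set when i is divisible by 3 or 5. */
uint32_t build_pattern15(void)
{
    uint32_t pattern = 0;
    for (uint32_t i = 0; i < 15; i++)
        if (i % 3 == 0 || i % 5 == 0)
            pattern |= 1u << i;
    return pattern;
}

/* The pattern has bits 0, 3, 5, 6, 9, 10 and 12 set. */
void check_pattern15(void)
{
    assert(build_pattern15() == 0x1669u);
}

/* Sum of all num in 0..limit divisible by 3 or 5. The low bit of
   the pattern always describes the current num; after each step
   the pattern is rotated right by one within its 15 bits. */
uint32_t sum_rot15(uint32_t limit)
{
    uint32_t pattern = build_pattern15();
    uint32_t sum = 0;

    for (uint32_t num = 0; num <= limit; num++) {
        uint32_t low = pattern & 1u;
        sum += num & (0u - low);                 /* all ones if low is 1 */
        pattern = (pattern >> 1) | (low << 14);  /* 15-bit rotate right */
    }
    return sum;
}
```

Starting at num = 0 keeps the pattern aligned with bit 0 and adds nothing to the sum, so no pre-rotated start value is needed.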
      
   Terje   
      
   >   
   > org 0x100 ; FizzBuzz MJ mvY [ 65 ]   
   > cpu 8086 ; variable str lth: only lth cost!   
   > start: ; ax=0 bx=0 cx=00xx dx=cs=ds=es=xxxx si=0100 di=sp=FFFE bp=09xx nz   
   > ; prog assumes ch=0, mem available at 0x3030, nz at start   
   > mov dx,Prtarea ; hard coded dword [ 3 ]
   > eachnum: ; -setup main- [ 5 ]   
   > mov di,dx ; reset to print next   
   > mov si,loadamt ; get to FBtbl @0x2F: init 0x25   
   > nextFB: ; -main- [ 17 ]   
   > lodsw ; get textlth & reset - keep flags!   
   > mov cl,ah ; initial 9(+25) for fallthru   
   > jnz notthisFB ; then skip Textstr to next   
   > iszero: mov byte [si-3],al ; have a FB, reset count   
   > repnz movsb ; cpy Textstr   
   > notthisFB:   
   > add si,cx ; add 0 if moved, or textlth to skip.   
   > eachFB: inc byte [si] ; count up: either FBcnt or TotalCnt   
   > lodsb ; FB:al=num, si->Textstr   
   > jng nextFB ; TC:al=TotalCnt,si->crlf$   
   > isnum: aam ; -cpynum- [ 6 ]   
   > xchg ah,al ; other endian; div costs more   
   > or ax,dx ; 2 digit ascii num; dx=0x3030   
   > cmp dx,di ; -skip num if FB- [ 5 ]   
   > jne goprt   
   > stosw ; with leading Z   
   > goprt:   
   > mov ah,9 ; 9 + off25 = 2E,+1=0x2F [ 11 ]   
   > loadamt equ $-2 ; hard coded offset needed to get FBtbl   
   > movsw ; -cpycrlf$-   
   > movsb ; add eos   
   > cmp al,0x30+10 ; 3A ends max 100 (2 digit)   
   > int 0x21 ; flags preserved!   
   > jl eachnum ; nz from cmp   
   > exit: ret   
   > ;; 0x2F   
   > FBtbl:   
   > ; db -2,-2,4,"Even"   
   > db -3,-3,3,"Fiz" ; variable lth, but for FB - [ 14 ]   
   > db -5,-5,5,"Buzz!" ; cnt,reset,lth,str   
   > ; db -7,-7,3,"Zap"   
   > ; db -11,-11,7,"Pingit!"   
   > TotalCnt db 0,0x0D,0x0A,'$' ; [ 4 ]   
   > proglth equ $-start   
   > Prtarea equ 0x3030 ; hard coded dw total [*65*]   
   >   
      
      