comp.lang.asm.x86
Message 3,107 of 4,675
Terje Mathisen to John
Re: Easy message box
28 Nov 17 07:53:22
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Kerr-Mudd,John wrote:   
   > aen@nospicedham.spamtrap.com wrote in   
   > news:5a1bc894.38348562@NNTP.AIOE.ORG:   
   >   
   >> On Mon, 27 Nov 2017 08:11:37 +0100, Terje Mathisen   
   >>> I.e. instead of trying to divide by 3 or 5, I would use a pair of   
   >>> counters initialized to 2 and 4:   
   >>>   
   >>>   int cnt3 = 2, cnt5 = 4, num = 0, sum = 0;   
   >>>   do {   
   >>>     cnt3--; cnt5--; num++;   
   >>>     int mask3 = cnt3 >> 31; // Will be -1 after 3 loops   
   >>>     int mask5 = cnt5 >> 31; // Will be -1 after 5 loops   
   >>>     cnt3 += mask3 & 3;         // Return to 2 if it wrapped around   
   >>>     cnt5 += mask5 & 5;      // Return to 4 if it wrapped around   
   >>>   
   >>>     sum += num & (mask3 | mask5); // Add the current num if divisible   
   >>>   } while (num < 1000);   
   >>> ...   
   >   
   > We did this a while back; here's my eventual smallest -   
      
   I did of course remember this one; the main difference was that that
   challenge printed words instead of adding in the current counter.
      
   It would be interesting to write an SSE version that was optimized for   
   speed, i.e. an inner loop like this:   
      
   next:
      psubw  xmm0,xmm7     ; Subtract 1 from each 3/5 counter (xmm7 = 1 per lane)
      inc    edx           ; Increment the main counter

      movdqa xmm1,xmm0     ; Save the current counter values
      psraw  xmm0,15       ; 0/-1 masks

      movd   ecx,xmm0      ; Get the two 16-bit masks
      pand   xmm0,xmm6     ; Mask in 3 / 5

      mov    ebx,ecx
      shl    ecx,16        ; Move the low mask up beside the high one
      paddw  xmm0,xmm1     ; Add in the saved counters

      or     ebx,ecx       ; Combined mask: divisible by 3 or by 5

      sar    ebx,31        ; Extend to 32 bits
      and    ebx,edx       ; Masked copy of current num

      add    eax,ebx

      dec    esi
      jnz    next
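
   For anyone who wants to experiment from C, the same loop can be sketched
   with SSE2 intrinsics. This is only an illustration under assumptions: the
   function name, the lane layout (lane 0 for the 3-counter, lane 1 for the
   5-counter) and the inclusive limit parameter are made up here, and it
   needs an x86 compiler with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sum of all n in 1..limit (inclusive) divisible by 3 or 5.
 * Hypothetical helper; lane 0 tracks the 3-counter, lane 1 the 5-counter. */
uint32_t sum_fizzbuzz_sse2(uint32_t limit)
{
    assert(limit <= 65535u);  /* keeps the 32-bit sum from overflowing */

    __m128i cnt          = _mm_setr_epi16(2, 4, 0, 0, 0, 0, 0, 0);
    const __m128i one    = _mm_setr_epi16(1, 1, 0, 0, 0, 0, 0, 0);
    const __m128i period = _mm_setr_epi16(3, 5, 0, 0, 0, 0, 0, 0);
    uint32_t sum = 0;

    for (uint32_t num = 1; num <= limit; num++) {
        cnt = _mm_sub_epi16(cnt, one);              /* psubw: count down */
        __m128i mask = _mm_srai_epi16(cnt, 15);     /* psraw: 0 / -1 per lane */
        uint32_t m = (uint32_t)_mm_cvtsi128_si32(mask);  /* movd: both masks */

        /* pand + paddw: reload a counter with 3 or 5 once it went negative */
        cnt = _mm_add_epi16(cnt, _mm_and_si128(mask, period));

        /* All ones if either 16-bit mask is set, else zero */
        uint32_t take = 0u - ((m | (m >> 16)) & 1u);
        sum += num & take;
    }
    return sum;
}
```

   The limit is inclusive here, just like the quoted do/while loop, which
   also processes num = 1000.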
      
      
   Looking at the code I suspect that it could be faster with integer code   
   only...   
      
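   It is tempting to use a rotating 15-bit mask to pick the numbers to add,
   since the multiples-of-3-or-5 pattern repeats every 15 numbers, i.e. a
   mask kept in a 16-bit register that skips bit #16 when it rotates.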
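
   A minimal sketch of that rotating-mask idea in plain C, under
   assumptions: the function name and the inclusive limit parameter are
   made up for illustration, and the constant 0x4B34 is simply the bit
   pattern for 3, 5, 6, 9, 10, 12 and 15 (bit k-1 set for each such k).

```c
#include <assert.h>
#include <stdint.h>

/* Sum of all n in 1..limit (inclusive) divisible by 3 or 5.
 * Hypothetical helper illustrating a 15-bit rotating selection mask. */
uint32_t sum_fizzbuzz_rot15(uint32_t limit)
{
    assert(limit <= 65535u);  /* keeps the 32-bit sum from overflowing */

    /* Bit k-1 is set when k (1..15) is divisible by 3 or 5:
     * bits 2,4,5,8,9,11,14 -> 0x4B34 */
    uint32_t mask = 0x4B34u;
    uint32_t sum = 0;

    for (uint32_t num = 1; num <= limit; num++) {
        uint32_t take = 0u - (mask & 1u);  /* all ones if num qualifies */
        sum += num & take;                 /* branch-free conditional add */

        /* Rotate right by one within 15 bits: bit 0 re-enters at bit 14 */
        mask = (mask >> 1) | ((mask & 1u) << 14);
    }
    return sum;
}
```

   With an inclusive limit, sum_fizzbuzz_rot15(999) gives 233168, and a limit
   of 1000 adds 1000 on top of that, which matches what the quoted do/while
   loop computes.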
      
   Terje   
      
   >   
   >           org 0x100          ; FizzBuzz MJ mvY             [ 65 ]   
   >           cpu 8086           ; variable str lth:   only lth cost!   
   > start: ; ax=0 bx=0 cx=00xx dx=cs=ds=es=xxxx si=0100 di=sp=FFFE bp=09xx nz   
   >        ; prog assumes ch=0, mem available at 0x3030, nz at start   
   >           mov dx,Prtarea     ; hard coded dword            [  3 ]
   > eachnum:                     ;  -setup main-               [  5 ]   
   >           mov di,dx          ; reset to print next   
   >           mov si,loadamt     ; get to FBtbl @0x2F: init 0x25   
   > nextFB:                      ; -main-                      [ 17 ]   
   >           lodsw              ; get textlth & reset - keep flags!   
   >           mov cl,ah          ; initial 9(+25) for fallthru   
   >           jnz notthisFB      ; then skip Textstr to next   
   > iszero:   mov byte [si-3],al ; have a FB, reset count   
   >           repnz movsb        ; cpy Textstr   
   > notthisFB:   
   >           add si,cx          ; add 0 if moved, or textlth to skip.   
   > eachFB:   inc byte [si]      ; count up: either FBcnt or TotalCnt   
   >           lodsb              ; FB:al=num, si->Textstr   
   >           jng nextFB         ; TC:al=TotalCnt,si->crlf$   
   > isnum:    aam                ; -cpynum-                    [  6 ]   
   >           xchg ah,al         ; other endian; div costs more   
   >           or  ax,dx          ; 2 digit ascii num; dx=0x3030   
   >           cmp dx,di          ; -skip num if FB-            [  5 ]   
   >           jne goprt   
   >           stosw              ; with leading Z   
   > goprt:   
   >           mov ah,9           ; 9 + off25 = 2E,+1=0x2F      [ 11 ]   
   > loadamt equ $-2              ; hard coded offset needed to get FBtbl   
   >           movsw              ; -cpycrlf$-   
   >           movsb              ; add eos   
   >           cmp al,0x30+10     ; 3A ends max 100 (2 digit)   
   >           int 0x21           ; flags preserved!   
   >           jl eachnum         ; nz from cmp   
   > exit:     ret   
   > ;; 0x2F   
   > FBtbl:   
   > ;          db -2,-2,4,"Even"   
   >           db -3,-3,3,"Fiz"   ; variable lth, but for FB -  [ 14 ]   
   >           db -5,-5,5,"Buzz!"  ; cnt,reset,lth,str   
   > ;          db -7,-7,3,"Zap"   
   > ;          db -11,-11,7,"Pingit!"   
   > TotalCnt  db 0,0x0D,0x0A,'$'   ;                           [  4 ]   
   > proglth   equ $-start   
   > Prtarea   equ 0x3030         ; hard coded dw        total  [*65*]   
   >   
      
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      