   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

   Message 2,867 of 4,675   
   Melzzzzz to Terje Mathisen   
   Re: Branching two check increment pays o   
   19 Jul 17 20:45:26   
   
   From: Melzzzzz@nospicedham.zzzzz.com   
      
   On 2017-07-19, Terje Mathisen wrote:
   > Melzzzzz wrote:   
   >> I have the following code snippet:
   >>   
   >>     cmp r9,1   
   >>     je .one   
   >>     add rdi,r9   
   >>     sub rdx,r9   
   >>     jmp .L0   
   >> .one:   
   >>     inc rdi   
   >>     dec rdx   
   >>     jmp .L0   
   >>   
   >> For some reason I wrote it like that, and it seems that it doesn't hurt
   >> but gives a significant boost in some cases.
   >> Am I doing it wrong, or is this good practice?
   >>   
   > That looks _really_ bad!   
   >   
   > You must be hitting some weird interactions, maybe with cache line
   > alignment?
   >
   > The ADD and SUB instructions should never take more than a single cycle,
   > while the branch can miss and INC/DEC could also be slower than ADD/SUB
   > due to the need to special-case the carry flag. (But no CPU version I
   > know about will actually run INC/DEC slower...)
   >   
   > Terje   
   >   
   Look at what the measurements say.
   With branch:   
   cumulative list by time   
      
            strstrsse42   911.6   
                 memmem  1159.3   
            memmemsse42  1352.2   
              memmemopt  1367.3   
                 strstr  1391.5   
             strcasestr  2677.4   
          bmhorspoolasm  2916.1   
             memmemsse2  5390.3   
            BM horspool  5394.2   
             strstrsse2  5601.0   
              strstrasm  6017.4   
                     BM  6774.1   
               BM Turbo  6896.6   
            string find  6975.2   
                memmem2  7036.4   
                    KMP  7503.7   
      
      
   Without branch:   
   cumulative list by time   
      
            strstrsse42   914.5   
                 memmem  1170.7   
            memmemsse42  1357.5   
              memmemopt  1385.4   
                 strstr  1400.7   
             strcasestr  2823.1   
          bmhorspoolasm  5003.1   
             memmemsse2  5343.0   
            BM horspool  5401.2   
             strstrsse2  5569.4   
                     BM  6727.8   
               BM Turbo  6796.7   
            string find  6890.3   
                memmem2  7041.3   
              strstrasm  7123.3   
                    KMP  7499.6   
      
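   Clearly, with the branch, bmhorspoolasm is *much* faster (2916.1 vs. 5003.1).
   On entry rdi points to the haystack, rsi is its size, rdx points to the
   needle, and rcx is its size.
   It returns nil if the needle is not found, otherwise a pointer to where it
   was found.
   Here is the routine: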
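   bmhorspoolasm:
           xchg rsi,rdx            ; rsi = needle ptr, rdx = haystack bytes left
           mov r11,rcx             ; r11 = needle length (rcx is clobbered below)
           mov al,[rsi+rcx-1]      ; al = last needle byte
           mov ah,[rsi]            ; ah = first needle byte
           push rbx
           xor rbx,rbx             ; rbx will index occ_table
   .L0:
           cmp rdx,r11
           jl .fail                ; fewer bytes left than the needle needs
           mov bl,[rdi+rcx-1]      ; bl = haystack byte under the needle's last byte
           cmp ah,[rdi]
           jne .cont               ; first byte differs
           cmp al,bl
           jne .cont               ; last byte differs
           xor r9d,r9d             ; r9 = offset into the current window
   .L1:
           lddqu xmm0,[rdi+r9]     ; 16 haystack bytes
           lddqu xmm1,[rsi+r9]     ; 16 needle bytes
           pcmpeqb xmm0,xmm1
           pmovmskb r8d,xmm0       ; r8w bit i = 1 where byte i matched
           sub rcx,16
           jle .shift              ; 16 or fewer needle bytes were left: last block
           xor r8w,0xffff
           jnz .cont               ; mismatch inside a full block
           add r9,16
           jmp .L1
   .shift:
           neg rcx                 ; rcx = bytes in this block past the needle's end
           mov r10w,0xffff
           shl r10w,cl             ; expected mask with the excess bits shifted out
           shl r8w,cl              ; actual mask, shifted the same way
           xor r8w,r10w
           jz .success             ; every remaining needle byte matched
   .cont:
           mov rcx,r11             ; restore needle length
           mov r9,occ_table
           mov r9,[r9+rbx*8]       ; r9 = shift distance for byte bl
           cmp r9,1
           je .one                 ; the branch under discussion: shift of one
           add rdi,r9              ; advance the window
           sub rdx,r9
           jmp .L0
   .one:
           inc rdi                 ; same effect as add/sub with r9 = 1
           dec rdx
           jmp .L0
   .fail:
           pop rbx
           xor eax,eax             ; return nil
           ret
   .success:
           pop rbx
           mov rax,rdi             ; return pointer to the match
           ret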
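
The routine reads `occ_table`, which is not shown in the post. As a rough sketch only, assuming it is the usual Horspool bad-character table with 256 eight-byte entries (that matches the `[r9+rbx*8]` indexing, but the construction itself is a guess), it could be built as below. The names `build_occ_table` and `horspool` are hypothetical; `horspool` is a scalar reference with the same window-advance logic, not the author's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the occ_table the routine indexes with rbx*8:
   256 entries of 8 bytes, one shift distance per possible byte value. */
static uint64_t occ_table[256];

/* Standard Horspool preprocessing (an assumption, the post does not show it):
   a byte absent from needle[0..n-2] shifts by the full needle length n;
   otherwise the shift is the distance from its last occurrence to the end. */
static void build_occ_table(const unsigned char *needle, size_t n)
{
    for (size_t i = 0; i < 256; i++)
        occ_table[i] = n;
    for (size_t i = 0; i + 1 < n; i++)
        occ_table[needle[i]] = n - 1 - i;
}

/* Scalar reference search: compare the window, then advance by the table
   entry for the haystack byte under the needle's last position.
   Returns NULL when the needle is not found. */
static const unsigned char *horspool(const unsigned char *hay, size_t hn,
                                     const unsigned char *needle, size_t n)
{
    if (n == 0)
        return hay;
    build_occ_table(needle, n);
    while (hn >= n) {
        unsigned char last = hay[n - 1];
        if (last == needle[n - 1] && memcmp(hay, needle, n) == 0)
            return hay;
        size_t shift = (size_t)occ_table[last];  /* always 1..n */
        hay += shift;
        hn -= shift;
    }
    return NULL;
}
```

Under this assumed construction, a table entry equals 1 exactly for the byte value at `needle[n-2]`, so how often the `.one` path is taken would depend on how common that byte is in the haystack.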
      
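
The `.shift` block handles the last, possibly partial, 16-byte block of the needle. A small C model of that mask arithmetic may make it easier to follow; the function name `tail_matches` and its parameters are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* eq_mask:   bit i set when byte i of the two 16-byte blocks is equal
              (what pmovmskb yields after pcmpeqb).
   remaining: needle bytes still to check in this block, 1..16.
   Returns 1 when the first `remaining` bytes all matched. */
static int tail_matches(uint16_t eq_mask, unsigned remaining)
{
    unsigned excess = 16u - remaining;                        /* neg rcx: 0..15 */
    uint16_t want = (uint16_t)(0xffffu << excess);            /* shl r10w,cl */
    uint16_t got  = (uint16_t)((unsigned)eq_mask << excess);  /* shl r8w,cl  */
    return (got ^ want) == 0;                                 /* xor; jz .success */
}
```

Shifting left by `excess` pushes the bits for bytes past the needle's end out of the 16-bit register, so whatever the two loads picked up beyond the needle cannot affect the comparison.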
      
   --   
   press any key to continue or any other to quit...   
      

