

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   


   Message 4,478 of 4,675   
   wolfgang kern to Phu Tran Hoang   
   Re: Optimize speed 8086 instruction "rep   
   22 Jul 22 15:37:33   
   
   From: nowhere@nospicedham.nevernet.at   
      
   On 22/07/2022 07:19, Phu Tran Hoang wrote:   
   > ;Replace "rep movsb" by the following code   
   > test di,1  ; align to a word boundary
   > jz $+4   
   > movsb   
   > dec cx   
   >   
   > shr cx,1   
   > rep movsw   
   > jnc $+3   
   > movsb   
   >   
   >   
   >   
   > ;Replace "rep stosb" by the following code   
   > mov ah, al   
   > test di,1  ; align to a word boundary
   > jz $+4   
   > stosb   
   > dec cx   
   >   
   > shr cx,1   
   > rep stosw   
   > jnc $+3   
   > stosb   
      
   [JNC +1? STOSB/STOSW are single-byte opcodes, "AA"/"AB"]
      
   Yes, pre- and post-aligning string operations give
   the main speed gain in my OS. The same approach works with
   32-bit operations for any odd start address and size.
      
   But I also align the source or destination to 4-byte (dword) boundaries.
      
   TEST esi,3   
   JZ isAligned   
   ...         ;adjust for an aligned loop start here   
   isAligned:   
   SHR ecx,1   ;no action at all if ecx=0   
   JNC +1   
   LODSB   
   SHR ecx,1   
   JNC +2      ;  +2 for use32   
   LODSW       ;  because prefix required here   
   REP LODSD   ;falls through if ECX=Zero   
      
   With similar dummy reads at the start and at the end, it
   can partially read disk sectors at any offset and size.
   __   
   wolfgang   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
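
The align, bulk, tail scheme discussed in the message above can be modelled in C for readers who want to check the edge cases without an assembler. This is only a sketch under assumptions: the name `fill_aligned` is hypothetical, alignment is to 4 bytes as in the `TEST esi,3` fragment, and the leftover word and byte are stored after the bulk loop rather than before it as in the reply. The `count > 0` guard in the head loop matters: as quoted, the 16-bit replacements execute the single `movsb`/`stosb` and `dec cx` even when CX is 0 and DI is odd, which wraps CX to FFFFh, whereas a plain `rep movsb` with CX = 0 does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the post): byte fill that first aligns the
 * destination to a 4-byte boundary, then stores 32 bits at a time. */
static void fill_aligned(uint8_t *dst, uint8_t value, size_t count)
{
    assert(dst != NULL || count == 0);

    /* Head: single byte stores until dst is 4-byte aligned.
     * The count check keeps a zero-length call from storing anything. */
    while (count > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = value;
        count--;
    }

    /* Body: one 32-bit store per iteration, the "rep stosd" part.
     * memcpy is used so the store is valid C regardless of aliasing rules. */
    uint32_t pattern = (uint32_t)value * 0x01010101u;
    size_t dwords = count >> 2;
    while (dwords-- > 0) {
        memcpy(dst, &pattern, sizeof pattern);
        dst += 4;
    }

    /* Tail: the low two bits of the remaining count select one word
     * and/or one byte, like the two SHR/JNC tests in the reply. */
    if (count & 2u) {
        dst[0] = value;
        dst[1] = value;
        dst += 2;
    }
    if (count & 1u) {
        *dst = value;
    }
}
```

Assuming `buf` itself is 4-byte aligned, a call such as `fill_aligned(buf + 1, 0, 13)` stores 3 head bytes, 2 dwords and then 1 word, for 13 bytes in total.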



(c) 1994,  bbs@darkrealms.ca