comp.lang.asm.x86

wolfgang kern to Phu Tran Hoang

Re: Optimize speed 8086 instruction "rep

22 Jul 22 15:37:33

   From: nowhere@nospicedham.nevernet.at   

   On 22/07/2022 07:19, Phu Tran Hoang wrote:   
   > ;Replace "rep movsb" by the following code   
   > test di,1  ; align to a word boundary   
   > jz $+4   
   > movsb   
   > dec cx   
   >   
   > shr cx,1   
   > rep movsw   
   > jnc $+3   
   > movsb   
   >   
   >   
   >   
   > ;Replace "rep stosb" by the following code   
   > mov ah, al   
   > test di,1  ; align to a word boundary   
   > jz $+4   
   > stosb   
   > dec cx   
   >   
   > shr cx,1   
   > rep stosw   
   > jnc $+3   
   > stosb   

   [jnc +1? stosb/stosw are one-byte opcodes only, "AA/AB"]   
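
For readers who want to check the head/body/tail logic of the quoted "rep stosb" replacement without an assembler, here is a minimal C sketch of the same steps. The name `fill_aligned` is made up for illustration and is not from the thread. One deliberate difference: the sketch skips the head byte when the count is zero, whereas the quoted sequence, given an odd DI and CX=0, would still store a byte and then wrap CX to 0FFFFh.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte fill done as optional head byte,
   word-sized body, optional tail byte. */
static void fill_aligned(uint8_t *dst, uint8_t value, size_t count)
{
    /* mov ah, al : duplicate the fill byte into both halves of a word */
    uint16_t pair = (uint16_t)(value | (value << 8));

    /* test di,1 / stosb / dec cx : one byte to reach an even address.
       The count check is an addition; the quoted code has no such guard. */
    if (((uintptr_t)dst & 1) && count > 0) {
        *dst++ = value;
        count--;
    }

    size_t words = count >> 1;      /* shr cx,1                    */
    int odd = (int)(count & 1);     /* the bit shifted out into CF */

    while (words--) {               /* rep stosw */
        memcpy(dst, &pair, 2);
        dst += 2;
    }

    if (odd)                        /* jnc skips this stosb when CF=0 */
        *dst = value;
}
```

Calling `fill_aligned(buf + 1, 0xAB, 7)` writes exactly seven bytes whichever way `buf` happens to be aligned; only the split between head, body and tail changes.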

   Yes, pre- and post-aligning string operations give   
   the main speed gain in my OS. It works with 32-bit   
   reduction/extension for any odd start and size.   

   But I also align source or destination to 4-byte (dword) bounds.   

   TEST esi,3   
   JZ isAligned   
   ...         ;adjust for an aligned loop start here   
   isAligned:   
   SHR ecx,1   ;no action at all if ecx=0   
   JNC +1   
   LODSB   
   SHR ecx,1   
   JNC +2      ;  +2 for use32   
   LODSW       ;  because prefix required here   
   REP LODSD   ;falls through if ECX=Zero   

   With similar dummy reads at the front and at the end, it   
   can read part of a disk sector at any offset and size.   
   __   
   wolfgang   
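
The SHR/JNC sequence in the post peels the size apart bit by bit: bit 0 of ECX selects one byte, bit 1 selects one word, and what remains is the dword count for the REP. Below is a rough C sketch of that size split, under the assumption that we copy data rather than do dummy LODS reads. The name `copy_split` is hypothetical, and `memcpy` is used for the 2- and 4-byte moves so the sketch makes no alignment assumptions of its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes as (n & 1) bytes, then
   ((n >> 1) & 1) words, then (n >> 2) dwords, in the post's order. */
static void copy_split(uint8_t *dst, const uint8_t *src, uint32_t n)
{
    if (n & 1) {                /* first SHR ecx,1 set CF: one byte */
        *dst++ = *src++;
    }
    n >>= 1;

    if (n & 1) {                /* second SHR ecx,1 set CF: one word */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
    }
    n >>= 1;

    while (n--) {               /* REP with the dword count; no-op if zero */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
}
```

A size of 13 (binary 1101), for example, becomes one byte, no word and three dwords; a size of 0 falls through all three steps without touching memory.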
