Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 4,478 of 4,675    |
|    wolfgang kern to Phu Tran Hoang    |
|    Re: Optimize speed 8086 instruction "rep    |
|    22 Jul 22 15:37:33    |
From: nowhere@nospicedham.nevernet.at

On 22/07/2022 07:19, Phu Tran Hoang wrote:
> ;Replace "rep movsb" by the following code
> test di,1 ; align by word
> jz $+4
> movsb
> dec cx
>
> shr cx,1
> rep movsw
> jnc $+3
> movsb
>
> ;Replace "rep stosb" by the following code
> mov ah, al
> test di,1 ; align by word
> jz $+4
> stosb
> dec cx
>
> shr cx,1
> rep stosw
> jnc $+3
> stosb

[jnc+1 ? stosb/stosw are only one byte code "AA/AB"]

Yes, pre- and post-aligning string operations are
the main speed-gain in my OS. It works with 32-bit
reduction/extension for any odd start and size.

But I also align source or destination to quad bounds.

TEST esi,3
JZ isAligned
... ;adjust for an aligned loop start here
isAligned:
SHR ecx,1 ;no action at all if ecx=0
JNC +1
LODSB
SHR ecx,1
JNC +2 ; +2 for use32
LODSW ; because prefix required here
REP LODSD ;falls through if ECX=Zero

and with similar dummy reads up front and at end it
can part-read disk sectors at any offset and size.
__
wolfgang

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
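
For readers who think in C, the count handling in the 32-bit sequence above can be sketched roughly as follows. This is only an illustrative model, not wolfgang's code: `copy_split` is a made-up name, it copies bytes instead of doing dummy reads, and it leaves out the elided pre-alignment step. Bit 0 of the count selects one byte, bit 1 selects one word, and what remains is the dword count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the post): move n bytes in the same
   order the SHR/JNC sequence consumes ECX. */
static void copy_split(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* SHR ecx,1 / JNC: bit 0 set means one single byte to move. */
    if (n & 1) {
        *dst++ = *src++;
    }
    n >>= 1;

    /* SHR ecx,1 / JNC: bit 1 set means one 16-bit word to move. */
    if (n & 1) {
        memcpy(dst, src, 2);   /* memcpy avoids unaligned-access UB in C */
        dst += 2;
        src += 2;
    }
    n >>= 1;

    /* REP with a dword operation: n is now the dword count and the
       loop simply falls through when it is zero. */
    while (n != 0) {
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
        n--;
    }
}
```

With a count of 7, for example, this moves one byte, one word and one dword; with a count of 0 nothing is moved, matching the "no action at all if ecx=0" remark.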
(c) 1994, bbs@darkrealms.ca