... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"

comp.lang.asm.x86

Ahh, the lost art of x86 assembly

4,675 messages


Message 2,730 of 4,675

James Harris to Rod Pemberton

Re: Optimize stricmp() algorithm (casele

27 Jun 17 09:51:06

   From: james.harris.1@nospicedham.gmail.com   
      
   On 27/06/2017 01:14, Rod Pemberton wrote:   
   > On Mon, 26 Jun 2017 01:45:25 -0700 (PDT)   
   > "Rick C. Hodgin"  wrote:   
   >   
   >> Oh my ... AT&T syntax.   
   >   
   > With -O2, 32-bit GCC (DJGPP for DOS) generates:   
   >   
   > .globl _rp_stricmp   
   > _rp_stricmp:   
   > 	push	ebp   
   > 	mov	ecx, 1   
   > 	mov	ebp, esp   
   > 	push	esi   
   > 	push	ebx   
   > 	mov	edx, DWORD PTR [ebp+12]   
   > 	mov	ebx, DWORD PTR [ebp+8]   
   > L10:   
   > 	test	ecx, ecx   
   > 	je	L3   
   > 	movsx	ecx, BYTE PTR [ebx]   
   > 	movsx	esi, BYTE PTR [edx]   
   > 	inc	ebx   
   > 	inc	edx   
   > 	cmp	ecx, esi   
   > 	je	L10   
   > 	mov	al, BYTE PTR _lower[esi]   
   > 	cmp	BYTE PTR _lower[ecx], al   
      
   It's interesting that for lower() DJGPP uses a lookup table.   
      
   > 	je	L10   
   > L3:   
   > 	sub	ecx, esi   
   > 	pop	ebx   
   > 	mov	eax, ecx   
   > 	pop	esi   
   > 	pop	ebp   
   > 	ret   
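The thread does not show DJGPP's actual `_lower` table, so the following is only a minimal C sketch of what a table-driven lower() looks like, assuming plain ASCII. The names `lower_table`, `init_lower_table` and `table_lower` are hypothetical. Once the table is filled, lower-casing a byte is a single indexed load, which is what `mov al, BYTE PTR _lower[esi]` in the listing above does.

```c
/* Hypothetical 256-entry table (ASCII only); not DJGPP's real _lower. */
static unsigned char lower_table[256];

/* Fill the table once: 'A'..'Z' map to 'a'..'z'; every other byte maps to itself. */
void init_lower_table(void)
{
    for (int c = 0; c < 256; c++)
        lower_table[c] = (unsigned char)((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
}

/* One indexed load per character. Taking an unsigned char keeps
   bytes >= 0x80 inside the table (a sign-extended index would not). */
unsigned char table_lower(unsigned char c)
{
    return lower_table[c];
}
```

Call `init_lower_table()` once before the first `table_lower()`; after that, `table_lower('Q')` yields `'q'` and non-letters come back unchanged.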
      
   That's good code, but it occurred to me that because the offset is the
   same into each string, one offset could be incremented instead of two   
   string pointers. Then, rather than the following (if ESI and EDI are the   
   string pointers)   
      
      add esi, 1
      add edi, 1
      movsx eax, byte [esi]
      movsx ebx, byte [edi]
      
   if the offset is in EDX then the equivalent would be one instruction
   shorter. That could end up being faster, though the offset does need
   a register of its own.
      
      add edx, 1
      movsx eax, byte [esi + edx]
      movsx ebx, byte [edi + edx]
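The single-offset idea can be sketched in C as a loop where one index `i` addresses both strings, mirroring `[esi + edx]` and `[edi + edx]`. This is a minimal sketch assuming ASCII-only case folding; `offset_stricmp` and `ascii_lower` are hypothetical names, and whether a compiler actually emits indexed addressing for it is up to the optimizer.

```c
#include <stddef.h>

/* ASCII-only lower-casing (hypothetical helper). */
static int ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Case-insensitive compare using one shared offset into both strings. */
int offset_stricmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    size_t i = 0;                 /* the one shared offset (EDX above) */

    for (;;) {
        int c1 = ascii_lower(a[i]);
        int c2 = ascii_lower(b[i]);
        if (c1 != c2 || c1 == 0)  /* mismatch, or both strings ended */
            return c1 - c2;
        i++;                      /* one increment instead of two */
    }
}
```

A related trick keeps a single pointer plus the constant difference `s2 - s1`, but in C subtracting pointers into two different objects is undefined behaviour, so the shared index is the portable way to express it.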
      
      
   I saw you (Rod) make a good point in another thread that if lower() is a   
   function call then its overhead can be avoided in many cases by XORing
   the two bytes to see if they /might/ match. If the XOR is 0 then they
   match. If it is 0x20 then they might match. Otherwise, they cannot match
   and there's no need to lower-case either of them.   
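The XOR filter works because ASCII upper- and lower-case letters differ only in bit 5 (0x20). A minimal C sketch of the per-character test, assuming ASCII; `ci_char_eq` is a hypothetical name. A result of 0x20 is only a candidate: '@' (0x40) and '`' (0x60) also differ by 0x20 but are not letters, so one range check settles it.

```c
/* Case-insensitive equality of two bytes using the XOR pre-check (ASCII only). */
int ci_char_eq(unsigned char a, unsigned char b)
{
    unsigned char x = (unsigned char)(a ^ b);
    if (x == 0)
        return 1;                 /* identical bytes: definite match */
    if (x != 0x20)
        return 0;                 /* cannot be the same letter: no lower() needed */
    /* Differ only in bit 5: a match only if both are letters,
       e.g. 'A' (0x41) and 'a' (0x61), but not '@' (0x40) and '`' (0x60). */
    unsigned char l = (unsigned char)(a | 0x20);
    return l >= 'a' && l <= 'z';
}
```

Only the 0x20 case ever needs the letter check, so most mismatching pairs are rejected after a single XOR and compare.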
      
   Even better, the XOR operation can set the flags for the equality test   
   so we don't need a CMP. Instead of the initial test   
      
      cmp eax, ecx  ;Compare the two chars   
      je these_chars_match   
      
   we could use   
      
      xor eax, ecx   
      jz these_chars_match   
      
   And then EAX is ready to be tested for whether the two bytes /might/ be   
   a case-insensitive match.   
      
      cmp eax, 0x20   
      jne found_a_mismatch   
      ;The chars might match   
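Putting the pieces together, here is a minimal C sketch of a whole compare loop that follows the branch structure of the fragments above: XOR first, treat 0 as a match, treat 0x20 as a possible match, and lower-case only on the mismatch path. It assumes ASCII and a shared index; `xor_stricmp` is a hypothetical name, and the comments map each step to the assembly labels.

```c
#include <stddef.h>

/* Case-insensitive compare built on the XOR test (ASCII only). */
int xor_stricmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (size_t i = 0;; i++) {
        unsigned int c1 = a[i], c2 = b[i];
        unsigned int x = c1 ^ c2;            /* xor eax, ecx */
        if (x == 0) {                        /* jz these_chars_match */
            if (c1 == 0)
                return 0;                    /* both strings ended together */
            continue;
        }
        if (x == 0x20) {                     /* cmp eax, 0x20: might match */
            unsigned int l = c1 | 0x20;
            if (l >= 'a' && l <= 'z')
                continue;                    /* same letter, different case */
        }
        /* found_a_mismatch: order the strings by lower-cased values */
        if (c1 >= 'A' && c1 <= 'Z') c1 += 'a' - 'A';
        if (c2 >= 'A' && c2 <= 'Z') c2 += 'a' - 'A';
        return (int)c1 - (int)c2;
    }
}
```

Because a pair that survives to the mismatch path can never be an upper/lower-case pair, the lower-cased values there always differ, so the function returns non-zero exactly when the strings differ case-insensitively.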
      
   (All code untested and may well contain errors....)   
      
   --   
   James Harris   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
