

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   


   Message 2,707 of 4,675   
   Terje Mathisen to Rick C. Hodgin   
   Re: Optimize stricmp() algorithm (casele   
   24 Jun 17 23:43:36   
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Rick C. Hodgin wrote:   
   > Can anybody help me optimize this code?   
   [snip]   
   > It's designed to be used as a custom assembly algorithm for a   
   > stricmp() algorithm which follows this general pattern, which   
   > is designed to be the target of a qsort() callback:   
   >   
   >     int caseless_compare(const void *p1, const void *p2)   
   >     {   
   >         int d;   
   >         const unsigned char *s1 = p1;   
   >         const unsigned char *s2 = p2;   
   >   
   >         while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1)   
   >             s1++, s2++;   
   >   
   >         return d;   
   >     }   
   >   
      
   Just writing a version of this code which can handle national 8-bit
   character sets is more interesting: you pretty much have to use some
   form of lookup table, i.e. (with tolower[] here being a 256-entry
   table rather than the library function)

      while ((d = tolower[c = *s1++] - tolower[*s2++]) == 0 && c) {}
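
   A minimal sketch of how the lookup-table loop above might be fleshed
   out, assuming the table is filled once from the current locale's
   tolower(). The names fold_tab, init_fold_tab and caseless_compare_tab
   are made up for illustration and are not from the original code:

```c
#include <ctype.h>

/* Hypothetical 256-entry case-folding table, one slot per 8-bit character. */
static unsigned char fold_tab[256];

/* Fill the table once; assumes tolower() in the active locale gives the
   folding you want for the national character set in use. */
static void init_fold_tab(void)
{
    for (int i = 0; i < 256; i++)
        fold_tab[i] = (unsigned char)tolower(i);
}

/* qsort()-style callback comparing two NUL-terminated strings, ignoring case. */
static int caseless_compare_tab(const void *p1, const void *p2)
{
    const unsigned char *s1 = p1;
    const unsigned char *s2 = p2;
    unsigned char c;
    int d;

    /* Keep going while the folded bytes are equal and s1 has not ended. */
    while ((d = fold_tab[c = *s1++] - fold_tab[*s2++]) == 0 && c)
        ;
    return d;
}
```

   init_fold_tab() has to run once before the first comparison; after
   that the inner loop is two table loads and a subtract per character.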
      
   With Unicode characters (up to 21 bits wide for code points, 31 in the
   original UCS-4 definition) a simple flat lookup table is of course
   impractical; you need some kind of algorithmic conversion.
      
   It is probably significantly faster to just create a special key table   
   containing a monocased version of the strings, sort that, and then use   
   the resulting sort order on the original table.   
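
   The key-table idea could look roughly like the sketch below. It is
   only an illustration under my own assumptions: struct sort_key,
   key_compare and caseless_sort_order are hypothetical names, the keys
   are lowercased with plain tolower(), and the result is returned as an
   index permutation rather than by moving the original strings:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one monocased key plus the index of its original. */
struct sort_key {
    char  *key;    /* lowercased copy of the original string */
    size_t index;  /* position of the original in the input table */
};

/* Plain strcmp() is enough here because the keys are already monocased. */
static int key_compare(const void *a, const void *b)
{
    const struct sort_key *ka = a, *kb = b;
    return strcmp(ka->key, kb->key);
}

/* Fills order[0..n-1] with indices into strings[] in caseless sorted order.
   Returns 0 on success, -1 on allocation failure. */
static int caseless_sort_order(const char *const *strings, size_t n, size_t *order)
{
    struct sort_key *keys = calloc(n ? n : 1, sizeof *keys);
    size_t i, made = 0;
    int rc = -1;

    if (!keys)
        return -1;

    /* Case conversion happens once per string, not once per comparison. */
    for (i = 0; i < n; i++) {
        size_t len = strlen(strings[i]);
        keys[i].key = malloc(len + 1);
        if (!keys[i].key)
            goto done;
        made++;
        for (size_t j = 0; j <= len; j++)   /* j == len copies the NUL */
            keys[i].key[j] = (char)tolower((unsigned char)strings[i][j]);
        keys[i].index = i;
    }

    qsort(keys, n, sizeof *keys, key_compare);
    for (i = 0; i < n; i++)
        order[i] = keys[i].index;
    rc = 0;

done:
    for (i = 0; i < made; i++)
        free(keys[i].key);
    free(keys);
    return rc;
}
```

   Note that qsort() is not stable, so strings that differ only in case
   may come out in either order.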
      
   For searching I've found that you can get very good results by creating
   two new copies of the target string, one that is all uppercase and
   another which is all lowercase; then, when looking for a match at a given
   string offset, you allow either target copy to match.
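
   As a rough sketch of the two-copy search (a straightforward scan, not
   a tuned implementation; caseless_find is a hypothetical name and the
   use of tolower()/toupper() to build the copies is my assumption):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a pointer to the first caseless match of needle in haystack,
   or NULL if there is none (NULL is also returned if malloc() fails). */
static const char *caseless_find(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    size_t h = strlen(haystack);
    const char *result = NULL;
    char *lo, *up;

    if (n == 0)
        return haystack;
    if (n > h)
        return NULL;

    /* One allocation holds both monocased copies of the target. */
    lo = malloc(2 * n);
    if (!lo)
        return NULL;
    up = lo + n;
    for (size_t i = 0; i < n; i++) {
        lo[i] = (char)tolower((unsigned char)needle[i]);
        up[i] = (char)toupper((unsigned char)needle[i]);
    }

    /* The haystack is never case-converted: each byte is simply allowed
       to match either the lowercase or the uppercase target copy. */
    for (size_t pos = 0; pos + n <= h && !result; pos++) {
        size_t i = 0;
        while (i < n && (haystack[pos + i] == lo[i] || haystack[pos + i] == up[i]))
            i++;
        if (i == n)
            result = haystack + pos;
    }

    free(lo);
    return result;
}
```

   The point of the design is that the text being searched is read as-is,
   so the per-character cost is two compares instead of a conversion.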
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      



(c) 1994,  bbs@darkrealms.ca