... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"

comp.lang.asm.x86

Ahh, the lost art of x86 assembly

4,675 messages

[ << oldest | < older | list | newer > | newest >> ]

Message 2,710 of 4,675

Terje Mathisen to Rick C. Hodgin

Re: Optimize stricmp() algorithm (casele

25 Jun 17 07:48:03

   From: terje.mathisen@nospicedham.tmsw.no   
      
   Rick C. Hodgin wrote:   
   > On Saturday, June 24, 2017 at 5:50:14 PM UTC-4, Terje Mathisen wrote:   
   >> Rick C. Hodgin wrote:   
   >>> Can anybody help me optimize this code?   
   >> [snip]   
   >>> It's designed to be used as a custom assembly algorithm for a   
   >>> stricmp() algorithm which follows this general pattern, which   
   >>> is designed to be the target of a qsort() callback:   
   >>>   
   >>>     int caseless_compare(const void *p1, const void *p2)   
   >>>     {   
   >>>         int d;   
   >>>         const unsigned char *s1 = p1;   
   >>>         const unsigned char *s2 = p2;   
   >>>   
   >>>         while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1)   
   >>>             s1++, s2++;   
   >>>   
   >>>         return d;   
   >>>     }   
   >>>   
   >>   
   >> Just writing a version of this code which can handle national 8-bit   
   >> character sets is more interesting, you pretty much have to use some   
   >> form of lookup table, i.e.   
   >>   
   >>    while (d = tolower[c = *s1++] - tolower[*s2++] && c) {};   
   >   
   > This minor tweak has given the best results so far:   
   >   
   >     Rick2   = 123   
   >     Bart    = 117   
   >     Ben     = 116   
   >     RickAsm = 105   
   >     BartAsm = 104   
   >     Terje C = 96   
   >   
   > The original algorithm yours was based off is the Ben score above.   
   > Your little tweak gave it a 20-point gain.   
      
   Oops, as you've already noticed I forgot to keep the comparison, i.e.   
   the diff needs to be zero. :-(   
      
   However, as long as this is ~the same speed as the 7-bit only code then   
   it is a big win in my book since it would support any 8-bit character   
   set including national language versions of both ASCII and EBCDIC.   
      
   What I intended to write more about yesterday but was how to make a   
   reasonable fast unicode version:   
      
   Do a setup stage first, either to monocase all the keys, or to create a   
   hash table consisting of the lowercase version of all the individual   
   characters seen in those keys.   
      
   With monocased keys the problem is of course already solved, at a cost   
   of N*M (nr of keys * average key length) tolower() calls, while a hash   
   table would give near-linear lookup speeds after first spending the same   
   time calling tolower() the same number of times.   
      
   Personally I would have gone to great lengths in the old days to   
   minimize ram use, but today I would much rather use the nomocased keys   
   approach.   
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)

[ << oldest | < older | list | newer > | newest >> ]