... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.lang.asm.x86
Ahh, the lost art of x86 assembly
4,675 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 2,713 of 4,675
Terje Mathisen to Terje Mathisen
Re: Optimize stricmp() algorithm (casele
25 Jun 17 18:41:00
   From: terje.mathisen@nospicedham.tmsw.no   
      
   More ideas:   
      
   In almost all languages the number of glyphs that have both uppercase   
   and lowercase versions is very limited, right?   
      
   I.e. it is probably way less than 256 such characters in all the text   
   you'll ever encounter, this means that a collision-free hash table can   
   be relatively easily constructed and that can reduce the wide wtolower()   
   call to hashing the input down to a reasonable number of bits and then   
   looking it up in the hash table:   
      
   Each hash table entry will contain the character that you are looking   
   up, along with its lowercase equivalent, so the code will look like this:   
      
   int wtolower(int c)   
   {   
      unsigned h = hash(c) & TABLE_MASK;   
      if (lowertable[h].ch == c)   
        c = lowertable[h].lo;   
      return c;   
   }   
      
   This function is still so expensive that Rod's optimization makes   
   perfect sense but it should reduce each wtolower() call to the cost of a   
   branch miss, more or less.   
      
   OTOH, lowercasing the strings first is still way faster since that is   
   O(n*m) (n = number of keys, m = average key length) while a comparison   
   function that lowercases on the fly will be called O(n*log(n)) times and   
   each call needs up to two wtolower) calls per key char.   
      
   Rod's version optimizes this to ~1 pair of wtolower per comparison early   
   on when the input data is mostly randomly ordered, but as soon as you   
   get smaller partitions with nearly equal keys, the average number of key   
   char pairs to compare before you find a difference will go up.   
      
   Terje   
      
   Terje Mathisen wrote:   
   > Rod, this is almost certainly faster in real-life situations, very nice   
   > catch. :-)   
   >   
   > For unicode it is even nicer because the overhead of monocasing is so   
   > much larger, and the extra overhead of doing an extra compare of the raw   
   > values is close to zero (maximum is the cost of the branch instruction   
   > itself).   
   >   
   > I have also considered a sort-of-hybrid approach where you write a   
   > flipcase() function, i.e. given a char it returns the opposite case if   
   > that is defined:   
   >   
   > int caseless_compare(const void *p1, const void *p2)   
   > {   
   >    wchar a, b;   
   >    const unsigned wchar *s1 = p1;   
   >    const unsigned wchar *s2 = p2;   
   >   
   >    do {   
   >       while ((a = *s1++) == (b = *s2++)) {   
   >     if (!a) return 0;   
   >       }   
   >    } while (flipcase(a) == b);   
   >   
   > // I have to monocase before I can know which wchar is larger!   
   >    return wtolower(a) - wtolower(b);   
   > }   
   >   
   > The savings here is that I do just one flipcase() call for each   
   > non-matching position, but then I have to do two wide wtolower() calls   
   > to determine the actual order.   
   >   
   > I also considered using the result of flipcase(): It is only when that   
   > is different from the input that we would need to monocase the other   
   > char, right?   
   >   
   > Except one crucial detail here:   
   >   
   > How is stricmp() defined?   
   >   
   > Does it always force uppercase/force lowercase before comparing?   
   >   
   > The problem occurs because there are ascii characters which are located   
   > between 'Z' and 'a', right?   
   >   
   > These will be sorted either before or after an alphabetic char depending   
   > on the case used for the compare, all other chars gives the same result.   
   > ...   
   > A bit of googling seems to indicate that the function is defined to   
   > lowercase all chars _before_ the compare, so the OP had it right, and   
   > the fastest sort will almost certanly come from a separate lowercased   
   > set of strings!   
   >   
   > Terje   
   >   
   > Rod Pemberton wrote:   
   >> On Sat, 24 Jun 2017 08:03:26 -0700 (PDT)   
   >> "Rick C. Hodgin"  wrote:   
   >>   
   >>> Can anybody help me optimize this code?   
   >>>   
   >>>     [link]   
   >>>   
   >>> It doesn't use the stack for parameters, but two global variables   
   >>> called leftptr and righptr which hold the input values.  It does   
   >>> return in the eax register, so it can be used normally that way   
   >>> for assignment.   
   >>>   
   >>> It's designed to be used as a custom assembly algorithm for a   
   >>> stricmp() algorithm which follows this general pattern, which   
   >>> is designed to be the target of a qsort() callback:   
   >>>   
   >>>     int caseless_compare(const void *p1, const void *p2)   
   >>>     {   
   >>>         int d;   
   >>>         const unsigned char *s1 = p1;   
   >>>         const unsigned char *s2 = p2;   
   >>>   
   >>>         while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1)   
   >>>             s1++, s2++;   
   >>>   
   >>>         return d;   
   >>>     }   
   >>>   
   >>> Note that this algorithm returns negative, 0, or positive values,   
   >>> which is different than other strcmp() algorithms which return -1,   
   >>> 0, 1, but it can still work generically in this case because it's   
   >>> the return result to qsort(), which only checks negative, positive,   
   >>> or equal.   
   >>>   
   >>> I tried variations of the branch direction in my asm code, and this   
   >>> one I posted was the best of those I tried.   
   >>>   
   >>> There's room for improvement, but I don't know what to look for.   
   >>> Right now it is faster than Visual Studio 2015 can generate in   
   >>> 32-bit code for that caseless_compare() block above.  It scores   
   >>> a value of 131 in my testing, and the assembly version I created   
   >>> scores 107, with the time being in milliseconds on this test.   
   >>>   
   >>> Any help is appreciated.  Thank you in advance.   
   >>   
   >> Rick,   
   >>   
   >> I'm wondering what happens if you break up the comparison into two   
   >> parts.  E.g., do a basic character comparison without translation to   
   >> lowercase.  If that fails, then do a comparison of the characters after   
   >> translation to lowercase.   
   >>   
   >> I'm guessing that breaking the comparison into two would actually   
   >> improve the overall speed of comparisons overall, as large portions of   
   >> normal text strings will be lower-cased by default, e.g., this sentence   
   >> only has one uppercase character.  A list of names or lists of   
   >> dictionary words will only have the first letter upper-cased also.   
   >>   
   >> In other words, Terje/Ben algorithm has a fixed instruction cost per   
   >> comparison which includes lower-casing every character in the string   
   >> when, most likely, many of the characters will not need to be   
   >> lower-cased for a proper comparison.  I.e., a variable rate of   
   >> comparison might be quicker if the text is mostly lowercase.   
   >>   
   >> E.g., you might want something more like this (minimally tested):   
   >>   
   >> int rp_stricmp(void *p0, void *p1)   
   >> {   
   >>   int a,b;   
   >>   char *x,*y;   
   >>   
   >>   a=!0;   
   >>   x=p0;   
   >>   y=p1;   
   >>   
   >>   while(1)   
   >>   {   
   >>     if(!a)   
   >>       break;   
   >>     a=*x;   
   >>     b=*y;   
   >>     x++;   
   >>     y++;   
   >>     if(a==b)   
   >>       continue;   
   >>     if(lower[a]==lower[b])   
   >>       continue;   
   >>     break;   
   >>   }   
   >>   return(a-b);   
   >> }   
   >>   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]