

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 241,975 of 243,242   
   Mikko to Michael Sanders   
   Re: Unicode...   
   15 Nov 25 12:47:03   
   
   From: mikko.levanto@iki.fi   
      
   On 2025-11-14 21:03:38 +0000, Michael Sanders said:   
      
   > Well, I finally got bitten by Unicode.   
   >   
   > Managed a work around, but I don't have enough experience   
   > with Unicode to know just exactly what I'm doing...   
   >   
   > #include    
   > #include    
   >   
   > static int utf8_width(const char *s) {   
   >     int w = 0;   
   >     const unsigned char *p = (const unsigned char *)s;   
   >   
   >     while (*p) {   
   >         if (*p < 0x80) { w++; p++; } // ASCII 1-byte   
   >         else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8   
   >         else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8   
   >         else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8   
   >         else { w++; p++; } // fallback   
   >     }   
   >   
   >     return w;   
   > }   
      
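   The code above may cause problems if the argument string is not well-formed   
   UTF-8. For example, if the string ends in the middle of a multibyte   
   sequence, p += 2, 3 or 4 can step over the zero terminator, and the loop   
   then keeps reading past the end of the array. Of course, an invalid string   
   can be expected to cause problems anyway, but some errors are harder to   
   debug than others.   
      
   Another way is to count only the bytes that start a character:   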
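To make the terminator problem concrete, here is a small sketch; the helper names utf8_declared_len and jump_skips_terminator are hypothetical, not from the original post. It reuses the lead-byte masks of the quoted loop and checks, without actually overrunning anything, whether the jump taken from a given offset would land beyond the zero terminator.

```c
#include <string.h>

/* Hypothetical helper: how many bytes a UTF-8 lead byte claims for its
   sequence, using the same bit masks as the quoted loop. */
static size_t utf8_declared_len(unsigned char lead) {
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                             /* fallback, as in the quoted code */
}

/* Hypothetical helper: returns 1 if jumping by the declared length from
   offset i would land beyond the terminating zero of s, i.e. the quoted
   loop would never see the terminator and would read past the array. */
static int jump_skips_terminator(const char *s, size_t i) {
    size_t remaining = strlen(s + i);  /* bytes left before the terminator */
    return utf8_declared_len((unsigned char)s[i]) > remaining;
}
```

For "a\xE2" (an ASCII letter followed by the first byte of a truncated three-byte sequence), the check at offset 1 reports a skip: the lead byte claims 3 bytes, but only 1 remains before the terminator.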
      
   static int utf8_width(const char *s) {   
       int w = 0;   
       const unsigned char *p = (const unsigned char *)s;   
      
       while (*p) {   
           if ((*p & 0xC0) != 0x80) w++; // count bytes that are not continuation bytes (10xxxxxx)   
           p++;                          // advance one byte at a time, so the terminator is always seen   
       }   
      
       return w;   
   }   
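A self-contained sketch of the same counting rule, with the one-byte advance written into the for loop, makes it easy to try on a few inputs. The name utf8_count is hypothetical. Note that it counts code points, which is not always the same as display width: a base letter followed by a combining accent counts as 2.

```c
/* Every byte that is not a continuation byte (10xxxxxx) starts a new
   character. The pointer advances one byte per iteration, so the loop
   always stops at the terminating zero, even for malformed input. */
static int utf8_count(const char *s) {
    int n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For example, "caf\xC3\xA9" ("café" with a precomposed é) gives 4, while "e\xCC\x81" (e followed by U+0301 COMBINING ACUTE ACCENT) gives 2 even though it usually displays as a single column.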
      
   One could also add a check that each lead byte is followed by the right   
   number of continuation bytes and, if it is not, treat that as the end of   
   the string.   
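A minimal sketch of that check, assuming that counting should simply stop at the first malformed sequence; the name utf8_width_checked is hypothetical. It only checks the byte structure described above and does not reject overlong encodings, surrogates, or values above U+10FFFF.

```c
/* A lead byte must be followed by exactly the number of continuation
   bytes (10xxxxxx) it announces. Counting stops at the first malformed
   sequence, which is treated as the end of the string. */
static int utf8_width_checked(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    int w = 0;

    while (*p) {
        int extra;                                  /* continuation bytes expected */
        if (*p < 0x80)                extra = 0;    /* 0xxxxxxx */
        else if ((*p & 0xE0) == 0xC0) extra = 1;    /* 110xxxxx */
        else if ((*p & 0xF0) == 0xE0) extra = 2;    /* 1110xxxx */
        else if ((*p & 0xF8) == 0xF0) extra = 3;    /* 11110xxx */
        else break;                                 /* stray continuation or invalid byte */

        for (int i = 1; i <= extra; i++) {
            /* A zero terminator also fails this test, so we never read past it. */
            if ((p[i] & 0xC0) != 0x80) return w;
        }
        p += extra + 1;
        w++;
    }
    return w;
}
```

Because each continuation byte is tested before the next one is read, a terminator in the middle of a sequence ends the count instead of being skipped.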
      
   --   
   Mikko   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca