comp.lang.c

Meh, in C you gotta define EVERYTHING

243,242 messages

Message 241,983 of 243,242

Mikko to Michael Sanders

Re: Unicode...

16 Nov 25 11:22:54

   From: mikko.levanto@iki.fi   
      
   On 2025-11-15 19:09:16 +0000, Michael Sanders said:   
      
   > On Sat, 15 Nov 2025 12:47:03 +0200, Mikko wrote:   
   >   
   >> On 2025-11-14 21:03:38 +0000, Michael Sanders said:   
   >>   
   >>> Well, I finally got bitten by Unicode.   
   >>>   
   >>> Managed a work around, but I don't have enough experience   
   >>> with Unicode to know just exactly what I'm doing...   
   >>>   
   >>> #include    
   >>> #include    
   >>>   
   >>> static int utf8_width(const char *s) {   
   >>> int w = 0;   
   >>> const unsigned char *p = (const unsigned char *)s;   
   >>>   
   >>> while (*p) {   
   >>> if (*p < 0x80) { w++; p++; } // ASCII 1-byte   
   >>> else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8   
   >>> else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8   
   >>> else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8   
   >>> else { w++; p++; } // fallback   
   >>> }   
   >>>   
   >>> return w;   
   >>> }   
   >>   
   >> The code above may cause problems if the argument string is not well   
   >> formed UTF-8. For example, the zero terminator could be missed. Of   
   >> course an invalid string can be expected to cause problems anyway but   
   >> some errors are harder to debug than others.   
   >>   
   >> Another way is   
   >>   
   >> static int utf8_width(const char *s) {   
   >> int w = 0;   
   >> const unsigned char *p = (const unsigned char *)s;   
   >>   
   >> while (*p) {   
   >> if ((*p & 0xC0) != 0x80) w++; // count the first byte of each character   
   >> p++; // advance to the next byte; without this the loop never ends   
   >> }   
   >>   
   >> return w;   
   >> }   
   >>   
   >> One could also add a check that each character has the right number of   
   >> bytes of the right kind and if not regard that as the end of the string.   
   >   
   > Excellent, I've added your reply to my notes, thank you Mikko.   
      
   You are welcome.   
      
   --   
   Mikko   
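
The last suggestion in the quoted reply (check that each character has the right number of bytes of the right kind, and otherwise treat that point as the end of the string) could be sketched roughly as below. This is only an illustration, not code from the thread: the names utf8_seq_len and utf8_width_checked are hypothetical, and the check covers only lead-byte and continuation-byte patterns. It does not reject overlong encodings or surrogate code points, and like the versions above it counts characters rather than display columns.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): returns the length in bytes
 * (1..4) of the UTF-8 sequence starting at p, or 0 if the lead byte is
 * invalid or a continuation byte is missing. */
static size_t utf8_seq_len(const unsigned char *p) {
    size_t n, i;

    if (p[0] < 0x80) return 1;                 /* ASCII, 1 byte */
    else if ((p[0] & 0xE0) == 0xC0) n = 2;     /* 2-byte lead */
    else if ((p[0] & 0xF0) == 0xE0) n = 3;     /* 3-byte lead */
    else if ((p[0] & 0xF8) == 0xF0) n = 4;     /* 4-byte lead */
    else return 0;                             /* stray continuation or invalid lead */

    /* Each following byte must look like 10xxxxxx. A '\0' fails this test,
     * so the scan stops before it can step past the terminator. */
    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;
    }
    return n;
}

/* Counts characters up to the terminator or the first malformed sequence,
 * whichever comes first. */
static int utf8_width_checked(const char *s) {
    int w = 0;
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        size_t n = utf8_seq_len(p);
        if (n == 0) break;   /* malformed: regard this as the end of the string */
        w++;
        p += n;
    }
    return w;
}
```

With this sketch a truncated sequence at the end of the input, such as "ab\xC3", stops the count at 2 instead of stepping over the terminator, which is the failure mode described in the quoted reply for the first version.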
      
