

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 241,968 of 243,242   
   Bonita Montero to All   
   Re: Unicode...   
   15 Nov 25 05:51:55   
   
   From: Bonita.Montero@gmail.com   
      
   On 14.11.2025 at 22:03, Michael Sanders wrote:
   > Well, I finally got bitten by Unicode.   
   >   
   > Managed a work around, but I don't have enough experience   
   > with Unicode to know just exactly what I'm doing...   
   >   
   > #include <stdio.h>
   > #include <string.h>
   >   
   > static int utf8_width(const char *s) {   
   >      int w = 0;   
   >      const unsigned char *p = (const unsigned char *)s;   
   >   
   >      while (*p) {   
   >          if (*p < 0x80) { w++; p++; } // ASCII 1-byte   
   >          else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8   
   >          else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8   
   >          else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8   
   >          else { w++; p++; } // fallback   
   >      }   
   >   
   >      return w;   
   > }   
   >   
   > int main(void) {   
   >      const char *s = "élan";   
   >      printf("string:     %s\n", s);   
   >      printf("strlen:     %zu\n", strlen(s)); // 5 (bytes)
   >      printf("utf8_width: %d\n", utf8_width(s)); // 4 (code points)
   >   
   >      return 0;   
   > }   
   >   
   Try this idea in C; here it is written in C++:
      
   size_t utf8Width( span<const char>::iterator it ) // element type lost in archiving; const char assumed
   {   
        size_t w = 0;   
        for( ; *it; ++w )   
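            // head = leading 1 bits: 0 for ASCII, 1 for a stray continuation byte, 2..4 for a lead byte
            if( int head = countl_zero( (unsigned char)~*it ); head <= 4 ) [[likely]]
                it += head ? head : 1;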
            else   
                ++it;   
        return w;   
   }   
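
   A minimal sketch of the same leading-bits idea in plain C, under a few
   assumptions: the helper name leading_ones is made up for illustration,
   the input is a NUL-terminated string, and no validation of overlong or
   otherwise malformed sequences is attempted. The number of leading 1 bits
   in a lead byte is taken as the sequence length (2 to 4); anything else
   is stepped over as a single byte. Continuation bytes are checked while
   skipping, so a truncated sequence cannot walk past the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the leading 1 bits of a byte (0..8). */
static int leading_ones(unsigned char b)
{
    int n = 0;
    while (b & 0x80) {
        ++n;
        b = (unsigned char)(b << 1);
    }
    return n;
}

/* Count code points in a NUL-terminated UTF-8 string.
 * This is a code point count, not a display width. */
size_t utf8_width(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t w = 0;

    while (*p) {
        int len = leading_ones(*p);
        if (len < 2 || len > 4)
            len = 1; /* ASCII, stray continuation byte, or invalid lead */

        /* Skip the lead byte, then at most len - 1 continuation bytes.
         * NUL is not a continuation byte, so this stops at the terminator. */
        ++p;
        while (--len > 0 && (*p & 0xC0) == 0x80)
            ++p;

        ++w;
    }
    return w;
}
```

   With a UTF-8 encoded "élan" (bytes C3 A9 6C 61 6E) this should return 4,
   where strlen reports 5. Note that it counts code points only: combining
   marks and double-width characters would still give a count that differs
   from the number of terminal columns.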
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


