

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 242,033 of 243,242   
   Bonita Montero to All   
   Re: Unicode...   
   22 Nov 25 06:39:20   
   
   From: Bonita.Montero@gmail.com   
      
   On 21.11.2025 at 18:03, bart wrote:
   > On 15/11/2025 05:24, Bonita Montero wrote:   
   >> A little bugfix and a perfect style:   
   >>   
   >> #include <bit>
   >> #include <iostream>
   >> #include <optional>
   >> #include <string_view>
   >>   
   >> using namespace std;   
   >>   
   >> optional<size_t> utf8Width( u8string_view str )
   >> {   
   >>      size_t w = 0;   
   >>      for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]   
   >>          if( size_t head = countl_zero( (unsigned char)~*it ); head <= 4 && (size_t)(str.end() - it) >= head + 1 ) [[likely]]
   >>              it += head + 1;   
   >>          else   
   >>              return nullopt;   
   >>      return w;   
   >> }   
   >>   
   >> int main()   
   >> {   
   >>      cout << *utf8Width( u8"Hello, 世界!" ) << endl;   
   >> }   
   >>   
   >   
   > The trouble with this is that I haven't a clue how it works or what   
   > those extras do, or how they impact on performance.   
   >   
   > A version in C is given below. This is much more straightforward. It   
   > doesn't verify anything, but then I don't know if yours does either.   
   >   
   > As for performance: I duplicated that test string to form one 104   
   > times as long, then called that function one million times. Here are   
   > the timings:   
   >   
   >   C   gcc-O2     1.06   seconds   
   >   C   bcc        1.17   seconds   
   >   C   tcc        2.81   seconds   
   >   
   >   C++ g++-O2     4.6   seconds   
   >   C++ g++-O0    19     seconds   
   >   
   > --------------------------   
   >   
   > size_t utf8width(char* s) {   
   >     size_t length;   
   >     int c, n;   
   >   
   >     length=0;   
   >     while (c=*s) {   
   >         if ((c & 0x80) == 0) n = 1;   
   >         else if ((c & 0xE0) == 0xC0) n = 2;   
   >         else if ((c & 0xF0) == 0xE0) n = 3;   
   >         else n = 4;   
   >         s += n;   
   >         ++length;   
   >     }   
   >     return length;   
   > }   
   >   
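The quoted C version is described as not verifying anything. As a rough sketch of what a checking variant could look like, the function below takes an explicit length, rejects stray continuation bytes, invalid lead bytes and truncated sequences, and reports the count through an out parameter. The name `utf8_count_checked` and its signature are assumptions made for illustration, not something posted in the thread, and it makes no attempt to reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): counts UTF-8 sequences in the
   first len bytes of s. Returns 0 and stores the count in *out on success;
   returns -1 on a stray continuation byte, an invalid lead byte, a truncated
   sequence, or a missing continuation byte. Overlong encodings and
   surrogates are NOT rejected. */
int utf8_count_checked(const unsigned char *s, size_t len, size_t *out)
{
    size_t i = 0, count = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;

        if (c < 0x80)                n = 1;  /* 0xxxxxxx */
        else if ((c & 0xE0) == 0xC0) n = 2;  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) n = 3;  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) n = 4;  /* 11110xxx */
        else return -1;                      /* 10xxxxxx or 11111xxx */

        if (len - i < n)
            return -1;                       /* sequence runs past the end */

        for (size_t k = 1; k < n; ++k)
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;                   /* expected 10xxxxxx */

        i += n;
        ++count;
    }

    *out = count;
    return 0;
}
```

Passing the length explicitly means a truncated final sequence cannot walk past the terminating NUL, which the unchecked loop can do.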
   Test with a string of UTF-8 characters that has a proper mix of
   sequence lengths (1 to 4 bytes each).
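One way such a test string might be built is sketched below: it cycles through one sample code point for each of the four UTF-8 sequence lengths. The helper names `utf8_encode` and `make_mixed_utf8` and the choice of sample code points are assumptions for illustration only; nothing like this was posted in the thread.

```c
#include <stddef.h>

/* Hypothetical helper: encodes code point cp as UTF-8 into out and returns
   the number of bytes written. Assumes cp is a valid Unicode scalar value
   (at most 0x10FFFF, not a surrogate); no checking is done. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: writes nchars characters into buf, cycling through
   1-, 2-, 3- and 4-byte sequences, then a terminating NUL. buf must hold at
   least 4 * nchars + 1 bytes. Returns the byte count excluding the NUL. */
size_t make_mixed_utf8(unsigned char *buf, size_t nchars)
{
    /* 'A', U+00E9, U+4E16, U+1F600: one sample per sequence length */
    static const unsigned long samples[4] = { 0x41, 0xE9, 0x4E16, 0x1F600 };
    size_t pos = 0;

    for (size_t i = 0; i < nchars; ++i)
        pos += utf8_encode(samples[i % 4], buf + pos);

    buf[pos] = '\0';
    return pos;
}
```

The result is NUL-terminated, so it can be passed (cast to `char*`) to the quoted `utf8width`, or wrapped in a `u8string_view` on the C++ side. With an even mix, every branch of the length test is exercised equally instead of mostly the single-byte case.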
      


