
   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 241,960 of 243,242   
   Kaz Kylheku to Michael Sanders   
   Re: Unicode...   
   14 Nov 25 21:20:43   
   
   From: 643-408-1753@kylheku.com   
      
   On 2025-11-14, Michael Sanders  wrote:   
   > Well, I finally got bitten by Unicode.   
   >   
   > Managed a work around, but I don't have enough experience   
   > with Unicode to know just exactly what I'm doing...   
   >   
   > #include    
   > #include    
   >   
   > static int utf8_width(const char *s) {   
      
   By width do you mean code point count?   
      
   This is easily confused with "display width": the number of columns
   a Unicode string occupies on a monospaced display or printer.
      
   If you ever edit UTF-8 text in Vim or a similar editor, you will see
   that certain characters, e.g. East Asian ones, occupy two character
   positions.
      
   Kazinator's TXR language:   
      
   This is the TXR Lisp interactive listener of TXR 302.   
   Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.   
   1> (len "今日は皆さん")   
   6   
   2> (display-width "今日は皆さん")   
   12   
   3> (coded-length "今日は皆さん")   
   18   
      
   The length (in terms of code points, not characters) is 6.   
      
   Length is tricky because it depends on how you define it: code points
   are not characters. Unicode has "grapheme clusters": sequences of
   code points that together make up one character.
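
To make the code point count concrete, here is a minimal C sketch. It assumes the input is valid, NUL-terminated UTF-8 and does no validation; the name utf8_codepoints is made up for illustration and is not taken from the original post or from TXR.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated string, assuming valid UTF-8.
   Continuation bytes have the bit pattern 10xxxxxx; every other byte
   starts a new code point, so we count only those. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

Called on "今日は皆さん" this returns 6. Called on "e\xCC\x81" (the letter e followed by U+0301 COMBINING ACUTE ACCENT) it returns 2, even though a reader sees a single character: that pair is one grapheme cluster, which this simple count does not detect.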
      
   The display width is 12: all six characters are East Asian, so each
   takes up two character cells on a monospaced terminal display.
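
A rough C sketch of the display-width idea follows. It is not how TXR computes display-width. It assumes valid UTF-8, treats a simplified, hard-coded set of code point ranges as two columns wide, and counts everything else as one column, so combining marks and control characters are handled incorrectly. Real code would more likely rely on the Unicode East Asian Width data or on POSIX wcwidth(). The names utf8_next, is_wide and display_width_sketch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from valid UTF-8 and advance *s past it.
   No validation is performed; malformed input is not handled. */
static uint32_t utf8_next(const char **s)
{
    const unsigned char *p = (const unsigned char *)*s;
    uint32_t cp;
    int extra;

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }
    else                  { cp = p[0] & 0x07; extra = 3; }

    p++;
    while (extra-- > 0 && *p != '\0')
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);

    *s = (const char *)p;
    return cp;
}

/* Simplified assumption: these ranges are treated as double-width.
   This is an approximation, not the full East Asian Width table. */
static int is_wide(uint32_t cp)
{
    return (cp >= 0x1100 && cp <= 0x115F) ||
           (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) ||
           (cp >= 0xAC00 && cp <= 0xD7A3) ||
           (cp >= 0xF900 && cp <= 0xFAFF) ||
           (cp >= 0xFE30 && cp <= 0xFE6F) ||
           (cp >= 0xFF00 && cp <= 0xFF60) ||
           (cp >= 0xFFE0 && cp <= 0xFFE6) ||
           (cp >= 0x20000 && cp <= 0x3FFFD);
}

/* Sum the column widths: 2 for wide code points, 1 for all others. */
static size_t display_width_sketch(const char *s)
{
    size_t w = 0;

    while (*s != '\0')
        w += is_wide(utf8_next(&s)) ? 2 : 1;
    return w;
}
```

Under these assumptions, display_width_sketch("今日は皆さん") gives 12 and display_width_sketch("abc") gives 3.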
      
   The coded-length is 18: 18 UTF-8 bytes.  I didn't call the function   
   utf8-length, because the project only supports UTF-8 encoding.   
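
In C, the analogous byte count for a NUL-terminated UTF-8 string is simply what strlen() reports, since strlen() counts bytes rather than code points. A tiny sketch, with the wrapper name utf8_bytes invented for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes in the UTF-8 encoding of s, excluding the
   terminating NUL. strlen() counts bytes, not code points. */
static size_t utf8_bytes(const char *s)
{
    return strlen(s);
}
```

Each of the six characters in "今日は皆さん" encodes to three bytes in UTF-8, so utf8_bytes returns 18 for that string.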
      
   All text-I/O is UTF-8 and that cannot be turned off.   
      
   --   
   TXR Programming Language: http://nongnu.org/txr   
   Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal   
   Mastodon: @Kazinator@mstdn.ca   
      
