XPost: alt.folklore.computers   
   From: bqt@softjar.se   
      
   On 2025-12-16 03:20, Waldek Hebisch wrote:   
   > In alt.folklore.computers Johnny Billquist wrote:   
   >    
   >>> The biggest problem I have with any Unicode representation except (I   
   >>> think) UTF-32 is that a program has no way of knowing how long a string   
   >>> is without encoding/decoding it. Given a string of characters in some   
   >>> codepage, how many bytes does it occupy when converted to UTF-8? Given a   
   >>> UTF-8 character string, how many character positions does it occupy,   
   >>> say, for example, when displayed on a screen?   
   >>   
   >> True. However, that has nothing to do with Unicode as such, only with
   >> the UTF-8 encoding of it.
   >   
   > Unicode has combining "characters", so to know how many "real"
   > characters you have, you need to combine them. IIUC, a Korean Hangul
   > character can be built from 3 separate pieces, each taking one code
   > point, but there are also "precomposed" combinations taking a
   > single code point. My reading of the description is that the 3-piece
   > version and the precomposed one are supposed to display the same.
   >   
   > There are also code points for ligatures; for most purposes the ligature
   > 'fi' counts as two characters, but is a single code point. A terminal
   > may display it in a single cell, but arguably for a nice monospaced
   > display one should expand ligatures. For display we have single-cell
   > characters and double-width ones, so to know the width one needs
   > at least a table giving the width of each code point, and then has to
   > add up the widths of all code points.
      
   Excellent points.   
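
   To make them concrete, here is a minimal Python sketch using only the
   standard unicodedata module. The helper display_width is a hypothetical
   name of my own, not a library function, and its rule (zero cells for
   nonspacing or enclosing marks and format characters, two cells for East
   Asian Wide or Fullwidth, one cell otherwise) is a simplifying assumption.
   Real terminals differ, in particular for ambiguous-width characters and
   emoji sequences.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal cells `text` occupies.

    Assumption (not a standard API): compose first with NFC, then add up
    a per-code-point width taken from the Unicode character database.
    """
    width = 0
    for ch in unicodedata.normalize("NFC", text):
        # Nonspacing marks, enclosing marks and format characters
        # are treated as taking no cell of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide ("W") and Fullwidth ("F") take two cells.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


# One Hangul syllable, precomposed and as three conjoining jamo.
composed = "\ud55c"                # single code point U+D55C
decomposed = "\u1112\u1161\u11ab"  # leading consonant, vowel, trailing consonant

print(len(composed), len(decomposed))        # code points: 1 and 3
print(len(composed.encode("utf-8")),
      len(decomposed.encode("utf-8")))       # UTF-8 bytes: 3 and 9
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(display_width(composed), display_width(decomposed))    # 2 and 2

# The "fi" ligature is one code point; compatibility normalization
# (NFKC) expands it to the two letters.
ligature = "\ufb01"
expanded = unicodedata.normalize("NFKC", ligature)
print(len(ligature), len(expanded))                      # 1 and 2
print(display_width(ligature), display_width(expanded))  # 1 and 2
```

   Composing before measuring is a deliberate choice in this sketch: without
   it, the three-piece Hangul form would be counted piece by piece and come
   out wider than the precomposed one, even though both are supposed to
   display the same.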
      
    Johnny   
      