

   comp.os.linux.misc      Linux-specific topics not covered by oth      135,536 messages   


   Message 133,562 of 135,536   
   Johnny Billquist to Waldek Hebisch   
   Re: Recent history of vi   
   17 Dec 25 10:39:40   
   
   XPost: alt.folklore.computers   
   From: bqt@softjar.se   
      
   On 2025-12-16 03:20, Waldek Hebisch wrote:   
   > In alt.folklore.computers Johnny Billquist wrote:
   >    
   >>> The biggest problem I have with any Unicode representation except (I   
   >>> think) UTF-32 is that a program has no way of knowing how long a string   
   >>> is without encoding/decoding it. Given a string of characters in some   
   >>> codepage, how many bytes does it occupy when converted to UTF-8? Given a   
   >>> UTF-8 character string, how many character positions does it occupy,   
   >>> say, for example, when displayed on a screen?   
   >>   
   >> True. However, that has nothing to do with Unicode as such, but the   
   >> UTF-8 encoding of it.   
   >   
   > Unicode has combining "characters", so to know how many "real"
   > characters you have you need to combine.  IIUC a Korean Hangul
   > character can be built from 3 separate pieces, each taking one code
   > point, but there are also "precomposed" combinations taking a
   > single code point.  My reading of the description is that the 3-piece
   > version and the precomposed one are supposed to display the same.
   >   
   > There are also code points for ligatures; for most purposes the
   > ligature 'fi' counts as two characters, but is a single code point.
   > A terminal may display it in a single cell, but arguably for a nice
   > monospaced display one should expand ligatures.  For display we have
   > single-cell characters and double-width ones, so to know the width
   > one needs at least a table giving the width of each code point, and
   > then add up the widths of all code points.
      
   Excellent points.   
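
To make the quoted points concrete, here is a minimal Python sketch using only the standard unicodedata module. The helper names utf8_len and display_width are made up for illustration. The width rule is an assumption, not what any particular terminal does: compose with NFC first, skip combining marks, and count East Asian Wide/Fullwidth code points as two cells and everything else as one. Real terminals use their own wcwidth-style tables, and the Unicode data bundled with Python varies by version.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def display_width(s: str) -> int:
    """Rough cell count for a monospaced display (illustrative rule only).

    Assumption: compose with NFC, give combining marks zero width, and
    give East Asian Wide ("W") and Fullwidth ("F") code points two cells.
    """
    width = 0
    for ch in unicodedata.normalize("NFC", s):
        if unicodedata.combining(ch):
            continue  # combining mark: drawn on top of the previous cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


# Hangul syllable built from 3 pieces (jamo) vs. the precomposed form.
pieces = "\u1112\u1161\u11ab"
precomposed = "\ud55c"

# The "fi" ligature is a single code point; NFKC expands it to two letters.
ligature = "\ufb01"
expanded = unicodedata.normalize("NFKC", ligature)

# "e" followed by a combining acute accent: two code points, one cell.
accented = "e\u0301"

if __name__ == "__main__":
    for label, text in [
        ("hangul pieces", pieces),
        ("hangul precomposed", precomposed),
        ("fi ligature", ligature),
        ("fi expanded", expanded),
        ("e + combining acute", accented),
    ]:
        print(f"{label:22} code points={len(text)} "
              f"utf8 bytes={utf8_len(text)} cells={display_width(text)}")
```

The three-piece and precomposed Hangul forms differ in code point count and in UTF-8 byte count, yet under the rule above they come out at the same cell width. None of the three numbers can be read off the others without a table lookup.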
      
      Johnny   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


