
   comp.lang.c++.moderated      Moderated discussion of C++ superhackery      33,346 messages   


   Message 31,820 of 33,346   
   James K. Lowden to Miles Bader   
   Re: unicode and string   
   19 Jan 12 14:56:03   
   
   From: jklowden@speakeasy.net   
      
   On Sun, 15 Jan 2012 05:13:25 -0800 (PST)   
   Miles Bader wrote:   
      
   > utf-8   
   > literals work perfectly well with existing char-based infrastructure,   
   > and an increasingly large number of interfaces simply assume all char*   
   > strings are encoded using utf-8,   
      
   Well, not "perfectly", right?  char* strings and std::string lack   
   character semantics when used with utf-8.  That is,   
      
   	std::string::operator[]   
      
   returns a byte, and no operator returns a character.   
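
   To illustrate the byte-versus-character point, here is a minimal sketch,
   assuming well-formed UTF-8 input. The names count_code_points and sample
   are hypothetical helpers, not part of the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes well-formed UTF-8.
std::size_t count_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++n;  // a lead byte or an ASCII byte starts a new code point
        }
    }
    return n;
}

// Hypothetical sample: "caf" followed by U+00E9, which UTF-8 encodes as
// the two bytes 0xC3 0xA9.
inline std::string sample()
{
    return std::string("caf\xC3\xA9");
}
```

   With this sample, size() reports five bytes while count_code_points
   reports four characters, and sample()[3] is the lead byte 0xC3 rather
   than the character itself.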
      
   > and they didn't want the giant ball   
   > of hair that would come with a really distinct type.   
      
   Why is a distinct type of character a hairball?  With a basic type such   
   as   
      
   template <typename C>   
   class code_point   
   {   
   	enum encoding_t { ... } encoding;   
   	C value;   
   };   
      
   a class std::encoded_string could be derived from basic_string with   
   code_point as the first template argument.  encoded_string then has   
   character semantics, and every code_point carries enough information to   
   map it to another encoding.   
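
   A rough sketch of the idea, under stated assumptions. It swaps in
   std::vector for the basic_string derivation described above, because
   basic_string with a user-defined character type would also need a
   custom traits class (std::char_traits is only specialized for the
   built-in character types). The enumerators utf8 and utf32 and the
   helper decode_utf8 are hypothetical, the enum is hoisted out of the
   template for brevity, and malformed input is not validated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical enumerators; the post leaves the list unspecified.
enum class encoding_t { utf8, utf32 };

template <typename C>
struct code_point
{
    encoding_t encoding;  // which encoding `value` is expressed in
    C value;
};

// Hypothetical helper: decodes well-formed UTF-8 into one element per
// character, so that indexing the result yields characters, not bytes.
inline std::vector<code_point<char32_t>> decode_utf8(const std::string& bytes)
{
    std::vector<code_point<char32_t>> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx,
        // 1110xxxx, 11110xxx.
        const std::size_t len =
            lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t value = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k) {
            value = (value << 6)
                  | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back({encoding_t::utf32, value});
        i += len;
    }
    return out;
}
```

   Indexing the decoded sequence then returns a whole character together
   with its encoding tag, which is the character semantics that
   std::string::operator[] lacks.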
      
   We talk about environments and strings having an encoding, but really,   
   by definition, each character is encoded.  ISTM representing that   
   reality in a class is a very basic OO choice, not a hairball at all.   
      
   --jkl   
      
      
   --   
         [ See http://www.gotw.ca/resources/clcm.htm for info about ]   
         [ comp.lang.c++.moderated.    First time posters: Do this! ]   
      


