... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"

comp.lang.c++.moderated

Moderated discussion of C++ superhackery

33,346 messages


Message 31,802 of 33,346

Martin B. to Miles Bader

Re: unicode and string

15 Jan 12 15:14:07

   From: 0xCDCDCDCD@gmx.at   
      
   On 15.01.2012 14:13, Miles Bader wrote:   
   > "Martin B."<0xCDCDCDCD@gmx.at>  writes:   
   >> Personally I feel it was a *very* bad decision not to have a distinct   
   >> UTF-8 character (literal) type. (I mean, we have char16_t and char32_t,   
   >> why the hell not char8_t and be done with it!)   
   >   
   > Presumably the issue is that in sane (non-MS) implementations, utf-8   
   > literals work perfectly well with existing char-based infrastructure,   
   > and an increasingly large number of interfaces simply assume all char*   
   > strings are encoded using utf-8,   
      
   Examples! Numbers! :-)   
      
   >   
   > [Granted, MS-style wide-strings result in a similar giant ball of hair   
   > but if anything that serves as a _warning_...]   
   >   
   > Maybe they could have ... made it all work out, I dunno, but given
   > the widespread use of utf-8 in char* strings
      
   Examples! Numbers! :-)   
      
   > and the potential for snowballing complexity, this may be the most   
   > practical route-of-least-effort.   
   >   
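A minimal sketch of Miles's point that UTF-8 "works" with existing `char`-based infrastructure. The names `kCafeUtf8`, `byte_length` and `code_point_count` are hypothetical, chosen only for illustration. UTF-8 bytes pass through `std::strlen` and `std::string` unchanged, but those facilities count bytes, not characters:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// "café" spelled as explicit UTF-8 bytes: 'c' 'a' 'f', then 0xC3 0xA9 for U+00E9.
// Hex escapes keep the example independent of the source file's encoding.
const char* const kCafeUtf8 = "caf\xC3\xA9";

// Byte count as seen by char-based infrastructure: strlen stops only at a zero
// byte, and UTF-8 never produces a zero byte except for U+0000 itself.
std::size_t byte_length(const char* s) { return std::strlen(s); }

// Code point count: every byte that is not a continuation byte (10xxxxxx)
// starts a new code point. Converting to unsigned char first avoids surprises
// on platforms where plain char is signed.
std::size_t code_point_count(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

So `byte_length(kCafeUtf8)` and `std::string(kCafeUtf8).size()` both report 5, while `code_point_count(kCafeUtf8)` reports 4. Storage and ASCII searches work; anything that needs "characters" has to decode.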
      
   I still fail to see the point.   
      
   Since you seem to imply that char* == utf-8 is very widespread, let me
   throw in two statements of my own:
      
   + libxml2, as far as I can tell a widespread XML library, uses
   *unsigned* char (its `xmlChar` typedef) as its utf-8 datatype, and
   specifically *not* char.
      
   + I strongly believe that *most* Windows C++ applications that use
   narrow char (`char`) do *not* use it as utf-8. Indeed, I strongly
   believe that there are a bazillion Windows C++ apps out there for
   which the assumption `char == utf-8` is completely broken:
      
   ++ Most Windows apps that (still) use narrow char to call the narrow
   ("A") versions of the Windows API would not put utf-8 in those chars,
   since those functions interpret char strings in the active ANSI code
   page.
      
   ++ Last I checked, very many programs on Windows that write text files
   do so in a narrow 8-bit encoding on a Western Windows (some variant of
   the ISO Latin encoding, such as Windows-1252). I think we can assume
   these programs use char for those strings, and it's not utf-8 either.
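Why `char == utf-8` breaks for such files can be shown with a small structural check. This is a minimal sketch under stated assumptions: the function name `looks_like_utf8` is hypothetical, and it only checks lead/continuation byte patterns (no overlong, surrogate or range checks), so it is not a full validator. In ISO 8859-1 / Windows-1252, 'é' is the single byte 0xE9, which in UTF-8 is the lead byte of a three-byte sequence, so such text generally fails UTF-8 decoding. Note the conversion to `unsigned char` before inspecting bytes, the same concern that motivates libxml2's `xmlChar`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Structural UTF-8 check: verifies that each lead byte is followed by the
// right number of continuation bytes (10xxxxxx). Sketch only: it does NOT
// reject overlong forms, surrogates, or code points above U+10FFFF.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                   // continuation bytes expected
        if (b < 0x80) extra = 0;                 // 0xxxxxxx: ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((b & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((b & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                       // stray continuation or invalid lead

        // Not enough bytes left for the sequence (e.g. a lone Latin-1 0xE9 at the end).
        if (s.size() - i - 1 < extra) return false;

        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

For example, `looks_like_utf8("caf\xC3\xA9")` is true, while the Windows-1252 spelling `"caf\xE9"` is rejected.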
      
      
   To sum up, and to phrase it a bit more strongly:   
      
   The fact that C++11 introduces char16_t and char32_t but no char8_t is   
   crappy.   
      
   The fact that u8"" literals map to `char` of all things is rather
   horrible! I, personally, would be better served if they mapped to
   `unsigned char`, but I can imagine that this could create problems elsewhere.
      
      
   cheers,   
   Martin   
      
      
   --   
         [ See http://www.gotw.ca/resources/clcm.htm for info about ]   
         [ comp.lang.c++.moderated.    First time posters: Do this! ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
