|    comp.lang.c++.moderated    |    Moderated discussion of C++ superhackery    |    33,346 messages    |
|    Message 31,802 of 33,346    |
|    Martin B. to Miles Bader    |
|    Re: unicode and string    |
|    15 Jan 12 15:14:07    |
From: 0xCDCDCDCD@gmx.at

On 15.01.2012 14:13, Miles Bader wrote:
> "Martin B."<0xCDCDCDCD@gmx.at> writes:
>> Personally I feel it was a *very* bad decision not to have a distinct
>> UTF-8 character (literal) type. (I mean, we have char16_t and char32_t,
>> why the hell not char8_t and be done with it!)
>
> Presumably the issue is that in sane (non-MS) implementations, utf-8
> literals work perfectly well with existing char-based infrastructure,
> and an increasingly large number of interfaces simply assume all char*
> strings are encoded using utf-8,

Examples! Numbers! :-)

>
> [Granted, MS-style wide-strings result in a similar giant ball of hair
> but if anything that serves as a _warning_...]
>
> Maybe they could have ...
> made it all work
> out, I dunno, but given the widespread use of utf-8 in char* strings

Examples! Numbers! :-)

> and the potential for snowballing complexity, this may be the most
> practical route-of-least-effort.
>

I still fail to see the point.

Since you seem to imply that char* == utf-8 is very widespread, let me
throw in two statements of my own:

+ libxml2, as far as I can tell a widespread XML library, uses
*unsigned* char as its utf-8 datatype, and specifically *not* char.

+ I strongly believe that *most* Windows C++ applications that use
narrow char (`char`) do *not* use it as utf-8. Indeed, I strongly
believe that there are a bazillion Windows C++ apps out there for
which the assumption `char == utf-8` is completely broken:

++ Most Windows apps that (still) use narrow char to interface with the
narrow Windows API versions would not use char with utf-8.

++ Last I checked, very many programs on Windows that write text files
for some purpose do so in a narrow 8-bit encoding on a western Windows.
(Some variation of the ISO Latin encoding.) I think we can assume these
programs use char for those strings, and it's not utf-8 either.


To sum up, and to phrase it a bit more strongly:

The fact that C++11 introduces char16_t and char32_t but no char8_t is
crappy.

The fact that u8"" literals map to `char` of all things is rather
horrible! I, personally, would be better served if it mapped to
`unsigned char`, but I can imagine that this could create problems
elsewhere.


cheers,
Martin


--
 [ See http://www.gotw.ca/resources/clcm.htm for info about ]
 [ comp.lang.c++.moderated. First time posters: Do this! ]
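
To make the `char` vs. `unsigned char` point a bit more concrete, here is a minimal sketch of what handling utf-8 in plain `char` tends to look like. The helper names `code_unit` and `utf8_length` are made up for illustration and come from no library; the input is assumed to be well-formed utf-8, and the bytes are spelled as hex escapes so the example does not depend on the source file encoding or on how a given compiler treats u8"" literals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: read a char as its code unit value in 0..255.
// Where plain char is signed, bytes >= 0x80 are negative, so a direct
// comparison such as (c >= 0x80) is never true without this conversion.
inline unsigned code_unit(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: count code points in a NUL-terminated char string
// that is *assumed* to hold well-formed utf-8. Every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
inline std::size_t utf8_length(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((code_unit(*s) & 0xC0u) != 0x80u) {
            ++n;
        }
    }
    return n;
}

// U+00E4 (a with diaeresis) as its two utf-8 code units, 0xC3 0xA4.
const char a_umlaut_utf8[] = "\xC3\xA4";
```

With these definitions, `utf8_length(a_umlaut_utf8)` counts one code point even though the array holds two bytes plus the terminator. Nothing in the type `const char*` says whether a string is utf-8 or some 8-bit ISO Latin variant, and with a library that uses `unsigned char` for its utf-8 strings the conversion in `code_unit` would not be needed.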
(c) 1994, bbs@darkrealms.ca