
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   

   Message 242,003 of 243,242   
   James Kuyper to Michael Sanders   
   Re: Unicode...   
   19 Nov 25 09:08:10   
   
   From: jameskuyper@alumni.caltech.edu   
      
   On 2025-11-18 15:17, Michael Sanders wrote:   
   > On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:   
   >   
   >> Could you identify which document guarantees that every Unicode locale   
   >> contains "UTF-8"? Do you know what the domain of applicability of that   
   >> document is? It apparently does not cover my Ubuntu Linux system. The   
   >> command "locale -a" provides a list of all supported locales. Here's   
   >> what it says:   
   >>   
   >> [...]   
   >   
   > Hi James, umm 'guarantees'? No no... It does NOT verify:   
   >   
   > - whether the environment actually supports UTF8 fully   
   > - whether multibyte functions are enabled   
   > - whether the terminal supports UTF8   
   > - whether the C library supports UTF8 normalization   
   >   (combining characters, etc. but it seems to work well here)   
   >   
   > To be sure: It's not a UTF-8 capability test. It's only a   
   > locale-string check. So it likely misses many valid UTF8   
   > locale variants...   
      
   If intended for use by anyone other than yourself, you should document
   its limitations in that regard, either with in-code comments or in user
   documentation.
      
   > Here I'm running any mixture of: Windows/BSD/Linux Mint LMDE.
   >   
   > The best I can tell you at this stage is that it works on my end,   
   > not a very satisfying reply I'm sure you'd agree. But till I learn   
   > more about the issue that's the best I can offer.   
   >   
   > If you manage an improvement, please do post it here in the group   
   > so I can learn more too.   
      
   There might be documents specifying locale naming standards, but I'm not   
   aware of any. In the absence of such standards, or on systems not   
   covered by such standards, there's not much you can do about this.   
      
   If your targets include Linux Mint, there's a chance the locale names   
   might be similar to those on my Ubuntu Linux system - but I'm no expert   
   on the differences between Linux distributions. If so, you should make   
   the "UTF" search case-insensitive, and make the '-' optional, which   
   would add considerable complexity to what is currently a very simple   
   routine.   
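A minimal sketch of that suggestion in C, assuming the check is applied to the string returned by setlocale(LC_CTYPE, NULL). The function name locale_name_says_utf8() is hypothetical. It accepts "UTF-8", "utf8", "Utf-8" and so on anywhere in the name, but it is still only a locale-name heuristic with the same limitations listed above.

```c
#include <ctype.h>    /* tolower() */
#include <stdbool.h>
#include <stddef.h>   /* NULL */

/* Hypothetical helper: true if `name` contains "utf8" or "utf-8",
   ignoring case (e.g. "en_US.UTF-8", "en_US.utf8").
   This inspects the NAME only; it does not prove the locale uses UTF-8. */
bool locale_name_says_utf8(const char *name)
{
    if (name == NULL)
        return false;

    for (const char *p = name; *p != '\0'; p++) {
        /* Match "utf" case-insensitively; the && chain never reads
           past the terminating null. */
        if (tolower((unsigned char)p[0]) == 'u'
            && tolower((unsigned char)p[1]) == 't'
            && tolower((unsigned char)p[2]) == 'f') {
            const char *q = p + 3;
            if (*q == '-')        /* the hyphen is optional */
                q++;
            if (*q == '8')
                return true;
        }
    }
    return false;
}
```

Typical use is to call setlocale(LC_ALL, "") first and then pass setlocale(LC_CTYPE, NULL) to the function. The comparison is done by hand because strcasecmp() is POSIX rather than standard C (Windows spells it _stricmp()).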
      
   I'm curious - if you're interested in Unicode, why are you not making   
   any use of the Unicode support available in the current version of C?   
   Does your code need to work under older versions of C?   
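If older compilers do still matter, one way to keep both code paths is to choose between them at compile time from __STDC_VERSION__ (201112L for C2011, 202311L for C2023). This is a sketch; the HAVE_* macros and unicode_support_level() are hypothetical names. The version test is necessary but not sufficient, since some toolchains have historically shipped without <uchar.h> even in C2011 mode.

```c
/* Pre-standard "C2x" modes in some compilers report a smaller interim
   __STDC_VERSION__, so the C2023 test below is conservative. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define HAVE_C23_UNICODE 1   /* u8 character constants, char8_t, mbrtoc8() */
#else
#  define HAVE_C23_UNICODE 0
#endif

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define HAVE_C11_UNICODE 1   /* u8/u/U string literals, <uchar.h> */
#else
#  define HAVE_C11_UNICODE 0   /* fall back to plain char / wchar_t code */
#endif

/* Hypothetical helper: report which path this build selected. */
const char *unicode_support_level(void)
{
    if (HAVE_C23_UNICODE)
        return "C2023";
    if (HAVE_C11_UNICODE)
        return "C2011";
    return "pre-C2011";
}
```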
      
   Since C2023, a conforming implementation of C is required to support
   character constants and string literals that use UTF-8, UTF-16, and
   UTF-32 encodings when prefixed with u8, u or U, respectively. Those use
   the char8_t, char16_t, and char32_t types. Also new in C2023 are
   mbrtoc8() and c8rtomb().
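To make the three encodings concrete, here is a sketch that spells two characters, U+00E9 and U+1F600, with each prefix and counts the resulting code units. It sticks to string literals, which are valid since C2011, so it compiles as either C2011 or C2023; u8 character constants and char8_t are left out because they need a C2023 compiler and library. CODE_UNITS and the other macro names are illustrative, not standard.

```c
#include <stddef.h>   /* size_t */
#include <uchar.h>    /* char16_t, char32_t: element types of u and U literals */

/* Number of code units in a string literal, excluding the terminating
   null. Works for plain, u8, u, U and L literals alike. */
#define CODE_UNITS(lit) (sizeof (lit) / sizeof ((lit)[0]) - 1)

/* Universal character names keep this independent of the source file's
   encoding: U+00E9 is LATIN SMALL LETTER E WITH ACUTE, U+1F600 is
   GRINNING FACE (outside the Basic Multilingual Plane). */
#define E_ACUTE_U8   u8"\u00E9"      /* UTF-8:  C3 A9        (2 units) */
#define E_ACUTE_U16  u"\u00E9"       /* UTF-16: 00E9         (1 unit)  */
#define E_ACUTE_U32  U"\u00E9"       /* UTF-32: 000000E9     (1 unit)  */
#define GRIN_U8      u8"\U0001F600"  /* UTF-8:  F0 9F 98 80  (4 units) */
#define GRIN_U16     u"\U0001F600"   /* UTF-16: D83D DE00    (2 units) */
#define GRIN_U32     U"\U0001F600"   /* UTF-32: 0001F600     (1 unit)  */

/* In C2011, u and U literals are only promised to be UTF-16 and UTF-32
   when these macros are defined; C2023 makes it unconditional. */
#if defined(__STDC_UTF_16__) && defined(__STDC_UTF_32__)
_Static_assert(CODE_UNITS(GRIN_U16) == 2, "u literals are not UTF-16");
_Static_assert(CODE_UNITS(GRIN_U32) == 1, "U literals are not UTF-32");
#endif
```

The counts show why the choice of type matters: U+1F600 takes four UTF-8 code units, a surrogate pair in UTF-16, and a single UTF-32 code unit.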
   The u and U prefixes, the u8 prefix on string literals, and the
   char16_t and char32_t types go back to C2011, where it was optional
   whether u and U used UTF-16 and UTF-32. The predefined macros
   __STDC_UTF_16__ and __STDC_UTF_32__ could be queried to determine
   whether or not they did. The routines for converting between those
   types and multibyte characters, mbrtoc16(), c16rtomb(), mbrtoc32() and
   c32rtomb(), also go back to that time.
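Those conversion routines can also serve as the capability test that a locale-name check is not: ask the current locale to decode a known UTF-8 byte sequence and see whether the expected code point comes back. A minimal sketch using mbrtoc32(); both function names are hypothetical, and it assumes char32_t values are UTF-32 (guaranteed in C2023, and in C2011 when __STDC_UTF_32__ is defined).

```c
#include <locale.h>   /* setlocale(), LC_CTYPE */
#include <stdbool.h>
#include <string.h>   /* memset() */
#include <uchar.h>    /* mbrtoc32(), char32_t, mbstate_t (C2011) */

/* Hypothetical helper: does the CURRENT LC_CTYPE locale decode UTF-8?
   Converts the two-byte UTF-8 encoding of U+00E9 and checks that
   exactly two bytes yield that code point. */
bool current_locale_decodes_utf8(void)
{
    static const char sample[] = "\xC3\xA9";   /* U+00E9 in UTF-8 */
    mbstate_t state;
    char32_t c32 = 0;

    memset(&state, 0, sizeof state);           /* initial conversion state */
    size_t n = mbrtoc32(&c32, sample, sizeof sample - 1, &state);

    /* A UTF-8 locale returns 2 and U+00E9. Other locales reject the
       bytes ((size_t)-1), see an incomplete character ((size_t)-2),
       or decode them as some other character. */
    return n == 2 && c32 == 0xE9;
}

/* Hypothetical wrapper: switch LC_CTYPE to `name` ("" selects the
   user's environment) and test it. Returns false if the implementation
   does not support that locale. */
bool locale_decodes_utf8(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return false;
    return current_locale_decodes_utf8();
}
```

Note that locale_decodes_utf8() changes the program's LC_CTYPE setting as a side effect; calling it with "" tests whatever locale the user's environment selects.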
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


(c) 1994,  bbs@darkrealms.ca