comp.lang.c
Janis Papanagnou to Paul Edwards
Re: C90+ toequiv()
17 Nov 25 10:20:40
   From: janis_papanagnou+ng@hotmail.com   
      
   On 11/15/25 10:33, Paul Edwards wrote:   
   > I am not 100% sure, but I believe some people (Greeks?)   
   > have keyboards such that their native character set can be   
   > freely entered, when they're working in their native language.   
      
   In Greece you will typically get keyboards with the Greek letters.   
      
   > And if they are required to work in English, or rather, 7-bit   
   > ASCII, they will "switch keyboards", ie using the mouse or   
   > whatever to select a different keyboard, and type the English,   
   > and then return to the Greek etc keyboard.   
      
   I once configured a system to use a control-key combination
   (like Ctrl-Alt-Shift) to switch between three different languages
   (EN, GR, DE).
      
   > I'm interested in a slight change to C90. I'm not interested in   
   > UTF-8 either.   
      
   You have to map the keys to characters of some specific "codepage".   
      
   It sounds to me as if you want an interactive keyboard-layout
   change to also switch the underlying character encoding. To me
   that just sounds wrong! How would a string like "Pµä" then be
   encoded?
      
   The environment that I set up just used UTF-8, a single encoding   
   for all (in that case just three) languages. That way you could   
   type (Greek) 'µ' or (German) 'ä' or any other character (as far   
   as it's supported by the system with fonts, etc.).   
      
   > I'd like to write a program using pure ASCII, and indeed, pure   
   > English prompts, but not force a Greek user to switch keyboards.   
      
   As I understand it, the C code itself is ASCII as usual, but
   embedded strings may contain any other characters.
      
   Again: How would a string like "Pµä" then be encoded?

   The 'µ' (like an 'ä') could stem from ISO 8859-15 (but then it would
   be a special case), or from ISO 8859-7 (the Latin/Greek part of
   ISO 8859), or from UTF-8. You cannot represent these characters by a
   single ASCII character.
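
   To make the encoding question concrete, here is a minimal sketch in
   C90-style C of how code points become UTF-8 octets. The function
   names utf8_encode and utf8_encode_all are made up for this example,
   and the sketch only handles code points up to U+FFFF, which is
   enough for Latin and Greek.

```c
#include <stddef.h>

/*
 * Encode one Unicode code point as UTF-8 (sketch: only up to U+FFFF).
 * Writes 1 to 3 octets to out and returns how many were written,
 * or 0 if cp is a surrogate or outside the range handled here.
 */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {                       /* ASCII: one octet */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)    /* surrogates are not characters */
        return 0;
    if (cp < 0x10000UL) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}

/*
 * Encode n code points in sequence. The caller must provide room for
 * at least 3 * n octets in out. Returns the total number of octets.
 */
size_t utf8_encode_all(const unsigned long *cps, size_t n, unsigned char *out)
{
    size_t i, used = 0;

    for (i = 0; i < n; i++)
        used += utf8_encode(cps[i], out + used);
    return used;
}
```

   For the code points U+0050, U+00B5 and U+00E4 ("Pµä") this yields
   the five octets 50 C2 B5 C3 A4. In ISO 8859-15 the same string
   would be the three octets 50 B5 E4, while ISO 8859-7 has the Greek
   letter mu at EC and no 'ä' at all.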
      
   > I'm not interested in a complicated translation layer either.   
      
   What comes below sounds very fuzzy; I certainly don't understand what   
   you have in mind there, so I cannot really comment on that.   
      
   For me, the solution for a multi-language programming environment
   would not switch character encodings but use a single standard
   (UTF-8) for that.
      
   >   
   > Originally I was thinking I just need to modify my programs and   
   > the Greek locale so that I could do:   
   >   
   > if (toupper(c) == 'X') printf("whatever\n");   
   >   
   > And make some random Greek character the equivalent of 'X', ie   
   > the Greek user knows that when prompted to type 'x' (or 'X'), he   
   > just needs to press (lambda or whatever Greeks use). The Greek   
   > locale will convert lambda into X when passed to toupper.   
      
   Are you looking for an ASCII representation of that (template?) 'X'?
   Something like "&mu;" (like "&micro;" for 'µ' in HTML)?
      
   >   
   > However, it was pointed out to me that this would interfere with   
   > storing filenames on traditional FAT, for example. Not everything   
   > should be subject to uppercasing. The Greek, or Katakana, should   
   > be preserved, not converted into ASCII gibberish.   
      
   You should be aware that at the filename level you typically have
   (on Unixes) just anonymous octets that need an interpretation to be
   displayed. (It may be UCS-2 with Windows filesystems; I don't know.)
      
   In my Linux/UTF-8 environment my filenames may contain umlauts or   
   Greek letters   
      
   $ touch "Pµä"   
   $ ls  "Pµä"   
   Pµä   
      
   The filename will be stored in octets (values 0..255), where each   
   non-ASCII character will occupy more than one octet.   
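
   The difference between octets and characters can be sketched in C
   as follows. The helper name utf8_char_count is made up for this
   example, and the code assumes its input is valid UTF-8; it simply
   skips continuation octets (bit pattern 10xxxxxx).

```c
#include <stddef.h>

/*
 * Count the characters in a NUL-terminated UTF-8 string.
 * Every octet that is not a continuation octet (10xxxxxx) starts a
 * new character. Assumes the input is valid UTF-8.
 */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
        s++;
    }
    return n;
}
```

   For the filename above, written with explicit octets as
   "P\xC2\xB5\xC3\xA4", strlen() reports 5 octets while this helper
   reports 3 characters.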
      
   Such filenames will only be displayed as "ASCII gibberish" if you   
   somehow "force" it to be interpreted as pure ASCII.   
      
   >   
   > So I was thinking I need some halfway point of equivalency.   
   >   
   > I'm happy to change all my programs so that they don't rely on   
   > the user typing in an exact character. ie I am happy to drop case   
   > sensitivity from everything, "now that I know there's an issue".   
   > Actually there are other environments where case sensitivity is   
   > difficult. e.g. some CMS (mainframe) environments.   
   >   
   > And making sure I do toupper() is a way to solve the issue for   
   > the environments where case-sensitivity is difficult/impossible.   
   > (assuming they exist).   
      
   Usually this should be handled by the locale setting.   
      
      $ awk 'BEGIN {print tolower("Γ")}'   
      γ   
      $ awk 'BEGIN {print toupper("γ")}'   
      Γ   
      
   The test above is from my environment (using UTF-8); it works even   
   *without* setting any Greek locale (I'm using "en_US.UTF-8".)   
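
   A rough C counterpart to the awk test would go through the wide
   character functions. This is a sketch only: towupper() comes from
   Amendment 1 (C95), not plain C90, and the helper name wcs_toupper
   is made up for this example. Whether a Greek letter gets mapped
   depends on the locale selected with setlocale() and on the
   platform's C library.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Uppercase a wide string in place using the current locale's
 * LC_CTYPE rules. Returns the number of characters that changed.
 * Call setlocale(LC_ALL, "") beforehand to pick up the user's locale.
 */
size_t wcs_toupper(wchar_t *s)
{
    size_t changed = 0;

    for (; *s != L'\0'; s++) {
        wint_t u = towupper((wint_t)*s);

        if (u != (wint_t)*s) {
            *s = (wchar_t)u;
            changed++;
        }
    }
    return changed;
}
```

   The mapping of plain ASCII letters works in any locale. For other
   letters the program has to convert its input to wide characters
   first, for example with mbstowcs(), after the locale has been set.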
      
   >   
   > But I'd like to go one step further and cater for Greeks etc.   
   >   
   > And it seems to me that I want to not change toupper() - which   
   > would be expected to uppercase Greek characters (or some   
   > other language), independent of the uppercasing of any English   
   > characters they happened to enter (potentially at "great effort"   
   > of changing keyboards).   
   >   
   > And what I'm really after is being able to designate some Greek   
   > characters as the equivalent of English counterparts in circumstances   
   > where that is appropriate,   
      
   To me this sounds as fuzzy as it sounds wrong.
      
   > and there is a desire to avoid a keyboard   
   > change. So a new isequiv() function as an extension to C90. (I'm   
   > basically forking C90 to create a C90+ or C90.0.1 - same as we   
   > do with software - bells and whistles go into C99 etc, not C90.0.1)   
   >   
   > Any thoughts?   
      
   I cannot comment on your reluctance to use (or requirement not to
   use) an underlying UTF-8 encoding. The rest should be handled by the
   locale, provided UTF-8 characters are supported and allowed in the
   string and character literals of the respective programming
   language. I haven't tried it for C (though I seem to have no
   problems using UTF-8 characters in string literals).
      
   I hope you found some useful information and got some more insights.   
   If, based on that, you can clarify your intentions and thoughts I'd   
   be interested to hear what you're actually trying to achieve.   
      
   Janis   
      
   >   
   > Thanks. Paul..   
   > a   
   >   
   >   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)