From: mutazilah@gmail.com   
      
   "Janis Papanagnou" wrote in message   
   news:10fepd8$3p5tk$4@dont-email.me...   
   > On 11/15/25 10:33, Paul Edwards wrote:   
   > > I am not 100% sure, but I believe some people (Greeks?)   
   > > have keyboards such that their native character set can be   
   > > freely entered, when they're working in their native language.   
   >   
   > In Greece you will typically get keyboards with the Greek letters.   
      
   Exactly what I expect. And I'm not trying to change that.   
      
   > > And if they are required to work in English, or rather, 7-bit   
   > > ASCII, they will "switch keyboards", ie using the mouse or   
   > > whatever to select a different keyboard, and type the English,   
   > > and then return to the Greek etc keyboard.   
   >   
   > I had once configured a system to use some control-key combination   
   > (like Ctrl-Alt-Shift) to switch between three different languages   
   > (EN, GR, DE).   
      
   And assuming someone only knows Greek, I don't want them   
   to ever switch keyboards.   
      
   (I wouldn't want it if it were the other way around - ie if
   ASCII were in fact all Greek.)
      
   > > I'm interested in a slight change to C90. I'm not interested in   
   > > UTF-8 either.   
   >   
   > You have to map the keys to characters of some specific "codepage".   
      
   Yes, any traditional Greek codepage is fine. Or even a   
   non-traditional Greek codepage. I expect all the Greek   
   characters to exist between x'80' and x'FF'.   
      
   > It sounds to me that you want with an interactive keyboard-layout   
   > change also to switch the underlying character encoding.   
      
   No - not correct. There won't even BE a keyboard change -
   unless you have someone who speaks more than just Greek.
      
   > - To me   
   > that just sounds wrong! - How would a string like "Pµä" be then   
   > encoded?   
      
   However you did it in 1990 when ISO/IEC 9899:1990 was   
   published.   
      
   > The environment that I set up just used UTF-8, a single encoding   
   > for all (in that case just three) languages. That way you could   
   > type (Greek) 'µ' or (German) 'ä' or any other character (as far   
   > as it's supported by the system with fonts, etc.).   
      
   Yes - but I'm not interested in the computing power needed to
   do that, nor in the burden placed on the display to render Kanji.
   I'm only interested in (half-width I think) Katakana. ie the   
   first Japanese displays for the PC from the 1980s.   
      
   > > I'd like to write a program using pure ASCII, and indeed, pure   
   > > English prompts, but not force a Greek user to switch keyboards.   
   >   
   > I understand it that the "C"-code is as usual ASCII but embedded   
   > strings may be any other character.   
      
   As an "English" C programmer, I do not wish to put in embedded   
   Greek strings. Nor provide a translation layer for Greek. Nor   
   have the speed of my program impacted to support Greek   
   characters.   
      
   I don't expect the Greeks to learn English, but I do expect them   
   to get used to using the software such that they recognize that   
   when a particular bit of English gibberish like "Enter your name"   
   appears on the screen, that they know it is time to enter their   
   (Greek) name. I knew some Chinese people who operated the   
   ATM in Australia like that - they didn't bother to learn what   
   the prompts were - they just memorized the sequence. The   
   user interface changed one day and they couldn't withdraw   
   money anymore. The executable I provide won't change   
   unless you change it, and then you'll need to memorize that   
   new sequence as part of the upgrade - if you have zero   
   English.   
      
   > Again: How would a string like "Pµä" be then encoded?   
      
   As per the 1980s. I'm not trying to change what you did in the
   1980s, and I'm not trying to change the existing "accents" - ie
   dead keys, where you type ^ and then a "u" and you get a u with
   a circumflex (in some countries).
      
   > The 'µ' (like an 'ä') could stem from ISO 8859-15 (but then it would   
   > be a special case), or from ISO 8859-7 (the native Greek variant of   
   > Latin), or from UTF-8. - You cannot represent these characters by a   
   > single ASCII-character.   
      
   I don't want to use ASCII for that. Nothing a Greek-only person
   will ever type will be in ASCII; every character will be a SINGLE
   octet between x'80' and x'FF'.
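   To make that concrete, here is a minimal C sketch of the assumed
   convention - the range test is the whole idea, and the 0xA2 in the
   example is an illustrative value only, not taken from any
   particular Greek codepage table:

   ```c
   #include <stdio.h>

   /* Assumed convention: anything a Greek-only user types that is not
      plain ASCII arrives as exactly one octet in the range 0x80-0xFF. */
   static int is_national_char(int c)
   {
       return c >= 0x80 && c <= 0xFF;
   }

   int main(void)
   {
       printf("%d\n", is_national_char(0xA2)); /* some Greek letter: prints 1 */
       printf("%d\n", is_national_char('A'));  /* plain ASCII: prints 0 */
       return 0;
   }
   ```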
      
   > > I'm not interested in a complicated translation layer either.   
   >   
   > What comes below sounds very fuzzy; I certainly don't understand what   
   > you have in mind there, so I cannot really comment on that.   
      
   I'm happy to explain.   
      
   > For me, the solution for multi-language programming environment would   
   > not switch character encodings but use a single standard (UTF-8) for   
   > that.   
      
   It's not multi-language - well - not by preference. The end   
   user would much rather have the prompts for "what is your   
   name?" in Greek, but that's not on the table.   
      
   > > Originally I was thinking I just need to modify my programs and   
   > > the Greek locale so that I could do:   
   > >   
   > > if (toupper(c) == 'X') printf("whatever\n");   
   > >   
   > > And make some random Greek character the equivalent of 'X', ie   
   > > the Greek user knows that when prompted to type 'x' (or 'X'), he   
   > > just needs to press (lambda or whatever Greeks use). The Greek   
   > > locale will convert lambda into X when passed to toupper.   
   >   
   > Are you looking for an ASCII representation of that (template?) 'X'?   
   > Something like "μ" (Like "µ" for 'µ' in HTML)?   
      
   Yes - an ASCII character that stands in for the uppercase of
   "micro".
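   The toupper() idea quoted above could be sketched like this. This
   is a hypothetical stand-in for a locale-supplied mapping - the two
   lambda values are illustrative single-byte code points, not taken
   from any particular Greek codepage table:

   ```c
   #include <stdio.h>

   /* Hypothetical single-byte values for Greek lambda in some
      1980s-style Greek codepage - illustrative only. */
   #define GREEK_CAPITAL_LAMBDA 0x8A
   #define GREEK_SMALL_LAMBDA   0xA2

   /* A locale-like uppercasing wrapper: a Greek user pressing
      lambda is treated as if they had pressed 'X'. */
   static int my_toupper(int c)
   {
       if (c >= 'a' && c <= 'z')
           return c - 'a' + 'A';
       if (c == GREEK_SMALL_LAMBDA || c == GREEK_CAPITAL_LAMBDA)
           return 'X';
       return c;
   }

   int main(void)
   {
       int c = GREEK_SMALL_LAMBDA; /* pretend the user typed lambda */
       if (my_toupper(c) == 'X')
           printf("whatever\n");   /* prints "whatever" */
       return 0;
   }
   ```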
      
   > > However, it was pointed out to me that this would interfere with   
   > > storing filenames on traditional FAT, for example. Not everything   
   > > should be subject to uppercasing. The Greek, or Katakana, should   
   > > be preserved, not converted into ASCII gibberish.   
   >   
   > You should be aware that on filename level you typically have (on   
   > Unixes) just anonymous octets that need an interpretation to be   
   > displayed. (It may be UCS2 with Windows filesystems; don't know.)   
   >   
   > In my Linux/UTF-8 environment my filenames may contain umlauts or   
   > Greek letters   
   >   
   > $ touch "Pµä"   
   > $ ls "Pµä"   
   > Pµä   
   >   
   > The filename will be stored in octets (values 0..255), where each   
   > non-ASCII character will occupy more than one octet.   
   >   
   > Such filenames will only be displayed as "ASCII gibberish" if you   
   > somehow "force" it to be interpreted as pure ASCII.   
      
   It won't be ASCII or UTF-8. It will be what MSDOS in   
   Greece did in the 1980s. I won't say "when men were men",   
   but it's a variation of that.   
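   The storage difference can be shown in a few lines of C: the micro
   sign is the single octet 0xB5 in ISO 8859-1/-15, while its UTF-8
   form is the two octets 0xC2 0xB5:

   ```c
   #include <stdio.h>
   #include <string.h>

   int main(void)
   {
       /* Micro sign as one octet (single-byte codepage) versus
          two octets (UTF-8 encoding of U+00B5). */
       char single_byte[] = { (char)0xB5, 0 };
       char utf8[]        = { (char)0xC2, (char)0xB5, 0 };

       printf("single-byte length: %u\n", (unsigned)strlen(single_byte)); /* 1 */
       printf("UTF-8 length:       %u\n", (unsigned)strlen(utf8));        /* 2 */
       return 0;
   }
   ```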
      
   > > So I was thinking I need some halfway point of equivalency.   
   > >   
   > > I'm happy to change all my programs so that they don't rely on   
   > > the user typing in an exact character. ie I am happy to drop case   
   > > sensitivity from everything, "now that I know there's an issue".   
   > > Actually there are other environments where case sensitivity is   
   > > difficult. e.g. some CMS (mainframe) environments.   
   > >   
   > > And making sure I do toupper() is a way to solve the issue for   
   > > the environments where case-sensitivity is difficult/impossible.   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   