

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 241,411 of 243,242   
   BGB to Thiago Adams   
   Re: _BitInt(N)   
   22 Oct 25 14:03:23   
   
   From: cr88192@gmail.com   
      
   On 10/22/2025 12:25 PM, Thiago Adams wrote:   
   > On 10/22/2025 2:23 PM, Thiago Adams wrote:   
   >> On 10/22/2025 1:42 PM, BGB wrote:   
   >>> On 10/22/2025 7:45 AM, Thiago Adams wrote:   
   >>>>   
   >>>>   
   >>>> Is anyone using or planning to use this new C23 feature?   
   >>>> What could be the motivation?   
   >>>>   
   >>>   
   >>> In my project, with my own compiler, I have made some use of it...   
   >>>   
   >   
   > The use case I have for _BitInt(N) N  is dynamic, so I am not planning   
   > to use it.   
   >   
      
   In my compiler, only constant N is allowed.   
      
      
      
   N is allowed over a range of 1 to 16383, though anything large is   
   generally implemented with runtime calls:   
      1..64: Mapped to integer operations.   
      65..128: Mapped to 128-bit integer operations.   
        Optional partial support in my ISA.   
        Rest is runtime calls.   
      129..256: Runtime calls for 256-bit integer ops.   
      257+: Runtime calls for generic large integers.   
        Storage is padded to a multiple of 128 bits, with 16-byte alignment.   
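   The size classes above can be sketched as a small classifier. This is
   only an illustration: the enum and function names are hypothetical and
   not taken from the compiler, and the storage helper covers only the
   257+ class, since that is the only class whose padding rule is stated.

```c
#include <stddef.h>

/* Hypothetical names; the ranges follow the size classes listed above. */
enum bitint_class {
    BITINT_INVALID,   /* N outside 1..16383 */
    BITINT_NATIVE,    /* 1..64: plain integer operations */
    BITINT_I128,      /* 65..128: 128-bit integer operations */
    BITINT_I256,      /* 129..256: runtime calls for 256-bit ops */
    BITINT_GENERIC    /* 257+: runtime calls for generic large integers */
};

/* Pick the lowering class for a constant width N. */
enum bitint_class bitint_classify(unsigned n)
{
    if (n < 1 || n > 16383) return BITINT_INVALID;
    if (n <= 64)  return BITINT_NATIVE;
    if (n <= 128) return BITINT_I128;
    if (n <= 256) return BITINT_I256;
    return BITINT_GENERIC;
}

/* Storage in bytes for the generic (257+) class:
 * the bit count is rounded up to a multiple of 128 bits. */
size_t bitint_generic_storage_bytes(unsigned n)
{
    size_t bits = ((size_t)n + 127u) / 128u * 128u;
    return bits / 8u;
}
```

   Under this reading, a _BitInt(257) would occupy 384 bits (48 bytes) of
   storage.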
      
   In my compiler:   
      Largest fully-supported integer type is 128 bits.   
      __int128, __uint128, unsigned __int128   
      
   Partial handling exists for 256-bit values, but they are not exposed as   
   their own types. Stuff for very large integers is mostly untested.   
      
   Ironically, while the compiler does support very large integer
   constants, it generally represents them internally as string literals
   (Base85 encoded).   
      
   IIRC, there is a limit of 128 bits for decimal literals though (so going   
   larger is only really possible with hexadecimal).   
      
      
   By contrast, say:   
      GCC: Refuses to support integer types over 64 bits on most targets
        tested;   
      Clang: Sorta works, but has a lot of limitations, like the inability
        to have 128-bit integer literals.   
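   Where a compiler lacks 128-bit integer literals, a common workaround
   is to assemble the value from two 64-bit halves. A minimal sketch,
   assuming a compiler and target that provide unsigned __int128 (the
   helper name make_u128 is made up for illustration):

```c
#include <stdint.h>

/* Build a 128-bit value from two 64-bit halves.
 * Assumes the compiler/target provides unsigned __int128. */
unsigned __int128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}
```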
      
      
      
   Also maybe fun is the wonk that UTF-8 string literals in BGBCC are   
   effectively double-encoded. Though, actual scheme is a little more   
   complicated:   
      00: Escaped as 2-byte (C0-80).   
      01..7F: As-is   
      0080..00FF: Encodes Bytes 0x80..0xFF;   
      0100..06FF: Pass Through   
      0700..077F: Encodes 00..7F byte followed by 00.   
      0780..07FF: Encodes 0080..00FF.   
      0800..7FFF: Pass Through   
      8000..FFFF: Interpreted as a 2-byte pair (80..FF followed by 00..FF).   
   Some of this is an attempt to reduce the relative inefficiency of the   
   double-encoding scheme (the naive approach would effectively double the   
   encoded size of each codepoint, whereas this scheme has a worst case of   
   1.5x but is on average closer to 1x).   
      
   The above scheme might also slightly compact data expressed in string   
   literals if it happens to resemble these patterns (happens to match   
   UTF-8 byte sequences).   
      
   As noted, the ASCII byte followed by 00 is to try to avoid bloat for   
   string literals like "S\0o\0m\0e\0 \0S\0t\0r\0i\0n\0g\0\0" (sometimes   
   seen, most often in old code originally written for the Win32 API, from   
   the era when MS thought it was a good idea to move parts of the Win32   
   API over to UCS-2 / UTF-16 but had not yet bothered to add UCS-2 string   
   literals to MSVC...).   
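   One way to read the codepoint table above is as a decoder from an
   internal codepoint to the bytes it stands for. The sketch below is an
   interpretation, not BGBCC code: the function names are hypothetical,
   "Pass Through" is assumed to mean "emit the codepoint as ordinary
   UTF-8", and 0780..07FF is assumed to stand for the real codepoints
   U+0080..U+00FF.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit cp (below 0x10000) as ordinary UTF-8; returns the byte count. */
static size_t put_utf8(uint32_t cp, uint8_t *out)
{
    if (cp < 0x80) { out[0] = (uint8_t)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
}

/* Map one internal codepoint (0..0xFFFF) to the bytes it represents.
 * out must have room for 3 bytes. Assumed reading of the table above. */
size_t decode_internal_cp(uint32_t cp, uint8_t *out)
{
    if (cp <= 0x00FF) {              /* 00, 01..7F, 0080..00FF: one raw byte */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x06FF)                /* pass through */
        return put_utf8(cp, out);
    if (cp <= 0x077F) {              /* byte 00..7F followed by 00 */
        out[0] = (uint8_t)(cp - 0x0700);
        out[1] = 0x00;
        return 2;
    }
    if (cp <= 0x07FF)                /* stands for U+0080..U+00FF (assumed) */
        return put_utf8(cp - 0x0700, out);
    if (cp <= 0x7FFF)                /* pass through */
        return put_utf8(cp, out);
    out[0] = (uint8_t)(cp >> 8);     /* 8000..FFFF: byte pair, */
    out[1] = (uint8_t)(cp & 0xFF);   /* 80..FF then 00..FF */
    return 2;
}
```

   Under this reading, each "S\0"-style pair in the Win32 example maps to
   a single internal codepoint (0x0753 for 'S' followed by 00).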
      
      
   For UTF-16 literals, it is basically M-UTF-8 (Modified UTF-8).   
      
   Note that non-BMP codepoints are:   
      Double encoded, for UTF-8 literals;   
      Encoded as surrogate pairs for UTF-16 (or UTF-32) literals.   
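   For reference, the standard UTF-16 surrogate-pair split for a non-BMP
   codepoint looks like this. The function name is made up for
   illustration; how BGBCC does this internally is not shown here.

```c
#include <stdint.h>

/* Split a non-BMP codepoint (0x10000..0x10FFFF) into a UTF-16
 * surrogate pair. The caller must ensure cp is in that range. */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000;               /* 20-bit offset */
    *hi = (uint16_t)(0xD800 | (v >> 10));    /* high (lead) surrogate */
    *lo = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low (trail) surrogate */
}
```

   For example, U+1F600 splits into D83D followed by DE00.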
      
   Where, for the base-level encoding, values above 010000 may instead   
   potentially encode intra-string LZ matches (as a way to compactify large   
   string literals and text blobs). Though, this is optional and not   
   enabled ATM IIRC (not always 100% stable; and edge cases here may turn   
   large strings into confetti).   
      
      
   Though, for large numbers or similar encoded via strings, generally the   
   most space-efficient way ATM is Base85 or similar.   
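   The space saving comes from packing 4 bytes into 5 printable
   characters (1.25x, versus 2x for hexadecimal). A minimal sketch of one
   Base85 group, assuming the Ascii85 alphabet ('!' through 'u'); the
   alphabet actually used by BGBCC is not stated here, and the function
   name is hypothetical:

```c
#include <stdint.h>

/* Encode one 32-bit group as 5 Base85 digits, most significant first.
 * Assumes the Ascii85 alphabet: digit d maps to the character '!' + d. */
void base85_encode_group(uint32_t v, char out[6])
{
    for (int i = 4; i >= 0; i--) {
        out[i] = (char)('!' + v % 85);
        v /= 85;
    }
    out[5] = '\0';
}
```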
      
   ...   
      



(c) 1994,  bbs@darkrealms.ca