comp.lang.c
BGB to bart
Re: Nice way of allocating flexible stru
15 Oct 25 13:00:19
   From: cr88192@gmail.com   
      
   On 10/15/2025 5:26 AM, bart wrote:   
   > On 15/10/2025 02:13, BGB wrote:   
   >   
   >> Apparently the languages people are trying to push as C replacements   
   >> are mostly Rust, Zig, and Go.   
   >>   
   >> None of these particularly compel me though.   
   >>    They seem more like needless deviations from C than a true successor.   
   >   
   > So what would a true successor look like?   
   >   
      
   Probably sorta like C with a few vaguely C++ like features, but with a   
   cleaner and simpler design.   
      
   Should ideally be usable for similar stuff to C.   
      Not drastically or needlessly different.   
      
   Looking around, the CMU C0 and C1 teaching languages also seem to be in
   this general area design-wise, though they exist more as limited C-like
   subset languages intended for introductory programming in CS courses.
      
   Could make sense to have some C++ style functionality, but with an aim   
   of not going down the rabbit hole of adding excessive implementation   
   complexity.   
      
      
   >>   
   >>   
   >> I guess the older generations mostly had Pascal and Ada.   
   >>   
   >> There was ALGOL, but both C and Pascal descended from ALGOL.   
   >   
   > I've heard that before that C was somehow derived from Algol and even   
   > Algol 68.   
   >   
   > But it is so utterly unlike either of those, that if it's from the same   
   > family, then it must have been adopted.   
   >   
      
   Idea is that it went ALGOL -> CPL -> BCPL -> B -> C.
      Going the other way, ALGOL was in turn influenced by FORTRAN.
      
   ALGOL was also the ancestor of Pascal and Ada, so there was a bit of
   mutation there.
      
      
   >   
   >> As noted elsewhere, my thinking is partly that pipeline looks like:   
   >>    Preprocessor (basic or optional, C like)   
   >>    Parser (Context-independent, generates ASTs)   
   >>    Front end compiler: Compiles ASTs to a stack IL.   
   >   
   >> Backend:   
   >>    IL -> 3AC/SSA;   
   >   
   > That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?   
   >   
   > Why not go straight to 3AC?   
   >   
   > (I've tried both stack and 3AC ILs, but not both in the same compiler! I   
   > finally decided to stay with stack; 3AC code *always* got too fiddly to   
   > deal with.   
   >   
      
   Well, the downside of 3AC (as an IL) is that it tends to be fiddly and   
   often is much more specific to the design choices of the frontend and   
   backend that produced it.   
      
   Also, going from a Stack IL to 3AC is fairly easy, and generally less of   
   a mess than dealing with a 3AC IL here. Also with 3AC one has to decide   
   on things like whether or not it is in SSA form, as SSA vs non-SSA   
   follow different rules.   
      
      
   Downside is that a stack IL is often further from the code you "actually   
   want to generate" than a 3AC IL would have been (and to generate more   
   efficient 3AC you may need to generate less-concise stack code, such as   
   my having the frontend manually use temporary variables, partly negating   
   some of the conceptual benefits of a stack IR, but alas).   
      
   But, on the positive side, the stack manipulations/etc map readily to   
   SSA form.   
      
      
   A stack IL that makes sense for a compiler might look like:   
      Stack ops for each major operator;   
      No explicit types in most instructions.   
        Type can be carried along the stack.   
        The .NET IL also did this.   
      Control flow is via labels and conditional branches.   
      Typically no items on the stack during a branch.   
      May make sense to combine common stack-ops with storing to a variable.   
        Say: "ADD; STORE n" => "ADD_ST n"   
        Rationale being that this is less work for the backend.   
      Types can be identified by signature strings.   
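
A minimal sketch of the stack-IL-to-3AC step described above, to show why most of the extra stack operations disappear in conversion. The opcode names, the `StackOp` struct, and the `vN`/`tN` naming are all hypothetical (they are not BGBCC's actual IL); types, labels, and branches are left out. The converter keeps a symbolic stack of operand names rather than values, so a `LOAD` emits nothing and only operators and stores produce 3AC lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mini stack IL; opcode names are illustrative only. */
enum { OP_LOAD, OP_ADD, OP_MUL, OP_STORE, OP_ADD_ST };

typedef struct { int op; int arg; } StackOp;  /* arg: variable index */

#define MAX_STACK 16
#define NAME_LEN  16

/* Convert stack ops to 3AC text. Variables print as vN, temporaries as tN.
   Returns the number of 3AC lines written, or -1 on malformed input. */
static int stack_to_3ac(const StackOp *code, int n, char *out, size_t outsz)
{
    char stk[MAX_STACK][NAME_LEN];  /* symbolic stack: names, not values */
    int sp = 0, ntemp = 0, lines = 0;
    size_t used = 0;

    if (outsz == 0) return -1;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        char line[64];
        int op = code[i].op, arg = code[i].arg;
        line[0] = '\0';
        switch (op) {
        case OP_LOAD:               /* pushes a name, emits no 3AC */
            if (sp >= MAX_STACK) return -1;
            snprintf(stk[sp++], NAME_LEN, "v%d", arg);
            break;
        case OP_ADD:
        case OP_MUL:
        case OP_ADD_ST: {
            char dst[NAME_LEN];
            if (sp < 2) return -1;
            if (op == OP_ADD_ST)    /* fused form: result goes to a variable */
                snprintf(dst, NAME_LEN, "v%d", arg);
            else                    /* plain form: result goes to a new temp */
                snprintf(dst, NAME_LEN, "t%d", ntemp++);
            snprintf(line, sizeof line, "%s = %s %c %s\n", dst,
                     stk[sp - 2], op == OP_MUL ? '*' : '+', stk[sp - 1]);
            sp -= 2;
            if (op != OP_ADD_ST) strcpy(stk[sp++], dst);
            break;
        }
        case OP_STORE:              /* becomes a plain copy */
            if (sp < 1) return -1;
            snprintf(line, sizeof line, "v%d = %s\n", arg, stk[--sp]);
            break;
        default:
            return -1;
        }
        if (line[0]) {
            size_t len = strlen(line);
            if (used + len + 1 > outsz) return -1;
            memcpy(out + used, line, len + 1);
            used += len;
            lines++;
        }
    }
    return lines;
}
```

With this sketch, `LOAD 0; LOAD 1; MUL; LOAD 2; ADD; STORE 3` (six stack ops) yields three 3AC lines, the last being a copy from a temporary. Replacing `ADD; STORE 3` with `ADD_ST 3` yields two lines with no copy, which is the "less work for the backend" point: the backend does not need a later pass to fold the temporary away.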
      
      
   Granted, one can note that a stack IL typically needs around 70% more   
   operations than you would need for a 3AC, but most of these operations   
   will disappear in the conversion process.   
      
   One semi-unresolved design issue is whether it is better to have a
   single unified numbering space for local variables, like in the JVM and   
   similar, or several different numbering spaces (arguments, locals, and   
   temporary variables). In my ILs, I have often ended up going for the latter.   
      
   Say, for example, you can encode the "name"/"symbol" for Load/Store/Etc   
   as a VLN, say:   
      0xxxxxxx:          0..127
      10xxxxxx xxxxxxxx: 128..16383
      110xxxxx ...:      16384..2M
      ...   
   And then use a tagging scheme to encode variable IDs, say:   
      ...xxxx00  Local   
      ...xxxx10  Temporary   
      ...xxx001  Argument   
      ...xxx101  Int32 Literal   
      ...xx0011  Global Variable   
      ...xx1011  String Literal   
      
   Where Locals and Temporaries are given the shortest code as these are   
   more common and preferably have shorter (single byte) encodings when   
   possible (so, for example, the first 32 local variables can be single   
   byte, etc).   
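
A rough sketch of how the VLN and the tag bits could fit together, covering only the three VLN lengths and six tags listed above. The function and type names are hypothetical, and two details are assumptions the post does not state: the bytes after the prefix are taken most-significant first, and the tag occupies the low bits of the number with the index shifted above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; tag values follow the table in the post. */
enum { REF_LOCAL, REF_TEMP, REF_ARG, REF_INT32, REF_GLOBAL, REF_STRING,
       REF_KINDS };

static const struct { uint32_t tag; unsigned bits; } ref_tags[REF_KINDS] = {
    [REF_LOCAL]  = {0x0, 2},  /* ...xxxx00 */
    [REF_TEMP]   = {0x2, 2},  /* ...xxxx10 */
    [REF_ARG]    = {0x1, 3},  /* ...xxx001 */
    [REF_INT32]  = {0x5, 3},  /* ...xxx101 */
    [REF_GLOBAL] = {0x3, 4},  /* ...xx0011 */
    [REF_STRING] = {0xB, 4},  /* ...xx1011 */
};

/* Assumption: tag in the low bits, index in the bits above it. */
static uint32_t ref_pack(int kind, uint32_t index)
{
    return (index << ref_tags[kind].bits) | ref_tags[kind].tag;
}

/* Returns the kind, or -1 for tag patterns the table leaves unassigned. */
static int ref_unpack(uint32_t v, uint32_t *index)
{
    for (int k = 0; k < REF_KINDS; k++) {
        uint32_t mask = (1u << ref_tags[k].bits) - 1u;
        if ((v & mask) == ref_tags[k].tag) {
            *index = v >> ref_tags[k].bits;
            return k;
        }
    }
    return -1;
}

/* Assumption: big-endian payload after the prefix bits.
   Returns bytes written, or 0 for values needing the elided longer forms. */
static size_t vln_encode(uint32_t v, uint8_t *out)
{
    if (v < 0x80u) {
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v < 0x4000u) {
        out[0] = (uint8_t)(0x80u | (v >> 8));
        out[1] = (uint8_t)v;
        return 2;
    }
    if (v < 0x200000u) {
        out[0] = (uint8_t)(0xC0u | (v >> 16));
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)v;
        return 3;
    }
    return 0;
}

/* Returns bytes consumed, or 0 for prefixes not covered here. */
static size_t vln_decode(const uint8_t *in, uint32_t *v)
{
    if (in[0] < 0x80u) { *v = in[0]; return 1; }
    if (in[0] < 0xC0u) {
        *v = ((uint32_t)(in[0] & 0x3Fu) << 8) | in[1];
        return 2;
    }
    if (in[0] < 0xE0u) {
        *v = ((uint32_t)(in[0] & 0x1Fu) << 16) | ((uint32_t)in[1] << 8) | in[2];
        return 3;
    }
    return 0;
}
```

Under these assumptions a single-byte VLN holds 7 bits, so a 2-bit tag leaves 5 index bits: locals and temporaries 0..31 fit in one byte, while a 3-bit tag leaves 16 single-byte arguments and a 4-bit tag leaves 8 single-byte globals.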
      
   For integer literals, one can additionally use a zigzag coding   
   (0,-1,1,-2,2, ...). String literals can be encoded as an offset into a   
   string table.   
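
The zigzag mapping mentioned above can be sketched as follows. This is the usual formulation, where 0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4, so that small-magnitude literals of either sign become small unsigned numbers and get short VLN encodings. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a signed value to unsigned so that 0,-1,1,-2,2,... -> 0,1,2,3,4,... */
static uint32_t zigzag_encode(int32_t n)
{
    /* Shift in unsigned arithmetic to avoid signed-overflow issues,
       then flip all bits for negative inputs. */
    return ((uint32_t)n << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

/* Inverse mapping: odd codes are negative, even codes are non-negative. */
static int32_t zigzag_decode(uint32_t z)
{
    int32_t half = (int32_t)(z >> 1);   /* always fits in 31 bits */
    return (z & 1u) ? -half - 1 : half;
}
```

Without this step, a literal like -1 would be the unsigned value 0xFFFFFFFF and always take the longest encoding. Note that a full 32-bit zigzag value plus tag bits no longer fits in 32 bits, so an encoder combining the two would need a wider intermediate type.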
      
   For something like a typecast operator, you might encode an offset into
   a string table for a type-signature string.   
      
   ...   
      
   Well, sorta, the IL used in BGBCC isn't quite so clean.   
   It instead encodes strings and symbols inline, and uses a sliding table   
   to refer back to them when they repeat. This also works, but is more   
   ugly than encoding IDs and using a string table might have been.   
      
   But, string tables make more sense for an externally-structured format.   
      
      
   Ironically, came up with a possible format for manifest files (loosely   
   WAD based) that could also make sense as an IL packaging format.   
      
   Ended up going back and forth between having it be WAD2 or WAD4 based,
   and settled on a compromise that supports mixed 32-byte and 64-byte
   entries. It would have a tree structure similar to WAD4, but with the
   downside that for the 32-byte entries names are reduced to 10 bytes (vs
   32 bytes for the 64-byte entries; or 16 bytes in the original WAD2 format).
      
   But, can debate whether or not this would make sense in a   
   space-efficiency sense. The design is more focused on semi-efficient   
   random access rather than compactness (whereas typically bytecode IL   
   packaging is more focused on being compact).   
      
   Though, compactness may not matter as much for things like object-files   
   which are less likely to be used to actually distribute code.   
      
      
   Though, one merit is that it could more easily allow for a compiler that   
   decodes stack-IR into 3AC one function at a time, or demand-loads parts   
   of the image, rather than needing to load everything for the whole   
   program in advance (and burning a lot of RAM this way).   
      
   Annoyingly, even a simple format like IWAD would still end up needing 16   
   bytes per entry.   
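
For reference, the 16 bytes per entry in the classic Doom IWAD/PWAD directory break down as a 4-byte offset, a 4-byte size, and an 8-byte name. The sketch below shows that layout plus a simple linear lookup by name; the struct and function names are hypothetical, and it assumes the directory has already been read into memory on a little-endian host. The post's own mixed 32/64-byte entry layout is not specified, so it is not sketched here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classic WAD directory entry: 4 + 4 + 8 = 16 bytes. */
typedef struct {
    int32_t filepos;   /* byte offset of the lump data in the file */
    int32_t size;      /* lump size in bytes */
    char    name[8];   /* NUL-padded; not NUL-terminated when 8 chars long */
} WadDirEntry;

_Static_assert(sizeof(WadDirEntry) == 16, "WAD directory entry is 16 bytes");

/* Linear scan for an exact name match. Returns the index, or -1. */
static int wad_find(const WadDirEntry *dir, int count, const char *name)
{
    if (strlen(name) > 8)
        return -1;                      /* cannot be stored in 8 bytes */
    for (int i = 0; i < count; i++) {
        /* strncmp stops at 8 bytes, so unterminated names are safe. */
        if (strncmp(dir[i].name, name, 8) == 0)
            return i;
    }
    return -1;
}
```

The flat list means lookup is a scan over every entry, which is part of why a tree-structured directory with longer names is attractive once entries are addressed by qualified name.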
      
   But, it can offer more flexibility (and not needing an additional   
   mechanism to look things up by QName), say, if compared with a format   
   like RIFF (which has an 8-byte minimum overhead per lump). Well, and the   
   scheme as-is, allows lumps with <= 12 bytes of payload to encode it   
      
   [continued in next message]   
      