From: cr88192@gmail.com   
      
   On 10/10/2025 5:06 AM, David Brown wrote:   
   > On 10/10/2025 08:27, BGB wrote:   
   >> On 10/9/2025 10:59 PM, Keith Thompson wrote:   
   >>> bart writes:   
   >   
   >>>   
   >>>>> One merit is if code can be copy-pasted, but if one has to change   
   >>>>> all instances of:   
   >>>>> char *s0, *s1;   
   >>>>> To:   
   >>>>> char* s0, s1;   
   >>>>> Well, this is likely to get old, unless it still uses, or allows C   
   >>>>> style declaration syntax in this case.   
   >>>>   
   >>>> That one's been fixed (50 years late): you instead write:   
   >>>>   
   >>>> typeof(char*) s0, s1;   
   >>>>   
   >>>> But you will need an extension if it's not part of C23.   
   >>>   
   >>> Yes, that will work in C23, but it would never occur to me to   
   >>> write that. I'd just write `char *s0, *s1;` or, far more likely,   
   >>> define s0 and s1 on separate lines. Using typeof that way triggers   
   >>> my WTF filter.   
   >>>   
   >>   
   >> Agreed.   
   >>   
   >>   
   >>   
   >> I think it can be contrasted with C# style syntax (with "unsafe") where
   >> one would write:   
   >> char* s0, s1;   
   >   
   > Does C# treat s1 as "char*" in this case? That sounds like an   
   > extraordinarily bad design decision - having a syntax that is very like   
   > the dominant C syntax yet subtly different.   
   >   
      
   Yes. In this case, things like "*" or "[]" are associated with the type   
   rather than the declarator.   
      
      
   > Issues like this have been "solved" for decades - in the sense that   
   > people who care about their code don't make mistakes from mixups of   
   > "char" and "char*" declarations. There are a dozen different ways to be   
   > sure it is not an issue. Simplest of all is a style rule - never   
   > declare identifiers of different types in the same declaration. I'd   
   > have preferred that to be a rule baked into the language from the start,   
   > but we all have things we dislike about the C syntax.   
   >   
      
   Part of the reason for some of the differences is that it allows a
   parser that does not need to know about prior typedefs and declarations.
      
   In C, you need to know prior typedefs to parse correctly.
    In C++, you also need to know previous template declarations, etc.,
    with classes/structs/etc. adding implicit typedefs.
      
      
   Avoiding the need to know typedefs in advance allows for a language
   where either there is no preprocessor (Java), or the preprocessor still
   exists but is far more limited in scope and mostly unused (C#).
      
   Also, typically things like the type system are handled later in the
   pipeline (in .NET, closer to what would be considered the linker stage
   in a traditional compiler).
      
   In effect, the front-end works with relatively incomplete information,
   producing IL bytecode that specifies where to look for things and what
   to look for, but not the complete information. When an EXE or DLL is
   produced, things are resolved for what exists within the current
   "assembly" (roughly equivalent to the EXE or DLL being compiled), with
   the .NET runtime needing to sort out the rest (typically AOT-compiling
   the binaries into some internal form).
      
   However, I would assume there is no "runtime" here, meaning the linker
   would need to produce native-code binaries.
      
      
   FWIW: BGBCC also generally uses a bytecode representation internally,
   and then produces native binaries as output, though the way the
   bytecode is structured and works differs from that of .NET bytecode.
   In both cases, they are using implicitly-typed stack machines at the
   IL stage. In BGBCC, for the backend stage, the bytecode IL is
   translated into "Three-Address Code" roughly in "SSA form" (though not
   exactly the same as in LLVM, as it typically uses a combination of
   variable ID and sequence number rather than creating a new "register"
   every time; also, the "phi" operations are typically implicit).
      
   Can note that it does support ASM, but the handling is generally that   
   any ASM code is preprocessed and then passed through the IL stage as   
   string blobs (then assembled in the backend stage).   
      
   Note, while it is possible to go more directly from a stack IL to native   
   code (without going through 3AC/SSA), the generated code is garbage.   
      
   Also, while it is possible to have a compiler that uses SSA as an   
   on-disk IR format (like Clang), IMO this creates a lot of pain and   
   exposes too much of the backend machinery (it would be very much a pain   
   to use LLVM bitcode in anything other than LLVM).   
      
   So, seemingly, a stack-oriented bytecode is the "least pain" option.   
   Well... Unless they do it like WASM and find other creative ways to   
   screw it up...   
      
      
      
   Can note that in the case of a language like C#, the visibility of types   
   and similar comes through the use of namespaces (which partly take on a   
   similar role to headers in C or C++, or packages in Java or ActionScript).   
      
   Where, say:   
    namespace foo { using bar.baz; } //C# style   
    namespace foo { using namespace bar::baz; } //C++ style   
    package foo { import bar.baz; } //ActionScript style   
      
   Though:   
    import bar.baz.*; //Java   
      
   But Java differs here in that the code structure (and packages) is
   directly tied to the organization of files in the filesystem
   (typically with one class per file).
      
   By contrast, .NET and C# use "assemblies" as the organizing principle;
   generally, everything being compiled together to become a given EXE or
   DLL is lumped into a single unit.
      
      
   Though, one option could be to organize code instead by namespace, with   
   the toplevel tied to each location in the search path.   
      
   With such a compiler, rather than specifying a list of individual
   source files, one might specify directories and have the compiler
   figure things out on its own (basically compiling everything in a
   given directory).
      
   One way to handle things like static libraries would be to build a blob   
   of intermediate bytecode (and/or native-code COFF or ELF objects) along   
   with a manifest database. The bytecode blob would contain all of the IR   
   for the library (or machine-code if native), and the manifest would only   
   contain declarations (preferably in a semi-compact form that is   
   reasonably efficient to search). The manifests could then be used both
   for knowing about declarations and for deciding which objects or
   libraries to pull into the program being compiled (rather than giving
   them individually on the command line).
      
      
   This approach would differ from .NET, which embeds all of the metadata
   into the object files and distributable binaries. But here I am
   assuming that the final binary is a bare native EXE or DLL image,
   meaning that any manifest data for a DLL would need to be handled more
   like an "import library".
      
   In .NET, generally the EXE or DLL was merely being used as an external   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   