From: antispam@fricas.org   
      
   Bart wrote:   
   > On 08/02/2026 19:21, Waldek Hebisch wrote:   
   >> Bart wrote:   
   >>> On 07/02/2026 22:48, Waldek Hebisch wrote:   
   >   
   >>>> In the SYS V convention an argument is passed in exactly one place. It
   >>>> may be a GPR, an XMM register, or the stack. If you put the right
   >>>> thing in RAX, then your arguments are valid regardless of whether the
   >>>> function is a vararg function or not.
   >>>   
   >>> I had to go and check this, and you're right. SYS V does nothing special   
   >>> when calling variadic functions.   
   >>   
   >> Well, there is one special thing: RAX should contain the number of SSE
   >> registers used for passing parameters. You do not need to set
   >> RAX for normal calls (at least on Linux; some other systems
   >> require it for all calls).
   >   
   > I looked out for that but don't remember seeing it on godbolt.org, and I
   > think it was for SYS V.
   >   
   > But I tried it again, and AL is being set to some count, which appears   
   > to be the total number of float arguments (and rereading your comment,   
   > you say the same thing).   
   >   
   >>   
   >>> I guess that makes implementing the body of variadic functions harder,   
   >>> since it doesn't know where to look for the n'th variadic argument   
   >>> unless it knows the type.   
   >>   
   >> Well, if a function wants to do actual computation with an argument,
   >> it had better know its type.
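To illustrate why the callee needs the type: with `<stdarg.h>` the only way to fetch the next variadic argument is `va_arg(ap, T)`, so the callee must learn `T` from somewhere, typically a format string as with `printf`. A minimal sketch, assuming a hypothetical tag convention where 'i' means `int` and 'd' means `double` (the function name `sum_tagged` is also made up for illustration):

```c
#include <assert.h>
#include <stdarg.h>

/* Sums variadic arguments whose types are described by 'tags':
   'i' = int, 'd' = double (float arguments are promoted to double).
   The tag convention is hypothetical, in the spirit of printf formats. */
double sum_tagged(const char *tags, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, tags);
    for (const char *t = tags; *t != '\0'; t++) {
        switch (*t) {
        case 'i':
            /* On SysV x86-64: taken from the GPR save area or the stack. */
            total += va_arg(ap, int);
            break;
        case 'd':
            /* On SysV x86-64: taken from the XMM save area or the stack. */
            total += va_arg(ap, double);
            break;
        default:
            /* Unknown tag: we cannot know where the next argument is. */
            va_end(ap);
            return total;
        }
    }
    va_end(ap);
    return total;
}
```

For example, `sum_tagged("idi", 1, 2.5, 3)` needs the tags to know that the second argument lives with the floating-point arguments and the third with the integer ones.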
   >   
   > On Windows, it will know the location of the next vararg and can access
   > its value before it knows the type. The user-provided type (e.g.
   > 'va_arg(p, int)') can simply do a type-punning cast on the value.
   >   
   > All args (fixed, variadic-reg, variadic-pushed) will also be in
   > consecutive stack slots, regardless of type. (This is the real reason why
   > floats should also be loaded into GPRs for variadics: the entry code just
   > needs to spill those 4 GPRs, since it won't know the mix of types anyway.)
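Bart's point can be modelled in portable C. The sketch below is a hypothetical simulation, not the real Win64 `va_list`: arguments sit in an array of consecutive 8-byte slots, the cursor advances one slot per argument without looking at the type, and the type is applied only when the value is read, by copying the slot's bits (`memcpy` is the portable way to type-pun). The names `slot_cursor`, `next_slot`, `slot_as_i64`, `slot_as_f64` and `f64_slot` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of Win64-style varargs: every argument, fixed or
   variadic, occupies one 8-byte slot, and the cursor is a slot pointer. */
typedef struct { const uint64_t *next; } slot_cursor;

/* Step past one argument; its position does not depend on its type. */
static const uint64_t *next_slot(slot_cursor *c) { return c->next++; }

/* Reinterpret a slot only when the caller names the type. */
static int64_t slot_as_i64(const uint64_t *s)
{
    int64_t v;
    memcpy(&v, s, sizeof v);
    return v;
}

static double slot_as_f64(const uint64_t *s)
{
    double v;
    memcpy(&v, s, sizeof v);
    return v;
}

/* Store a double's bit pattern in a slot, as it would sit in a GPR
   or a stack slot. */
static uint64_t f64_slot(double d)
{
    uint64_t s;
    memcpy(&s, &d, sizeof s);
    return s;
}
```

Because the slot address is known before the type, the callee's entry code can simply spill the four argument GPRs next to the caller-pushed slots and let each `va_arg` pick its type later.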
   >   
   >>> I started generating code for ARM64, but gave up because it was too hard   
   >>> and not fun (the RISC processor turned out to be a LOT more complex than   
   >>> the CISC x64!).   
   >>   
   >> Well, a RISC processor means that the compiler has to do work which is
   >> frequently done by hardware on a CISC. Concerning arm32, the most
   >> annoying thing for me was the limited range of constants, especially the
   >> limit on offsets that can be part of an instruction. With my current
   >> implementation that puts something like a 2kB limit on the size of local
   >> variables. And my generator mixes instructions and constant data
   >> (otherwise it could not access constant data using the limited available
   >> offsets), which works but complicates the code generator and probably
   >> gives suboptimal performance.
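For context on the "limited range of constants": an A32 data-processing immediate is an 8-bit value rotated right by an even amount (0 to 30), and word `LDR`/`STR` with an immediate offset reach only ±4095 bytes. A small checker for the first rule, as a sketch of the test a code generator has to make before falling back to a literal load (the function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by r bits (0 <= r < 32), avoiding a shift by 32. */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return r == 0 ? v : (v << r) | (v >> (32 - r));
}

/* True if v can be encoded as an A32 "modified immediate":
   v == ROR(imm8, 2*k) for some 8-bit imm8 and k in 0..15.
   Equivalently, rotating v left by 2*k leaves only the low 8 bits set. */
bool arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if ((rotl32(v, rot) & ~0xFFu) == 0)
            return true;
    return false;
}
```

So `0xFF`, `0x100` and `0xF000000F` fit in a single `MOV`, while `0x101`, `0x1FE` (needs an odd rotation) and `0x12345678` do not and must be built from several instructions or loaded from memory.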
   >   
   > There are a dozen annoying things like this on arm64. Even when you give
   > up and decide to load 64-bit constants from a memory pool, you find you
   > can't even directly access that pool as it has an absolute address. That
   > can involve first loading the page address (i.e. with the lower 12 bits
   > cleared) into R, then you have to use an address mode involving R and the
   > lower 12 bits as an offset.
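The page-plus-offset sequence described here is the `ADRP` + `ADD`/`LDR` `:lo12:` pair. `ADRP` adds a signed 21-bit page count (4 KiB pages) to the PC with its low 12 bits cleared, giving a ±4 GiB reach; the second instruction supplies the low 12 bits. A sketch of the arithmetic a code generator or linker performs, with hypothetical names (`adrp_pair`, `split_adrp`, `adrp_resolve`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t  page_delta; /* ADRP immediate, in 4 KiB pages */
    uint32_t lo12;       /* low 12 bits, for ADD (LDR scales it by the access size) */
} adrp_pair;

/* Split 'target' into ADRP/lo12 immediates for an ADRP located at 'pc'.
   Returns false if the target is outside ADRP's +-4 GiB page range. */
bool split_adrp(uint64_t pc, uint64_t target, adrp_pair *out)
{
    int64_t delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    if (delta < -(1LL << 20) || delta >= (1LL << 20))
        return false;                 /* does not fit in 21 signed bits */
    out->page_delta = delta;
    out->lo12 = (uint32_t)(target & 0xFFF);
    return true;
}

/* What ADRP followed by ADD #lo12 computes at run time. */
uint64_t adrp_resolve(uint64_t pc, adrp_pair p)
{
    uint64_t page = (pc & ~(uint64_t)0xFFF) + (uint64_t)(p.page_delta * 4096);
    return page + p.lo12;
}
```

Note that the result depends only on the page of the PC, which is why the instruction pair is position independent even though the pool itself is addressed "absolutely".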
      
   As I wrote, I generate the constant pool as part of the instruction
   stream. I use PC-relative addressing, so as long as a constant is close
   enough to the instruction using it, I can use short offsets.
   There is some extra effort: normally I try to put constants
   after an unconditional jump and before the next label, but I may need
   an extra jump to "jump around" the constants.
      
   >>> The last straw was precisely to do with the SYS V call-conventions, and   
   >>> I hadn't even gotten to variadic arguments yet, nor to structs passed   
   >>> by-value, where the rules are labyrinthine.   
   >>   
   >> My low-level code only handles scalar arguments. That includes pointers
   >> to structures, but not structures passed by value. Structures passed by
   >> value could be handled by higher-level code, but up to now there was
   >> no need to do this.
   >>   
   >> BTW, my amd64 code is assembler, so off-topic here, but arm32 code   
   >> is mostly C. I use two helper structures:   
   >>   
   >> struct registers_buffer {   
   >> int i_reg[4];   
   >> union {double d; struct {float sl; float sh;} sf2;} f_reg[8];   
   >> };   
   >>   
   >> typedef struct registers_buffer reg_buff;   
   >>   
   >> typedef struct arg_state { int ni; int sfi; int dfi; int si;} arg_state;   
   >>   
   >> C code fills 'reg_buff' with values and later low-level assembly   
   >> copies values from the buffer to registers. I allocate enough space on   
   >> the stack so that C code can write to the stack without risk of   
   >> stack overflow.   
   >>   
   >> There are 3 helper routines:   
   >   
   >   
   > This looks pretty complicated, but what is it for: is it still to do
   > with variadic functions, or is it to do with the LIBFFI problem?
      
   This is to handle a call; it does not matter whether it is variadic or
   not. The call comes from a dynamically typed language and argument types
   are known only at runtime (actually, argument types _may_ be statically
   known at a higher level, but for simplicity "all" (see below) calls go
   through a single low-level routine that handles the general dynamic case).
   The dispatcher routine (which I did not show) loops over the arguments,
   decodes their types and converts them to C representation. Then
   it calls one of the 3 helper routines to place each argument in the buffer
   or on the stack.
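The helper routines themselves are not shown, so the following is only a guess at what such helpers might look like, reusing the posted `reg_buff` and `arg_state` declarations. It assumes the hard-float AAPCS variant (which applies to non-variadic calls) on a little-endian target: integers go to r0-r3 and then the stack, a `double` takes the next free d register, and a `float` first back-fills an odd s register left free by an earlier float. Once a floating-point argument goes to the stack, no later one may use VFP registers. The names `put_int`, `put_float`, `put_double`, the `uint32_t *stack` parameter and the meaning given to each `arg_state` field are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Declarations as posted. */
struct registers_buffer {
    int i_reg[4];
    union {double d; struct {float sl; float sh;} sf2;} f_reg[8];
};
typedef struct registers_buffer reg_buff;
typedef struct arg_state { int ni; int sfi; int dfi; int si;} arg_state;

/* Hypothetical field meanings used below:
   ni  - next free core register (r0..r3)
   sfi - odd s register free for back-filling a float, or 0 if none
   dfi - next free d register (d0..d7)
   si  - next free 4-byte stack word */

void put_int(reg_buff *rb, uint32_t *stack, arg_state *st, int v)
{
    if (st->ni < 4)
        rb->i_reg[st->ni++] = v;
    else
        memcpy(&stack[st->si++], &v, sizeof v);
}

void put_float(reg_buff *rb, uint32_t *stack, arg_state *st, float v)
{
    if (st->sfi != 0) {                  /* back-fill s(2k+1): high half of dk */
        rb->f_reg[st->sfi / 2].sf2.sh = v;
        st->sfi = 0;
    } else if (st->dfi < 8) {            /* s(2k): low half of the next dk */
        rb->f_reg[st->dfi].sf2.sl = v;
        st->sfi = 2 * st->dfi + 1;
        st->dfi++;
    } else {
        memcpy(&stack[st->si++], &v, sizeof v);
    }
}

void put_double(reg_buff *rb, uint32_t *stack, arg_state *st, double v)
{
    if (st->dfi < 8) {
        rb->f_reg[st->dfi++].d = v;
    } else {
        st->sfi = 0;                     /* VFP registers now unavailable */
        st->si = (st->si + 1) & ~1;      /* doubles are 8-byte aligned on the stack */
        memcpy(&stack[st->si], &v, sizeof v);
        st->si += 2;
    }
}
```

For a call `f(1.0f, 2.0, 3.0f, 7)` this puts 1.0f in s0, 2.0 in d1, back-fills 3.0f into s1 and 7 into r0, which is the order the hard-float rules require.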
      
   There is also simpler, integer-only code which is mainly used to perform
   low-level system calls. This interface does not convert arguments
   (the assumption is that the caller passes a C-compatible representation)
   and its logic is simpler, as it just puts what fits in registers and
   the rest on the stack.
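The integer-only path fits in a few lines. This sketch assumes an ordinary arm32 call with four argument registers (r0-r3); the helper `split_int_args` and its buffer parameters are hypothetical, not taken from the post:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_ARG_REGS 4   /* r0-r3 on arm32 */

/* Copy already C-compatible words: registers first, then the stack.
   Returns the number of stack words used. */
size_t split_int_args(const uint32_t *args, size_t n,
                      uint32_t regs[N_ARG_REGS], uint32_t *stack)
{
    size_t i = 0;
    for (; i < n && i < N_ARG_REGS; i++)
        regs[i] = args[i];
    for (; i < n; i++)
        stack[i - N_ARG_REGS] = args[i];
    return n > N_ARG_REGS ? n - N_ARG_REGS : 0;
}
```

No type decoding is needed because every argument is assumed to already be one machine word.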
      
   Both interfaces spill all registers used by the calling language to
   the stack before actually processing the call. This is because
   C code may perform a callback, the callback may trigger garbage
   collection, and the garbage collector needs to see all registers that
   may point to language data. There are global variables which
   tell the garbage collector which parts of the stack are managed by
   the language (and need to be scanned) and which belong to C or the
   FFI machinery (the garbage collector ignores these parts).
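One way such bookkeeping might look, as a hypothetical sketch (the globals, the segment list and all function names are assumptions, not taken from the post): before entering C the runtime closes the current language region, a callback that re-enters the language opens a new one, and the collector scans only the recorded regions, skipping the C/FFI frames between them. To stay portable, the "stack" is simulated with an array that grows upward, whereas real arm32 and amd64 stacks grow downward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 16

/* A region [lo, hi) of stack words owned by the language: the GC scans it. */
typedef struct { uintptr_t *lo, *hi; } stack_seg;

/* Hypothetical globals telling the GC which stack parts to scan. */
static stack_seg lang_segs[MAX_SEGS]; /* language regions suspended in C */
static int n_segs;
static uintptr_t *cur_lang_lo;        /* start of the running language region */

/* Language code starts running at 'sp' (program start or a callback). */
void enter_language(uintptr_t *sp) { cur_lang_lo = sp; }

/* Language code calls into C at 'sp'.  Its registers are assumed to be
   spilled into its own region already, so closing the region is enough. */
void leave_to_c(uintptr_t *sp)
{
    assert(n_segs < MAX_SEGS);
    lang_segs[n_segs].lo = cur_lang_lo;
    lang_segs[n_segs].hi = sp;
    n_segs++;
}

/* The C call returned: the suspended region becomes the running one again. */
void return_from_c(void)
{
    assert(n_segs > 0);
    cur_lang_lo = lang_segs[--n_segs].lo;
}

/* Visit every word in the language regions, skipping C/FFI frames.
   'top' is the current stack pointer of the running language code.
   Returns the number of words scanned; 'visit' may be NULL. */
size_t gc_scan_roots(uintptr_t *top, void (*visit)(uintptr_t word))
{
    size_t count = 0;
    for (int i = 0; i < n_segs; i++)
        for (uintptr_t *p = lang_segs[i].lo; p < lang_segs[i].hi; p++, count++)
            if (visit) visit(*p);
    for (uintptr_t *p = cur_lang_lo; p < top; p++, count++)
        if (visit) visit(*p);
    return count;
}
```

In a scenario where the language uses words 0..9, calls C, and C calls back into the language at word 20, a collection at word 25 scans words 0..9 and 20..24 and ignores the C frames in 10..19.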
      
   --   
    Waldek Hebisch   
      