From: antispam@fricas.org   
      
   Bart wrote:   
   > On 07/02/2026 22:48, Waldek Hebisch wrote:   
   >> Bart wrote:   
   >>> On 07/02/2026 18:07, Kaz Kylheku wrote:   
   >   
   >>>> The : syntax in the deffi macro call indicates the variadic list.
   >>>> It's not the case that we can make a variadic Lisp function pass its
   >>>> arguments as an arbitrarily long variadic list with arbitrary types to
   >>>> the wrapped FFI function. Fixed parameters must be declared after the
   >>>> colon.
   >>>   
   >>> There's another issue with calling variadic functions, unrelated to the   
   >>> number of args. I can't tell from the above whether it's covered.
   >>>   
   >>> Normally an arg that is passed in a register, will be passed in GPR for   
   >>> integer, or a floating point register if not.   
   >>>   
   >>> But a variadic float argument has to be passed in both, so for Win64   
   >>> ABI/x64 it might be in both rcx and xmm1. I think it is similar on SYS V   
   >>> for both x64 and arm64 (maybe on the latter both are passed in the GPR;   
   >>> I'd have to go and look it up).   
   >>   
   >> In SYS V convention argument is passed in exactly one place. It may   
   >> be GPR, may be XMM register, may be on the stack. If you put right   
   >> thing in RAX, then your arguments are valid regardless if the function   
   >> is a vararg function or not.   
   >   
   > I had to go and check this, and you're right. SYS V does nothing special   
   > when calling variadic functions.   
      
   Well, there is one special thing: RAX should contain the number of SSE
   registers used for passing parameters. You do not need to set
   RAX for normal calls (at least on Linux; some other systems
   require it for all calls).
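
A small C sketch of the variadic case follows. The function name `sum_mixed` and its one-letter-per-argument format string are made up for this example. On SYS V x86-64 the compiler emits the AL setup at the call site (the ABI describes the value as an upper bound on the number of vector registers used), so none of it shows at the C level; the callee only has to name the right type in each `va_arg`.

```c
#include <stdarg.h>

/* Hypothetical helper: 'fmt' has one letter per variadic argument,
   'i' for int and 'd' for double. Unknown letters are skipped. */
static double sum_mixed(const char *fmt, ...) {
    va_list ap;
    double total = 0.0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p; p++) {
        if (*p == 'i')
            total += va_arg(ap, int);      /* fetched from the integer side */
        else if (*p == 'd')
            total += va_arg(ap, double);   /* fetched from the SSE side */
    }
    va_end(ap);
    return total;
}
```

A call such as `sum_mixed("idi", 1, 2.5, 3)` mixes both register classes; the caller needs no special code beyond what the compiler generates.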
      
   > I guess that makes implementing the body of variadic functions harder,   
   > since it doesn't know where to look for the n'th variadic argument   
   > unless it knows the type.   
      
   Well, if a function wants to do actual computation with an argument,
   it had better know its type. So this affects only "intermediate"
   functions that want to repack/shuffle arguments before passing them
   to some other function. That happens, but is reasonably rare.
   In one such case I just decided that the intermediate function will
   work only for arguments passed in integer registers (this covered
   the actual use case). Note that regardless of types there is still
   the problem of the number of arguments. In my case the intermediate
   function just grabs 20 arguments and passes them further. Of course
   this is undefined behaviour in C, but in practice the function just
   gets garbage for non-existing arguments. The final recipient knows
   how many arguments were really passed and ignores the extra garbage.
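
The "grab a fixed number of slots and forward them" idea can be sketched in well-defined C if the caller pads the unused slots itself instead of relying on garbage. This is a variant of the approach described above, not the actual code: the names `forward`, `sum_first_n`, `target_fn` and the slot count of 4 (rather than 20) are all assumptions made for the example.

```c
#define MAX_SLOTS 4

/* Every target takes a count followed by MAX_SLOTS integer slots. */
typedef long (*target_fn)(long n, long a0, long a1, long a2, long a3);

/* Final recipient: knows how many slots are real and ignores the rest. */
static long sum_first_n(long n, long a0, long a1, long a2, long a3) {
    long v[MAX_SLOTS] = { a0, a1, a2, a3 };
    long s = 0;
    for (long i = 0; i < n && i < MAX_SLOTS; i++)
        s += v[i];
    return s;
}

/* Intermediate function: passes all slots on without looking at them. */
static long forward(target_fn f, long n, long a0, long a1, long a2, long a3) {
    return f(n, a0, a1, a2, a3);
}
```

Here `forward(sum_first_n, 2, 10, 20, -1, -1)` only ever uses the first two values; the padding plays the role of the garbage slots.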
      
   > And even then, because the int and non-int args are spilled to separate   
   > blocks, it has to keep track of where the next arg is in which block.   
   >   
   > I think MS made the better call here; the necessary code is trivial for   
   > Win64 ABI.   
   >   
   >>>> A dynamic treatment could be arranged via a heavy weight wrapper
   >>>> mechanism which dynamically analyzes the actual arguments, builds a
   >>>> libffi function descriptor on the fly, then uses it to make the call;
   >>>> it could be worthwhile for someone, but I didn't implement such a
   >>>> thing. Metaprogramming tricks revolving around dynamically evaluating
   >>>> deffi are also possible.
   >>>   
   >>> My LIBFFI approach just uses assembly; it's the simplest way to do it.   
   >>> (The LIBFFI 'C' library also uses assembly to do the tricky bits.)   
   >>>   
   >>> There, for Win64 ABI, I found it easiest to just load all the register   
   >>> args to both integer and float registers, whether the called function   
   >>> was variadic or not. That's far more efficient than figuring out the   
   >>> right register argument by argument.   
   >>>   
   >>> I haven't implemented that for SYS V; that's more of a nightmare ABI   
   >>> where up to 6-12 args   
   >   
   > (Actually, 6-14 args; 6 max in GPRs and 8 in xmm regs)   
   >   
   >>> (8-16 on aarch64) can be passed between int and
   >>> float registers depending on the mix of types.   
   >>>   
   >>> On Win64 ABI, it is 4 args, always.   
   >>   
   >> My code works fine for SYS V on amd64 and arm32. I do not think FFI   
   >> for aarch64 will be any harder, but ATM I do not have code generator   
   >> for aarch64, no need for FFI there.   
   >>   
   >> I did not bother with Windows; since I do not use it, it would be
   >> untested and hence buggy code anyway.
   >   
   > I started generating code for ARM64, but gave up because it was too hard   
   > and not fun (the RISC processor turned out to be a LOT more complex than   
   > the CISC x64!).   
      
   Well, a RISC processor means that the compiler has to do work which is
   frequently done by hardware on a CISC. Concerning arm32, the most
   annoying thing for me was the limited range of constants, especially
   the limit on offsets that can be part of an instruction. With my
   current implementation that puts something like a 2kB limit on the
   size of local variables. And my generator mixes instructions and
   constant data (otherwise it could not access constant data using the
   limited available offsets), which works but complicates the code
   generator and probably gives suboptimal performance.
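
To make the constant-range limitation concrete: in the classic 32-bit ARM (A32) encoding, a data-processing immediate is an 8-bit value rotated right by an even amount. The check below is only a sketch of that one rule. The function names are made up for the example, and it says nothing about load/store offsets, which have their own (different) limits.

```c
#include <stdint.h>

/* Rotate left by r bits (0 <= r < 32) without shifting by 32. */
static uint32_t rotl32(uint32_t v, unsigned r) {
    return r ? (v << r) | (v >> (32 - r)) : v;
}

/* Returns 1 if v can be written as an 8-bit value rotated right by an
   even amount, i.e. if it fits an A32 data-processing immediate. */
static int arm_imm_encodable(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by 'rot' and see if 8 bits remain. */
        if (rotl32(v, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```

A constant that fails this test has to be built from several instructions or loaded from a nearby literal pool, which is where mixing constant data with instructions comes in.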
      
   > The last straw was precisely to do with the SYS V call-conventions, and   
   > I hadn't even gotten to variadic arguments yet, nor to structs passed   
   > by-value, where the rules are labyrinthine.   
      
   My low-level code only handles scalar arguments. That includes pointers
   to structures, but not structures passed by value. Structures passed by
   value could be handled by higher-level code, but up to now there was   
   no need to do this.   
      
   BTW, my amd64 code is assembler, so off-topic here, but arm32 code   
   is mostly C. I use two helper structures:   
      
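   #include <string.h>   /* memcpy, used by the helper routines */

   struct registers_buffer {
       int i_reg[4];                 /* integer argument registers */
       union {
           double d;
           struct { float sl; float sh; } sf2;   /* low/high single halves */
       } f_reg[8];                   /* 8 doubles, or 16 singles in pairs */
   };

   typedef struct registers_buffer reg_buff;

   /* sfi/dfi: next single/double register slot; si: next stack slot
      (in 4-byte units); ni is presumably the next integer register. */
   typedef struct arg_state { int ni; int sfi; int dfi; int si; } arg_state;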
      
   C code fills 'reg_buff' with values and later low-level assembly   
   copies values from the buffer to registers. I allocate enough space on   
   the stack so that C code can write to the stack without risk of   
   stack overflow.   
      
   There are 3 helper routines:   
      
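   static void
   store_single(arg_state * as, void * sp, reg_buff * rp, float val) {
       float * dst;
       int sfi = as->sfi;
       if (sfi < 16) {
           /* A single-precision register slot is still free. */
           int dfi1 = sfi >> 1;
           if (sfi & 1) {
               /* Back-fill the high half of a partly used double slot,
                  then continue after the last allocated double. */
               dst = &(rp->f_reg[dfi1].sf2.sh);
               as->sfi = (as->dfi)<<1;
           } else {
               /* Low half of a fresh slot; that slot is no longer
                  available for a double. */
               dst = &(rp->f_reg[dfi1].sf2.sl);
               as->sfi++;
               as->dfi = (as->dfi > dfi1)?as->dfi:(dfi1 + 1);
           }
       } else {
           /* Registers exhausted: use the next 4-byte stack slot. */
           dst = (float *)sp + as->si;
           as->si++;
       }
       memcpy(dst, &val, sizeof(val));
   }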
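
The index bookkeeping is easier to follow in isolation. The sketch below reproduces only the register-allocation rule of store_single, plus the matching rule for doubles, and returns register numbers instead of storing values. The names `vfp_alloc`, `alloc_single` and `alloc_double` are made up for the example, and the stack path is deliberately left out (the functions return -1 once registers run out).

```c
/* Allocation state: next candidate single (s) and double (d) slot. */
typedef struct { int sfi; int dfi; } vfp_alloc;

/* Returns the single-register index (0-15) for a float argument,
   or -1 when no register is left. */
static int alloc_single(vfp_alloc *a) {
    int s = a->sfi;
    if (s >= 16)
        return -1;
    if (s & 1) {
        /* Took the spare high half; continue after the last double. */
        a->sfi = a->dfi << 1;
    } else {
        int d = s >> 1;
        a->sfi++;
        if (a->dfi <= d)
            a->dfi = d + 1;   /* this pair is now partly used */
    }
    return s;
}

/* Returns the double-register index (0-7), or -1 when none is left. */
static int alloc_double(vfp_alloc *a) {
    int d = a->dfi;
    if (d >= 8)
        return -1;
    a->dfi++;
    if (a->sfi == (d << 1))
        a->sfi = a->dfi << 1; /* no spare half to back-fill */
    return d;
}
```

For the argument sequence float, double, float this gives single slot 0, double slot 1, then single slot 1: the second float back-fills the half left over by the first.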
      
   static void
   store_double(arg_state * as, void * sp, reg_buff * rp, double val) {
       int dfi = as->dfi;
       double * dst;
       if (dfi < 8) {
           dst = &(rp->f_reg[dfi].d);
           as->dfi++;
           if (as->sfi == (dfi<<1)) {
               as->sfi = (as->dfi<<1);
           }
       } else {
      
   [continued in next message]   
      