comp.arch
Message 129,540 of 131,241
BGB to Stefan Monnier
Re: Register windows
30 Aug 25 01:47:03
   From: cr88192@gmail.com   
      
   On 8/29/2025 4:07 PM, Stefan Monnier wrote:   
   >> There is one additional, quite thorny issue:  How to maintain   
   >> state for nested functions to be invoked via pointers, which   
   >> have to have access to local variables in the outer scope.
   >> gcc does so by default by making the stack executable, but   
   >> that is problematic.  An alternative is to make some sort of   
   >> executable heap.  This is now becoming a real problem, see   
   >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117455 .   
   >   
   > AFAIK this is a problem only in those rare languages where a function   
   > value is expected to take up the same space as any other pointer while   
   > at the same time supporting nested functions.   
   >   
   > In most cases you have either one or the other but not both.  E.g. in
   > C we don't have nested functions, and in Javascript functions are   
   > heap-allocated objects.   
   >   
   > Other than GNU C (with its support for nested functions), which other   
   > language has this weird combination of features?   
   >   
      
   FWIW, BGBCC has this (as both a C extension, and within my rarely-used   
   BS2 language).   
      
   But, yeah, in this case, the general idea is that lambdas consist of 2   
   or 3 parts:   
   The function body, located in ".text";   
   The data area holding the captured scope;   
   An executable "thunk", which loads the data pointer and transfers   
   control to the body (may be either RWX memory, or from a "pool" of   
   possible function pointers).   
      
      
   When implemented as RWX heap, the data area directly follows the thunk,
   and both are located in special executable heap memory. An
   automatic-only capture-by-reference form also exists; it still uses heap
   memory, but those allocations are freed automatically.
      
   So, the lambdas look the same as normal C function pointers in this way,   
   but creating new lambdas may leak memory if they are not freed.   
      
      
   There is another option which I have used sometimes which doesn't   
   require RWX memory, but which may technically abuse the C ABI:   
   Create a pool of functions with a more generic argument list, and then   
   allocate lambdas from the pool. Each function in the pool pulls its   
   data-area pointer from an array, with each function in the pool having a   
   corresponding array index (with a set upper limit to the maximum number   
   of lambdas).   
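
   As a rough sketch of the pool idea (not BGBCC's actual code): every
   name below is hypothetical, and the pool is built for one fixed
   signature so that it stays within standard C. The "more generic
   argument list" variant would instead cast between function pointer
   types, which is the ABI abuse mentioned above and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration only. */
typedef long (*lambda_fn)(long arg);               /* what callers see */
typedef long (*lambda_body)(void *env, long arg);  /* body taking its data area */

#define POOL_SIZE 4  /* hard upper limit on live lambdas (kept tiny here) */

static lambda_body pool_body[POOL_SIZE];  /* NULL body means the slot is free */
static void       *pool_env[POOL_SIZE];   /* data-area pointer per slot */

/* Each pool function has its array index baked in and forwards to the body. */
#define POOL_THUNK(i) \
    static long pool_thunk_##i(long arg) { return pool_body[i](pool_env[i], arg); }
POOL_THUNK(0) POOL_THUNK(1) POOL_THUNK(2) POOL_THUNK(3)

static const lambda_fn pool_thunks[POOL_SIZE] = {
    pool_thunk_0, pool_thunk_1, pool_thunk_2, pool_thunk_3
};

/* Bind a body and its captured data to a free pool slot.
   Returns an ordinary function pointer, or NULL if the pool is exhausted. */
lambda_fn lambda_alloc(lambda_body body, void *env) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool_body[i] == NULL) {
            pool_body[i] = body;
            pool_env[i]  = env;
            return pool_thunks[i];
        }
    }
    return NULL;
}

/* Release the slot belonging to fn so it can be reused. */
void lambda_free(lambda_fn fn) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool_thunks[i] == fn) {
            pool_body[i] = NULL;
            pool_env[i]  = NULL;
        }
    }
}

/* Example body: adds a captured offset to its argument. */
long add_captured(void *env, long arg) {
    return arg + *(long *)env;
}
```

   Here lambda_alloc(add_captured, &k) hands back a plain lambda_fn that
   can be passed anywhere a normal function pointer of that type is
   expected; forgetting lambda_free() eventually exhausts the pool.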
      
   Though, if the number of "live lambdas" is large, or the lambdas are
   never freed, arguably there is a problem with the program (so even if an
   implementation has a hard limit of, say, 256 or 1024 live lambda
   instances, this usually isn't too much of a problem).
      
      
   This strategy works better for ABIs which pass every argument in   
   basically the same way (or can be made to look like such). If these   
   functions need to care about argument number or types (*), it becomes a   
   much harder problem.   
      
   *: Though, usually limited to a scheme like JVM-style I/L/F/D/A, as this   
   is sufficient, but "X*5^(1..n)" is still a much bigger number than X,   
   meaning 'n' (the maximum number of arguments) would need to be kept   
   small. This does not scale well...   
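
   To put rough numbers on the footnote: the sketch below reads
   "X*5^(1..n)" as X pool slots times the number of distinct argument
   lists of length 1..n over 5 type classes. That reading, the function
   names, and ignoring the return type are all assumptions made for
   illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct argument lists with 1..max_args arguments, where each
   argument is one of `kinds` type classes (5 for I/L/F/D/A):
   kinds^1 + kinds^2 + ... + kinds^max_args. */
uint64_t signature_count(unsigned kinds, unsigned max_args) {
    uint64_t total = 0, power = 1;
    for (unsigned k = 1; k <= max_args; k++) {
        power *= kinds;
        total += power;
    }
    return total;
}

/* Pool functions needed if every signature gets its own set of
   `live_limit` slots (hypothetical sizing rule). */
uint64_t pool_functions_needed(uint64_t live_limit, unsigned kinds,
                               unsigned max_args) {
    return live_limit * signature_count(kinds, max_args);
}
```

   With 5 type classes, 4 arguments already gives 780 signatures, so a
   256-entry limit would need 199680 pool functions; 8 arguments gives
   488280 signatures.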
      
      
   For contrast, things become simpler if one knows, for example, that in
   the ABI every relevant argument is passed the same way regardless of type
   (say, as a fixed 64-bit element), that any 128-bit arguments are passed
   as an even-numbered pair or similar, and that we can always pretend to be
   passing the maximum number of arguments.
      
   The latter leaves the use of executable memory as mostly optional. But
   unlike the pool, executable memory has no set limit on the maximum
   number of lambdas. There are tradeoffs either way.
      
      
   Can note that on my ABI designs, the RISC-V LP64 ABI, and the Win64 ABI,   
   this property mostly holds. On the SysV AMD64 ABI, or RISC-V LP64D ABI,   
   it does not. Can note that BGBCC when targeting RV64 currently uses a   
   variant of the LP64 ABI.   
      
   For XG3, it may use either the LP64 ABI, or an experimental "XG3 Native"   
   ABI which differs slightly:   
      X10..X17 are used for arguments 1..8;   
      F10..F17 are used for arguments 9..16;   
      F4..F7 are reassigned to being callee-save.   
        Partly balancing out the register mix.   
          X: 15 scratch; 12 callee-save   
          F: 16 scratch; 16 callee-save.   
          So: 31 scratch, 28 callee-save.   
          Vs: 35 scratch, 24 callee-save.   
      Struct pass/return:   
        1-8 bytes: 1 register/spot;   
        9-16 bytes: 2 registers/spots, padded to an even index.   
        17+: pass/return via pointer.   
          For struct return, an implicit argument is passed;   
          Callee copies returned struct to the address passed by caller.   
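
   The struct rules and register assignment listed above can be sketched
   as a small slot allocator. This is only an illustration: the names are
   hypothetical, slots are numbered from 0 (so "even index" is taken to
   mean X10, X12, ...), a by-pointer struct is assumed to occupy one slot,
   and what happens past 16 slots is not covered by the description, so
   the sketch just reports failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum pass_kind { PASS_ONE_SLOT, PASS_TWO_SLOTS, PASS_BY_POINTER };

/* Classify a struct by size, following the 1-8 / 9-16 / 17+ byte rule. */
enum pass_kind classify_struct(size_t size) {
    if (size <= 8)  return PASS_ONE_SLOT;
    if (size <= 16) return PASS_TWO_SLOTS;
    return PASS_BY_POINTER;
}

/* Reserve argument slots starting at *next (0-based).
   Returns the first slot used, or -1 if the 16 register slots run out. */
int assign_slots(int *next, size_t size) {
    enum pass_kind kind = classify_struct(size);
    int need  = (kind == PASS_TWO_SLOTS) ? 2 : 1;  /* pointer takes 1 slot (assumed) */
    int first = *next;
    if (kind == PASS_TWO_SLOTS && (first & 1))
        first++;                                   /* pad to an even index */
    if (first + need > 16)
        return -1;
    *next = first + need;
    return first;
}

/* Slots 0..7 map to X10..X17, slots 8..15 map to F10..F17. */
void slot_register(int slot, char *buf, size_t buflen) {
    snprintf(buf, buflen, "%c%d", slot < 8 ? 'X' : 'F', 10 + (slot & 7));
}
```

   For example, a 4-byte struct followed by a 16-byte struct lands in
   slot 0 (X10) and slots 2..3 (X12, X13), with slot 1 left as padding.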
      
      
   Though, another partial motivation for this sort of thing is to make it   
   simpler to marshal COM-style interfaces (it lessens the burden on the   
   lower levels to need to care about the method signatures for the   
   marshaled objects). Though, a higher level mechanism, such as an RPC   
   implementation, would still need to know about the method signatures.   
      
   ...   
      