From: already5chosen@yahoo.com   
      
   On Fri, 28 Nov 2025 12:45:58 +0100   
   David Brown wrote:   
      
   >   
   > I can believe that. If you have to implement floating point routines   
   > in general integer hardware (and I expect that is the case for most   
   > of your implementation here) then I would think it is better to start   
   > and end with the data in GPR's. On some targets, moving data into   
   > and out of floating point or vector registers is efficient enough   
   > that those registers can effectively be used as caches, but it sounds   
   > like that is not the case here.   
   >   
      
   On Windows, the only problem is moving data between different types of
   registers.
   On SysV things are worse: there are also no callee-saved FP/SIMD
   registers, so a value held in one has to be spilled around every call.
   In theory, the problem could have been solved by defining a specialized
   ABI for the support routines (__addtf3, __subtf3, __multf3, etc.), but
   that was not done either.
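
   A minimal sketch of the SysV case, assuming GCC or Clang on x86-64,
   where __float128 is available as an extension and each '+' on it is
   lowered to a call to __addtf3. The name sum2 is made up for
   illustration. Two accumulators are live across every call, so the one
   that is not being passed as an argument typically has to live in memory:

```c
#include <assert.h>

/* Assumption: GCC or Clang targeting x86-64, where __float128 exists
   and binary128 addition is implemented by the libgcc routine __addtf3. */
typedef __float128 f128;

/* Hypothetical helper: sum an array with two independent accumulators.
   acc0 and acc1 are both live across every __addtf3 call. Under the
   SysV ABI every FP/SIMD register is call-clobbered, so the compiler
   cannot keep the idle accumulator in a register across the call. */
f128 sum2(const f128 *x, int n)
{
    f128 acc0 = 0, acc1 = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i + 1 < n; i += 2) {
        acc0 = acc0 + x[i];      /* becomes a call to __addtf3 */
        acc1 = acc1 + x[i + 1];  /* becomes a call to __addtf3 */
    }
    if (i < n)                   /* odd element count: one value left */
        acc0 = acc0 + x[i];
    return acc0 + acc1;          /* one more call to __addtf3 */
}
```

   Compiling this with -O2 -S and looking at the loop body is one way to
   see the stores and reloads around each call.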
      
   I think it all comes from the old mental model of soft floating-point
   routines being very slow; so slow that ABI impedance mismatches are
   lost in the noise. But in the specific case of binary128 on modern
   CPUs, that is simply not true: the arithmetic itself is quite fast, so
   ABI mismatches are significant.
      