From: david.brown@hesbynett.no   
      
   On 28/11/2025 14:33, Michael S wrote:   
   > On Fri, 28 Nov 2025 12:45:58 +0100   
   > David Brown wrote:   
   >   
   >>   
   >> I can believe that. If you have to implement floating point routines   
   >> in general integer hardware (and I expect that is the case for most   
   >> of your implementation here) then I would think it is better to start   
   >> and end with the data in GPR's. On some targets, moving data into   
   >> and out of floating point or vector registers is efficient enough   
   >> that those registers can effectively be used as caches, but it sounds   
   >> like that is not the case here.   
   >>   
   >   
   > On Windows, the problem is only one of moving data between the
   > various types of registers.
   > On SysV, things are worse: there is also the problem of the absence
   > of callee-saved FP/SIMD registers. In theory, the problem could have
   > been solved by defining a specialized ABI for the support routines
   > (__addtf3, __subtf3, __multf3, etc.), but that was not done either.
   >
   > I think it all comes from the old mental model of soft floating-point
   > routines being very slow - so slow that ABI impedance mismatches are
   > lost in the noise. But in the specific case of binary128 on modern
   > CPUs, that is simply not true: the arithmetic itself is quite fast,
   > so the ABI mismatches are significant.
   >   
      
   My only real experience with software floating point (using it, not
   writing it) is on systems where it is either slow (like 32-bit
   Cortex-M ARMs) or /very/ slow (like an 8-bit AVR). A little
   inefficiency in the main ABIs is, as you say, just noise in those cases.
      
   But on those systems, the floating-point arithmetic routines were part
   of the compiler support library. Functions there don't have to abide
   by the platform ABI - they can use whatever registers suit them best.
   Were you working on a library that integrates into the compiler, or
   was it more "user level" (like a C++ "binary128" class with operator
   overloads)?
      
   ABIs are obviously useful for standardisation and for intermixing code
   from different tools. But they can also be a pain, especially when
   they are old and outdated, or were designed to be efficient on
   different processors or with different kinds of code. I am finding
   the EABI for 32-bit ARM to be a serious performance drain for some
   kinds of work. It doesn't support passing anything bigger than 32
   bits in registers, except for "long long int" and "unsigned long long
   int". It has the same restriction on return values. That means if
   you have something like a C++ optional type, or an equivalent struct
   in C, it is all passed back and forth on the stack. And unlike the
   AMD processors you mention, on a Cortex-M core that is a lot slower!
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   