
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 129,483 of 131,241   
   BGB to MitchAlsup   
   Re: What I did on my summer vacation   
   23 Aug 25 23:43:43   
   
   From: cr88192@gmail.com   
      
   On 8/23/2025 5:44 PM, MitchAlsup wrote:   
   >   
   > BGB  posted:   
   >   
   >> On 8/23/2025 10:11 AM, Terje Mathisen wrote:   
   >>> BGB wrote:   
   > -------------   
   >>>   
   >>> Mitch and I have repeated this too many times already:   
   >>>   
   >>> If you are implementing a current-standards FPU, including FMAC support,   
   >>> then you already have the very wide normalizer which is the only   
   >>> expensive item needed to allow zero-cycle denorm cost.   
   >>>   
   >>   
   >> Errm, no single-rounded FMA in my case, as single rounded FMA (for   
   >> Binary64) would also require Trap-and-Emulate...   
   >>   
   >> But, yeah, Free if you have FMA, is not the same as FMA being free.   
   >>   
   >> Partial issue is that single rounded FMA would effectively itself have   
   >> too high of cost (and an FMA unit would require higher latency than   
   >> separate FMUL and FADD units).   
   >   
   > FMA latency < (FMUL + FADD) latency   
   > FMA latency >= FMUL latency   
   > FMA latency >= FADD latency   
   >   
      
   Yeah.   
      
   As noted, as-is I have an FMUL unit and FADD unit.   
      
   The FMAC op in this case basically pipes the FMUL output through the
   FADD unit, at roughly FMUL+FADD latency (so it occupies the pipeline
   for about twice as long as either op alone).
   
   But it doesn't give single rounding, nor could it affordably be made
   to do so.
      
      
   >> Ironically, what FMA operations exist tend to be slower for Binary32 ops   
   >> than using separate MUL and ADD ops in the default (non-IEEE) mode.   
   >> Though for Binary64, it would be slightly faster, though still   
   >> double-rounded-ish. They can mimic Single-Rounded behavior with Binary32   
   >> and Binary16 though mostly for sake of internally operating on Binary64.   
   >   
   > You must accept that::   
   >   
   >       FMA   Rd,Rs1,Rs2,Rs3   
   >       FSUB  Re,Rd,Rs3   
   >   
   > leaves all the proper bits in Re; whereas you cannot even argue::   
   >   
   >      FMUL   Rd,Rs1,Rs2   
   >      FADD   Re,Rd,Rs3   
   >      RSUB   Re,Re,R3   
   >   
   > leaves all the proper bits in Re !! in all cases !!   
      
   Granted...   
      
   But, normal C works fine without FMA; and if the ISA doesn't provide it,   
   then the FPU doesn't need to deal with it.   
      
   Nominally, C's rules treat every operator as individually rounded, so
   contracting X*Y+Z into an FMA is permitted but by no means required
   behavior in C (and doing so may actually itself introduce unexpected
   results).
      
      
   And, if one does:   
      w = fma(x, y, z);   
      
   One can also implement "fma()" in a way that doesn't depend on having
   native ISA-level support.
      
   Though, if supporting RISC-V, there is an issue:   
      RISC-V does have these instructions...   
      
      
   But, there are two possibilities here:   
      
   Double-rounded result; but this may violate IEEE semantics, and some   
   programs may depend on the assumption of it being single-rounded.   
      
   Or, trap and emulate: Little real hardware cost, but instruction now   
   takes roughly 500 or so clock cycles.   
      
      
   Though, I was recently working on supporting full IEEE-754 semantics
   as an option in my CPU core; mostly via trapping and emulation (sorta
   like the original MIPS FPUs did).
      
      
   But, as can be noted:   
   Seemingly even high-end PC style FPUs are not immune.   
      
   Like, if even Intel and AMD can't fully avoid FPU performance issues
   due to things like denormals, it seems like everything is basically
   doomed.
      
   It almost seems to me like:
      IEEE-754 aimed too high;
      And, now, everyone is paying for it.
      
   Never mind that "maybe we should all just do math slightly worse" is
   kind of a weak-sounding argument.
      
      
   Sometimes it is also kind of lame that fixed point kinda sucks too,
   but for different reasons...
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca