From: user5857@newsgrouper.org.invalid   
      
   Robert Finch posted:   
      
   > On 2025-10-29 2:33 p.m., MitchAlsup wrote:   
   > >   
   > > Robert Finch posted:   
   > >   
   > >> Started working on yet another CPU – Qupls4. Fixed 40-bit instructions,   
   > >> 64 GPRs. GPRs may be used in pairs for 128-bit ops. Registers are named   
   > >> as if there were 32 GPRs, A0 (arg 0 register is r1) and A0H (arg 0 high   
   > >> is r33). Same for other registers. GPRs may contain either integer or
   > >> floating-point values.   
   > >>   
   > >> Going with a bit result vector in any GPR for compares, then a branch on   
   > >> bit-set/clear for conditional branches. Might also include branch true /   
   > >> false.   
   > >   
   > > I have both the bit-vector compare and branch, but also a compare to zero   
   > > and branch as a single instruction. I suggest you should too, if for no   
   > > other reason than:   
   > >   
   > > if( p && p->next )   
   > >   
   >   
   > Yes, I was going to have at least branch on register 0 (false) 1 (true)   
   > as there is encoding room to support it. It does add more cases in the   
   > branch eval, but is probably well worth it.   
   > >> Using operand routing for immediate constants and an operation size for   
   > >> the instruction. Constants and operation size may be specified   
   > >> independently. With 40-bit instruction words, constants may be 10,50,90   
   > >> or 130 bits.   
   > >   
   > > My 66000 allows for occasional use of 128-bit values but is designed mainly   
   > > for 64-bit and smaller.   
   > >   
   >   
   > Following the same philosophy. Expecting only some use for 128-bit   
   > floats. Integers can only handle 8,16,32, or 64-bits.   
   >   
   > > With 32-bit instructions, I provide, {5, 16, 32, and 64}-bit constants.   
   > >   
   > > Just last week we discovered a case where HW can do a better job than SW.   
   > > Previously, the compiler would emit:   
   > >   
   > > CVTfd Rt,Rf   
   > > FMUL Rt,Rt,#1.425D0   
   > > CVTdf Rd,Rt   
   > >   
   > > Which is subject to double rounding: once at the FMUL and again at the
   > > down conversion. I thought about the problem and it seems fairly easy
   > > to gate the 24-bit fraction into the multiplier tree along with the   
   > > 53-bit fraction of the constant, and then normalize and round the   
   > > result dropping out of the tree--avoiding the double rounding case.   
   > >   
   > > Now, the compiler emits:   
   > >   
   > > FMULf Rd,Rf,#1.425D0   
   > >   
   > > saving 2 instructions along with the higher precision.   
   >   
   > Improves the accuracy? of algorithms, but seems a bit specific to me.   
      
   It is down in the 1% footprint area.   
      
   > Are there other instruction sequence where double-rounding would be good   
   > to avoid?   
      
   Back when I joined Moto (1983) there was a lot of talk about double   
   roundings and how it could screw up various algorithms but mainly in   
   the 64-bit versus 80-bit stuff of 68881, where you got 11 more bits
   of precision and thus took a chance of 2/2^10 of a double rounding.
   Today with 32-bit versus 64-bit you take a chance of 2/2^28 so the   
   problem is greatly ameliorated although technically still present.   
      
   The problem arises due to a cross product of various {machine,
   language, compiler} features not working "all ends towards the middle".   
      
   LLVM promotes FP calculations with a constant to 64-bits whenever the   
   constant cannot be represented exactly in 32-bits. {strike one}
      
   C makes no statements about controlling the precision of calculations.
   {strike two}   
      
   HW almost never provides mixed mode calculations which provide the   
   means to avoid the double rounding. {strike three}   
      
   So, technically, My 66000 does not provide general-mixed-mode FP,   
   but I wrote a special rule to allow for larger constants used with   
   narrower registers to cover exactly this case. {It also saves 2 CVT
   instructions (latency and footprint).}
      
   > Seems like HW could detect the sequence and fuse the instructions.   
      