... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.arch
Apparently more than just beeps & boops
131,241 messages
Message 130,298 of 131,241
Anton Ertl to MitchAlsup
Re: Multi-precision addition and archite
16 Nov 25 08:22:52
   From: anton@mips.complang.tuwien.ac.at   
      
   MitchAlsup  writes:   
   >   
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >> A common set of flags is NZCV.  Of these N and Z can be generated from   
   >> the 64 ordinary bits (actually N is the MSB of these bits).   
   >>   
   >> You might also want NZCV for 32-bit instructions.  In that case all
   >> flags are derivable from the 64 ordinary bits of the GPR, but you
   >> may need additional branch instructions: instructions that check
   >> only whether the bottom 32 bits are 0 (Z), whether bit 31 is 1 (N),
   >> whether bit 32 is 1 (C), or whether bit 32 differs from bit 31 (V).
   >   
   >If you write an architectural rule whereby every integer result is
   >"proper", one set of bits {top, bottom, dispersed} covers everything.
   >   
   >Proper means that all the bits in the register are written but the   
   >value written is range limited to {Sign}×{Size} of the calculation.   
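A minimal C sketch of the bit positions named in the quoted text above. It assumes the C bit is read from a 64-bit sum of zero-extended 32-bit operands and the V test from a 64-bit sum of sign-extended operands; with only one of the two extensions, either C or V comes out wrong for some inputs (e.g. 0xFFFFFFFF + 1). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four flags of a 32-bit addition. */
typedef struct { int n, z, c, v; } flags32;

/* Derive NZCV for a 32-bit add from bits of 64-bit sums, assuming the
   operands are zero-extended for C and sign-extended for V. */
static flags32 add32_flags(uint32_t a, uint32_t b)
{
    uint64_t uwide = (uint64_t)a + b;                              /* zero-extended */
    uint64_t swide = (uint64_t)((int64_t)(int32_t)a + (int32_t)b); /* sign-extended */
    flags32 f;
    f.z = (uint32_t)uwide == 0;                        /* bottom 32 bits are 0 */
    f.n = (int)((uwide >> 31) & 1);                    /* bit 31 is 1 */
    f.c = (int)((uwide >> 32) & 1);                    /* bit 32 is 1 */
    f.v = (int)(((swide >> 32) ^ (swide >> 31)) & 1);  /* bit 32 differs from bit 31 */
    return f;
}
```

Z and N come out the same from either sum, because the low 32 bits of the two sums are identical.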
      
   I have no idea what you mean with "one set of bits {top, bottom,   
   dispersed}".   
      
   As for "proper": Does this mean that one would have to have add(c),   
   sub(c), mul (madd etc.), shift right and shift left (did I forget   
   anything?) for i8, i16, i32, i64, u8, u16, u32, and u64?  Yes, if you
   specify in the operation which kind of Z, C/V, and maybe N you are
   interested in, you do not need to specify it in the branch that checks   
   that result; you also eliminate the sign-extension and zero-extension   
   operations that we discussed some time ago.   
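To make "proper" concrete, here is a hedged C sketch of 8-bit adds that write all 64 bits of the destination but limit the value to the u8 or i8 range. After such an operation, plain full-width zero and sign tests give the 8-bit answers without a separate extension instruction. The helper names are hypothetical and not taken from any real ISA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "proper" 8-bit adds: every bit of the 64-bit result is
   written, but the value is limited to the range of the operation's type. */
static uint64_t add_u8_proper(uint64_t a, uint64_t b)
{
    return (uint64_t)(uint8_t)(a + b);          /* zero-extend the low 8 bits */
}

static uint64_t add_i8_proper(uint64_t a, uint64_t b)
{
    /* Relies on two's-complement narrowing, as GCC and Clang implement it. */
    return (uint64_t)(int64_t)(int8_t)(a + b);  /* sign-extend the low 8 bits */
}

/* With proper results, one full-width test serves every width. */
static int is_zero(uint64_t r)     { return r == 0; }
static int is_negative(uint64_t r) { return (int64_t)r < 0; }
```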
      
   But given that the operations are much more frequent than branches,   
   encoding that information in the branches uses less space (for shift   
   right, the sign is usually included in the operation).  It's   
   interesting that AFAIK there are instruction sets (e.g., Power) that   
   just have one full-width sign-agnostic add, and do not have   
   width-specific flags, either.  So when compiling stuff like   
      
   if (a[1]+a[2] == 0) /* unsigned a[] */   
      
   a width-specific compare instruction provides that information.  But
   gcc generates a compare instruction even when a[] is "unsigned long",
   so apparently add does not set the flags anyway (and if there is an
   add that sets flags, it is not used by gcc for this code).
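A small C sketch of why the compare must be width-specific when the add is a single full-width, sign-agnostic operation: for 32-bit unsigned elements held zero-extended in 64-bit registers, 0xFFFFFFFF + 1 is zero in 32 bits but not in 64. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One full-width add followed by a 32-bit compare with zero: what the
   C test asks for when a[] holds 32-bit unsigned values. */
static int sum_is_zero_u32(uint64_t x, uint64_t y)
{
    uint64_t sum = x + y;          /* 64-bit add, carries no width information */
    return (uint32_t)sum == 0;     /* width-specific compare of the low 32 bits */
}

/* The same add followed by a 64-bit compare: right for "unsigned long"
   elements, wrong for 32-bit ones. */
static int sum_is_zero_u64(uint64_t x, uint64_t y)
{
    return x + y == 0;
}
```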
      
   Another case is SPARC v9, which tends to set flags.  For   
      
     if ((a[1]^a[2]) < 0)   
      
   I see:   
      
   long a[]                      int a[]
   ldx  [ %i0 + 8 ], %g1         ld  [ %i0 + 4 ], %g2
   ldx  [ %i0 + 0x10 ], %g2      ld  [ %i0 + 8 ], %g1
   xor  %g1, %g2, %g1            xorcc  %g2, %g1, %g0
   brlz,pn   %g1, 24             bl,a,pn   %icc, 20
      
   Reading up on SPARC v9, it has two sets of condition codes: 32-bit   
   (icc) and 64-bit (xcc), and every instruction that sets condition   
   codes (e.g., xorcc) sets both.  In the present case, the 32-bit   
   sequence sets the ccs and then checks icc, while the 64-bit sequence   
   does not set the ccs, and instead uses a branch instruction that   
   inspects an integer register (%g1).  These branch instructions all   
   work for the full 64 bits, and do not provide a way to check a 32-bit   
   result.  In the present case, an alternate way to use brlz for the   
   32-bit case would have been:   
      
   ldsw  [ %i0 + 4 ], %g1       ! sign-extending load; ld is a synonym for lduw
   ldsw  [ %i0 + 8 ], %g2
   xor  %g1, %g2, %g1
   brlz,pn   %g1, 24
      
   because the xor of two sign-extended values is also a correct
   sign-extended result, but instead gcc chose to use xorcc and bl %icc.
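The sign-extension argument can be checked with a small C model of register contents (function names hypothetical; this models only the loaded values and the sign bits, not SPARC condition-code semantics in full). After ldsw, the 64-bit sign test that brlz performs agrees with the 32-bit one; after a zero-extending ld (lduw) it can disagree, which is why the 32-bit test is needed in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Models of the two ways to load a 32-bit word into a 64-bit register. */
static uint64_t ldsw(int32_t w) { return (uint64_t)(int64_t)w; }  /* sign-extend */
static uint64_t lduw(int32_t w) { return (uint64_t)(uint32_t)w; } /* zero-extend */

/* Sign bits as seen by a 32-bit test (bit 31) and a 64-bit test (bit 63). */
static int sign32(uint64_t r) { return (int)((r >> 31) & 1); }
static int sign64(uint64_t r) { return (int)((r >> 63) & 1); }

/* Do the 32-bit and 64-bit sign tests agree on the xor of the two words? */
static int xor_tests_agree_ldsw(int32_t x, int32_t y)
{
    uint64_t r = ldsw(x) ^ ldsw(y);   /* xor of sign-extended words */
    return sign32(r) == sign64(r);
}

static int xor_tests_agree_lduw(int32_t x, int32_t y)
{
    uint64_t r = lduw(x) ^ lduw(y);   /* xor of zero-extended words */
    return sign32(r) == sign64(r);
}
```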
      
   There are many ways to skin this cat.   
      
   >> Concerning saving the extra bits across interrupts, yes, this has to   
   >> be adapted to the actual architecture, and there are many ways to skin   
   >> this cat.  I just outlined one to give an idea how this can be done.   
   >   
   >On the other hand, with CARRY, none of those bits are needed.   
      
   But the mechanism of CARRY is quite a bit more involved: Either store   
   the carry in a GPR at every step, or have another mechanism inside a   
   CARRY block.  And either make the CARRY block atomic or have some way   
   to preserve the fact that there is this prefix across interrupts and   
   (worse) synchronous traps.   
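The first option above, storing the carry in a GPR at every step, looks roughly like this in C. This is a sketch: mp_add is a hypothetical name, limbs are stored least significant first, and it does not model a CARRY prefix or its behaviour across interrupts. In this formulation each limb needs two adds, two unsigned compares and an OR to recover the carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs (least significant first), keeping the
   carry in an ordinary variable between steps instead of in a flag.
   Returns the final carry out. */
static uint64_t mp_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = s < carry;      /* carry out of adding the carry-in */
        uint64_t t = s + b[i];
        uint64_t c2 = t < s;          /* carry out of adding b[i] */
        r[i] = t;
        carry = c1 | c2;              /* at most one of c1, c2 is set */
    }
    return carry;
}
```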
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
     Mitch Alsup,    
      