comp.arch
131,241 messages
Message 130,303 of 131,241
Robert Finch to MitchAlsup
Re: Multi-precision addition and archite
17 Nov 25 02:49:15
   From: robfi680@gmail.com   
      
   On 2025-11-16 1:36 p.m., MitchAlsup wrote:   
   >   
   > anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >   
   >> MitchAlsup  writes:   
   >>>   
   >>> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >>>> A common set of flags is NZCV.  Of these N and Z can be generated from   
   >>>> the 64 ordinary bits (actually N is the MSB of these bits).   
   >>>>   
   >>>> You might also want NZCV of 32-bit instructions, but in that case all   
   >>>> flags are derivable from the 64 ordinary bits of the GPR; but in that   
   >>>> case you may need additional branch instructions: Instructions that   
   >>>> check only if the bottom 32-bits are 0 (Z), if bit 31 is 1 (N), if bit   
   >>>> 32 is 1 (C), or if bit 32 is different from bit 31 (V).   
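
A minimal sketch of the bit tests described above, in C. It assumes the 32-bit operands are zero-extended when checking C and sign-extended when checking V; the names flags32 and add32_flags are made up for illustration and do not come from any ISA.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the four 32-bit flags. */
typedef struct { bool n, z, c, v; } flags32;

/* Recover 32-bit NZCV from full 64-bit sums, with no flags register. */
static flags32 add32_flags(uint32_t a, uint32_t b)
{
    /* Sum of zero-extended operands: bit 32 holds the unsigned carry. */
    uint64_t usum = (uint64_t)a + (uint64_t)b;
    /* Sum of sign-extended operands: bits 32 and 31 differ on signed overflow. */
    uint64_t ssum = (uint64_t)((int64_t)(int32_t)a + (int64_t)(int32_t)b);

    flags32 f;
    f.z = (uint32_t)usum == 0;                          /* bottom 32 bits are 0 */
    f.n = (usum >> 31) & 1;                             /* bit 31 is 1 */
    f.c = (usum >> 32) & 1;                             /* bit 32 is 1 */
    f.v = ((ssum >> 32) & 1) != ((ssum >> 31) & 1);     /* bit 32 != bit 31 */
    return f;
}
```

For example, add32_flags(0x7FFFFFFF, 1) reports N and V set with Z and C clear, which is what a 32-bit add with a flags register would give.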
   >>>   
   >>> If you write an architectural rule whereby every integer result is   
   >>> "proper" one set of bits {top, bottom, dispersed} covers everything.   
   >>>   
   >>> Proper means that all the bits in the register are written but the   
   >>> value written is range limited to {Sign}×{Size} of the calculation.   
   >>   
   >> I have no idea what you mean with "one set of bits {top, bottom,   
   >> dispersed}".   
   >   
   > typedef struct { uint64_t reg;   
   >                   uint8_t  bits: 4; } gpr;   
   > or   
   > typedef struct { uint8_t  bits: 4;   
   >                   uint64_t reg;} gpr;   
   > or   
   > typedef struct { uint16_t reg0;   
   >                   uint8_t  bit0: 1;   
   >                   uint16_t reg1;   
   >                   uint8_t  bit1: 1;   
   >                   uint16_t reg2;   
   >                   uint8_t  bit2: 1;   
   >                   uint16_t reg3;   
   >                   uint8_t  bit3: 1;  } gpr;   
   >   
   > Did you lose every brain-cell of imagination ?!?   
   >   
   >> As for "proper": Does this mean that one would have to have add(c),   
   >> sub(c), mul (madd etc.), shift right and shift left (did I forget   
   >> anything?) for i8, i16, i32, i64, u8, u16, u32, and u64?  Yes, if you   
   >> specify in the operation which kind of Z, C/V, and maybe N you are   
   >> interested in, you do not need to specify it in the branch that checks   
   >> that result; you also eliminate the sign-extension and zero-extension   
   >> operations that we discussed some time ago.   
   >   
   > {s8, s16, s32, s64, u8, u16, u32, u64} yes.   
   >   
   >> But given that the operations are much more frequent than branches,   
   >> encoding that information in the branches uses less space (for shift   
   >> right, the sign is usually included in the operation).  It's   
   >   
   > Which is why I don't have ANY of those extra bits.   
   >   
   >> interesting that AFAIK there are instruction sets (e.g., Power) that   
   >> just have one full-width sign-agnostic add, and do not have   
   >> width-specific flags, either.  So when compiling stuff like   
   >>   
   >> if (a[1]+a[2] == 0) /* unsigned a[] */   
   >>   
   >> a width-specific compare instruction provides that information.  But   
   >> gcc generates a compare instruction even when a[] is "unsigned long",   
   >> so apparently add does not set the flags on addition anyway (and if   
   >> there is an add that sets flags, it is not used by gcc for this code).   
   >>   
   >> Another case is SPARC v9, which tends to set flags.  For   
   >>   
   >>    if ((a[1]^a[2]) < 0)   
   >>   
   >> I see:   
   >>   
   >> long a[]                      int a[]   
   >> ldx  [ %i0 + 8 ], %g1         ld  [ %i0 + 4 ], %g2   
   >> ldx  [ %i0 + 0x10 ], %g2      ld  [ %i0 + 8 ], %g1   
   >> xor  %g1, %g2, %g1            xorcc  %g2, %g1, %g0   
   >> brlz,pn   %g1, 24             bl,a,pn   %icc, 20   
   >>   
   >> Reading up on SPARC v9, it has two sets of condition codes: 32-bit   
   >> (icc) and 64-bit (xcc), and every instruction that sets condition   
   >> codes (e.g., xorcc) sets both.   
   >   
   > Another reason its death is helpful to comp.arch   
   >   
   >>                                  In the present case, the 32-bit   
   >> sequence sets the ccs and then checks icc, while the 64-bit sequence   
   >> does not set the ccs, and instead uses a branch instruction that   
   >> inspects an integer register (%g1).  These branch instructions all   
   >> work for the full 64 bits, and do not provide a way to check a 32-bit   
   >> result.  In the present case, an alternate way to use brlz for the   
   >> 32-bit case would have been:   
   >>   
   >> ldsw  [ %i0 + 4 ], %g1       #ld is a synonym for lduw   
   >> ldsw  [ %i0 + 8 ], %g2   
   >> xor  %g1, %g2, %g1   
   >> brlz,pn   %g1, 24    
   >>   
   >> because the xor of two sign-extended data is also a correct   
   >> sign-extended result, but instead gcc chose to use xorcc and bl %icc.   
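
A small C sketch of the point about sign extension, as an illustration only: the xor of two sign-extended 32-bit values is itself a sign-extended value, so testing the sign of the full 64-bit register (what brlz does) gives the same answer as testing bit 31 of the 32-bit result. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* 32-bit view: sign of the 32-bit xor (the xorcc + bl %icc route). */
static bool xor_is_negative_32(int32_t x, int32_t y)
{
    return (x ^ y) < 0;
}

/* 64-bit view: sign-extend both operands (as ldsw would), xor them,
   then test the sign of the whole 64-bit value (as brlz would). */
static bool xor_is_negative_64(int32_t x, int32_t y)
{
    int64_t wx = x;   /* sign-extending load */
    int64_t wy = y;
    return (wx ^ wy) < 0;
}
```

Both functions agree for every pair of inputs, because bits 63..32 of each widened operand are copies of its bit 31, and so bits 63..32 of the xor are copies of bit 31 of the xor.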
   >>   
   >> There are many ways to skin this cat.   
   >   
   > Sure: close to 20 ways, fewer than 4 of them are "proper".   
   >   
   >>>> Concerning saving the extra bits across interrupts, yes, this has to   
   >>>> be adapted to the actual architecture, and there are many ways to skin   
   >>>> this cat.  I just outlined one to give an idea how this can be done.   
   >>>   
   >>> On the other hand, with CARRY, none of those bits are needed.   
   >>   
   >> But the mechanism of CARRY is quite a bit more involved: Either store   
   >> the carry in a GPR at every step, or have another mechanism inside a   
   >> CARRY block.  And either make the CARRY block atomic or have some way   
   >> to preserve the fact that there is this prefix across interrupts and   
   >> (worse) synchronous traps.   
   >   
   > During its "life" the bits used in CARRY are simply another feedback   
   > path on the data-path. Afterwards, carry is written once. CARRY also   
   > gets written when an exception is taken.   
   >   
   >>   
   >> - anton   
      
   These posts have inspired me to keep working on the ISA. I am on a   
   simplification mission.   
      
   The CARRY modifier is just a substitute for having r3w2 (three-read,   
   two-write port) instructions directly in the ISA. Since the Qupls ISA   
   has room to support some r3w2 instructions directly, there is no need   
   for CARRY, much as I like the idea.   
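
To make the r3w2 idea concrete, here is a rough C model of multi-precision addition built on an add that reads three values (two operands and a carry in) and writes two (the sum and a carry out). This is only a sketch: add_r3w2 and mp_add are hypothetical names, not actual Qupls instructions or their encoding.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of a 3-read, 2-write add:
   reads a, b, cin; writes the low 64 bits of the sum and the carry out. */
static uint64_t add_r3w2(uint64_t a, uint64_t b, uint64_t cin, uint64_t *cout)
{
    uint64_t s = a + b;
    uint64_t c = s < a;        /* carry out of a + b */
    uint64_t r = s + cin;
    c += r < s;                /* carry out of adding the carry in */
    *cout = c;
    return r;
}

/* Add two n-limb numbers (least significant limb first).
   Returns the carry out of the most significant limb. */
static uint64_t mp_add(uint64_t *dst, const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++)
        dst[i] = add_r3w2(x[i], y[i], carry, &carry);
    return carry;
}
```

Each loop iteration corresponds to one such instruction, with the carry travelling in an ordinary register instead of a flag or a CARRY prefix.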
      
   While I am not using a carry flag in the register, there is still a   
   capabilities bit, an overflow bit and a pointer bit, plus four   
   user-assigned bits. I decided to just have 72-bit register store and   
   load instructions along with the usual 8, 16, 32 and 64.   
      
   I am finding it too difficult to support 128-bit operations using   
   high/low register pairs. Getting the reservation stations to pair up   
   the registers seems a bit scary. It would be much simpler to just have   
   128-bit registers, and it appears that this may not take any more   
   logic. The benefit of using register pairs is that the internal busses   
   need only be 64 bits wide.   
      
   SPARC v9 died?   
      