comp.arch
Robert Finch to BGB
Re: Multi-precision addition and archite
18 Nov 25 20:26:25
   From: robfi680@gmail.com   
      
   On 2025-11-18 2:15 p.m., BGB wrote:   
   > On 11/17/2025 1:49 AM, Robert Finch wrote:   
   >> On 2025-11-16 1:36 p.m., MitchAlsup wrote:   
   >>>   
   >>> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >>>   
   >>>> MitchAlsup  writes:   
   >>>>>   
   >>>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >>>>>> A common set of flags is NZCV.  Of these N and Z can be generated
   >>>>>> from the 64 ordinary bits (actually N is the MSB of these bits).
   >>>>>>   
   >>>>>> You might also want NZCV of 32-bit instructions, but in that case all
   >>>>>> flags are derivable from the 64 ordinary bits of the GPR; but in that
   >>>>>> case you may need additional branch instructions: Instructions that
   >>>>>> check only if the bottom 32-bits are 0 (Z), if bit 31 is 1 (N), if
   >>>>>> bit 32 is 1 (C), or if bit 32 is different from bit 31 (V).
   >>>>>   
   >>>>> If you write an architectural rule whereby every integer result is   
   >>>>> "proper" one set of bits {top, bottom, dispersed} covers everything.   
   >>>>>   
   >>>>> Proper means that all the bits in the register are written but the   
   >>>>> value written is range limited to {Sign}×{Size} of the calculation.
   >>>>   
   >>>> I have no idea what you mean with "one set of bits {top, bottom,   
   >>>> dispersed}".   
   >>>   
   >>> typedef struct { uint64_t reg;   
   >>>                   uint8_t  bits: 4; } gpr;   
   >>> or   
   >>> typedef struct { uint8_t  bits: 4;   
   >>>                   uint64_t reg;} gpr;   
   >>> or   
   >>> typedef struct { uint16_t reg0;   
   >>>                   uint8_t  bit0: 1;   
   >>>                   uint16_t reg1;   
   >>>                   uint8_t  bit1: 1;   
   >>>                   uint16_t reg2;   
   >>>                   uint8_t  bit2: 1;   
   >>>                   uint16_t reg3;   
   >>>                   uint8_t  bit3: 1;  } gpr;   
   >>>   
   >>> Did you lose every brain-cell of imagination ?!?
   >>>   
   >>>> As for "proper": Does this mean that one would have to have add(c),   
   >>>> sub(c), mul (madd etc.), shift right and shift left (did I forget   
   >>>> anything?) for i8, i16, i32, i64, u8, u16, u32, and u64?  Yes, if you
   >>>> specify in the operation which kind of Z, C/V, and maybe N you are
   >>>> interested in, you do not need to specify it in the branch that checks   
   >>>> that result; you also eliminate the sign-extension and zero-extension   
   >>>> operations that we discussed some time ago.   
   >>>   
   >>> {s8, s16, s32, s64, u8, u16, u32, u64} yes.   
   >>>> But given that the operations are much more frequent than branches,   
   >>>> encoding that information in the branches uses less space (for shift   
   >>>> right, the sign is usually included in the operation).  It's   
   >>>   
   >>> Which is why I don't have ANY of those extra bits.   
   >>>   
   >>>> interesting that AFAIK there are instruction sets (e.g., Power) that   
   >>>> just have one full-width sign-agnostic add, and do not have   
   >>>> width-specific flags, either.  So when compiling stuff like   
   >>>>   
   >>>> if (a[1]+a[2] == 0) /* unsigned a[] */   
   >>>>   
   >>>> a width-specific compare instruction provides that information.  But   
   >>>> gcc generates a compare instruction even when a[] is "unsigned long",   
   >>>> so apparently add does not set the flags on addition anyway (and if   
   >>>> there is an add that sets flags, it is not used by gcc for this code).   
   >>>>   
   >>>> Another case is SPARC v9, which tends to set flags.  For   
   >>>>   
   >>>>    if ((a[1]^a[2]) < 0)   
   >>>>   
   >>>> I see:   
   >>>>   
   >>>> long a[]                      int a[]   
   >>>> ldx  [ %i0 + 8 ], %g1         ld  [ %i0 + 4 ], %g2   
   >>>> ldx  [ %i0 + 0x10 ], %g2      ld  [ %i0 + 8 ], %g1   
   >>>> xor  %g1, %g2, %g1            xorcc  %g2, %g1, %g0   
   >>>> brlz,pn   %g1, 24             bl,a,pn   %icc, 20
   >>>>   
   >>>> Reading up on SPARC v9, it has two sets of condition codes: 32-bit   
   >>>> (icc) and 64-bit (xcc), and every instruction that sets condition   
   >>>> codes (e.g., xorcc) sets both.   
   >>>   
   >>> Another reason its death is helpful to comp.arch   
   >>>   
   >>>> In the present case, the 32-bit
   >>>> sequence sets the ccs and then checks icc, while the 64-bit sequence   
   >>>> does not set the ccs, and instead uses a branch instruction that   
   >>>> inspects an integer register (%g1).  These branch instructions all   
   >>>> work for the full 64 bits, and do not provide a way to check a 32-bit   
   >>>> result.  In the present case, an alternate way to use brlz for the   
   >>>> 32-bit case would have been:   
   >>>>   
   >>>> ldsw  [ %i0 + 8 ], %g1       #ld is a synonym for lduw   
   >>>> ldsw  [ %i0 + 0x10 ], %g2   
   >>>> xor  %g1, %g2, %g1   
   >>>> brlz,pn   %g1, 24    
   >>>>   
   >>>> because the xor of two sign-extended data is also a correct   
   >>>> sign-extended result, but instead gcc chose to use xorcc and bl %icc.
   >>>>   
   >>>> There are many ways to skin this cat.   
   >>>   
   >>> Sure: close to 20 ways, fewer than 4 of them are "proper".
   >>>
   >>>>>> Concerning saving the extra bits across interrupts, yes, this has to   
   >>>>>> be adapted to the actual architecture, and there are many ways to
   >>>>>> skin this cat.  I just outlined one to give an idea how this can be
   >>>>>> done.
   >>>>>   
   >>>>> On the other hand, with CARRY, none of those bits are needed.   
   >>>>   
   >>>> But the mechanism of CARRY is quite a bit more involved: Either store   
   >>>> the carry in a GPR at every step, or have another mechanism inside a   
   >>>> CARRY block.  And either make the CARRY block atomic or have some way   
   >>>> to preserve the fact that there is this prefix across interrupts and   
   >>>> (worse) synchronous traps.   
   >>>   
   >>> During its "life" the bits used in CARRY are simply another feedback   
   >>> path on the data-path. Afterwards, carry is written once. CARRY also   
   >>> gets written when an exception is taken.   
   >>>   
   >>>>   
   >>>> - anton   
   >>   
   >> These posts have inspired me to keep working on the ISA. I am on a   
   >> simplification mission.   
   >>   
   >> The CARRY modifier is just a substitute for not having r3w2 port   
   >> instructions directly in the ISA. Since Qupls ISA has room to support   
   >> some r3w2 instructions directly there is no need for CARRY, much as I   
   >> like the idea.   
   >>   
   >> While not using a carry flag in the register, there is still a
   >> capabilities bit, an overflow bit and a pointer bit, plus four
   >> user-assigned bits. I decided to just have 72-bit register store and
   >> load instructions along with the usual 8, 16, 32 and 64-bit ones.
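
For readers following the multi-precision thread, here is a rough C model of what an add with three register reads and two register writes could look like. It reads "r3w2" as "three reads, two writes", which is an interpretation rather than something spelled out in the post. The names addc_r3w2 and mp_add are made up for illustration and are not Qupls or My 66000 mnemonics. The carry lives in an ordinary 64-bit value and is assumed to be 0 or 1.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of one "r3w2" add: reads a, b and a carry-in,
 * writes the 64-bit sum and a carry-out. cin is assumed to be 0 or 1. */
static uint64_t addc_r3w2(uint64_t a, uint64_t b, uint64_t cin, uint64_t *cout)
{
    uint64_t s  = a + b;      /* wraps modulo 2^64 */
    uint64_t c1 = s < a;      /* carry out of a + b */
    uint64_t r  = s + cin;
    uint64_t c2 = r < s;      /* carry out of adding the carry-in */
    *cout = c1 | c2;          /* at most one of c1, c2 can be set */
    return r;
}

/* Multi-precision add over n little-endian 64-bit limbs.
 * Returns the final carry-out (0 or 1). */
static uint64_t mp_add(uint64_t *dst, const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++)
        dst[i] = addc_r3w2(x[i], y[i], carry, &carry);
    return carry;
}
```

Each loop iteration corresponds to one such instruction, with the carry passed along in a general register instead of a flags register or a CARRY prefix.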
   >>   
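
The 32-bit flag derivation Anton Ertl describes earlier in the thread (Z from the bottom 32 bits, N from bit 31, C from bit 32, V from bit 32 differing from bit 31) can be sketched in C as below. The helper names are hypothetical. The sketch also makes one assumption explicit that the quoted text leaves implicit: C is only meaningful if the operands were zero-extended before the 64-bit add, and V only if they were sign-extended.

```c
#include <stdint.h>
#include <stdbool.h>

/* 64-bit add of zero-extended 32-bit operands (use for C). */
static uint64_t add32_zx(uint32_t a, uint32_t b)
{
    return (uint64_t)a + (uint64_t)b;
}

/* 64-bit add of sign-extended 32-bit operands (use for V). */
static uint64_t add32_sx(int32_t a, int32_t b)
{
    return (uint64_t)((int64_t)a + (int64_t)b);
}

/* Z: bottom 32 bits are all zero. */
static bool flag_z32(uint64_t r) { return (uint32_t)r == 0; }

/* N: bit 31 is set. */
static bool flag_n32(uint64_t r) { return (r >> 31) & 1; }

/* C: bit 32 is set (assumes zero-extended operands). */
static bool flag_c32(uint64_t r) { return (r >> 32) & 1; }

/* V: bit 32 differs from bit 31 (assumes sign-extended operands). */
static bool flag_v32(uint64_t r) { return ((r >> 32) ^ (r >> 31)) & 1; }
```

With sign-extended inputs, bit 32 of the 64-bit sum is the sign of the exact result and bit 31 is the sign of the truncated 32-bit result, so they differ exactly when the 32-bit signed add overflows.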
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)