comp.arch
Message 130,313 of 131,241
Robert Finch to MitchAlsup
Re: Multi-precision addition and archite
17 Nov 25 16:58:31
   From: robfi680@gmail.com   
      
   On 2025-11-17 1:45 p.m., MitchAlsup wrote:   
   >   
   > Robert Finch posted:   
   >   
   >> On 2025-11-17 3:33 a.m., Anton Ertl wrote:   
   >>> Robert Finch writes:   
   >>>> Finding it too difficult to support 128-bit operations using high, low   
   >>>> register pairs. Getting the reservation stations to pair up the   
   >>>> registers seems a bit scary. It would be much simpler to just have   
   >>>> 128-bit registers and it appears as if it may not be any more logic.   
   >>>   
   >>> If you want to support 128-bit operations, using 128-bit registers   
   >>> certainly is the way to go.  Note how AMD used to split 128-bit SSE   
   >>> operations into 64-bit parts on 64-bit registers in the K8, split   
   >>> 256-bit AVX operations into 128-bit parts on 128-bit registers in Zen,   
   >>> but they went away from that: In Zen4 512-bit operations are performed   
   >>> in 256-bit-pieces, but the registers are 512 bits wide.   
   >>>   
   >>> However, the point of carry bits or Mitch Alsup's CARRY is not 128-bit   
   >>> operations, but multi-precision, which can be 256-bit for some crypto,   
   >>> 4096 bits for other crypto, or billions of bits for the stuff that   
   >>> Alexander Yee is doing.   
   >>>   
   >>>> Sparc v9 died?   
   >>>   
   >>> Oracle has discontinued SPARC development in 2017, Fujitsu has   
   >>> announced in 2016 that they switch to ARM A64.  Both Oracle and   
   >>> Fujitsu released their last new SPARC CPU in 2017.  Fujitsu has   
   >>> released the ARM A64-based A64FX in 2019.  The Leon4 (2017 according   
   >>> to ) and Leon5   
   >>> (2019) implement SPARC v8, not v9.   
   >>>   
   >>> The MCST-R2000 (2018) implements SPARC v9, but will it have a   
   >>> successor?  And even if it has a successor, will it be available in   
   >>> relevant numbers?  MCST is not married to SPARC, despite their name;   
   >>> they have worked on Elbrus 2000 implementations as well; Elbrus 2000   
   >>> supports Elbrus VLIW and "Intel x86" instruction sets, and new models   
   >>> were released in 2018, 2021, and 2025, so MCST now seems to focus on   
   >>> that.   
   >>>   
   >>> - anton   
   >>   
   >> Skimming through the SPARC architecture manual I am wondering how they   
   >> handle register renaming with a windowed register file. If the register   
   >> window file is deep there must be a ginormous number of registers for   
   >> renaming. Would it need to keep track of the renames for all the   
   >> registers? How does it dump the rename state to memory?   
   >>   
   >> Tried to find some information on Elbrus. I got page not found a couple   
   >> of times. Other than it’s a VLIW machine I do not know much about it.   
   >>   
   >> *****   
   >>   
   >> I would like a machine able to process 128-bit values directly, but it   
   >> takes up too many resources. It is easier to make the register file deep   
   >> as opposed to wide. BRAM has a max 64-bit width. After that it takes   
   >> more BRAMs to get a wider port. I tried a 128-bit wide register file,   
   >> but it used about 200 BRAMs. Too many.   
   >>   
   >> There are now 128 logical registers available in Qupls. It turns out   
   >> that the BRAM setup is 512 registers deep no matter whether there are   
   >> 32,64 or 128 registers. So, may as well make them available.   
   >   
   > Can you read BRAM 2× or 4× per CPU cycle ?!?   
      
   No, the BRAM and logic are not fast enough for that. There is also some   
   logic to select the BRAM outputs via a live value table.   
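      
   For readers who have not met a live value table: a minimal sketch of   
   the usual scheme, assuming one bank of storage per write port and a   
   small table that remembers which bank wrote each register last. The   
   class and method names are made up for illustration and are not taken   
   from Qupls. The read path is a bank read followed by a select, which is   
   the extra logic mentioned above.   

```python
class LvtRegisterFile:
    """Behavioural model of a multi-write-port register file.

    Assumption: each write port owns one bank (one BRAM copy in hardware),
    and the live value table (LVT) records, per register, which bank holds
    the most recent value.
    """

    def __init__(self, num_regs, num_write_ports):
        self.banks = [[0] * num_regs for _ in range(num_write_ports)]
        self.lvt = [0] * num_regs  # bank index of the live value

    def write(self, port, reg, value):
        # A port only ever writes its own bank, so banks need one write port.
        self.banks[port][reg] = value
        self.lvt[reg] = port

    def read(self, reg):
        # Read the register from the bank the LVT points at.
        return self.banks[self.lvt[reg]][reg]


if __name__ == "__main__":
    rf = LvtRegisterFile(num_regs=512, num_write_ports=2)
    rf.write(0, 5, 111)
    rf.write(1, 5, 222)   # port 1 now holds the live value of register 5
    print(rf.read(5))
```

   The stale copy in the other bank is simply ignored until that port   
   writes the register again.   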
      
   >   
   >> Qupls reservation stations were set up with support for eight operands   
   >> (four each for each ½ 128-bit register). The resulting logic was about   
   >> 25,000 LUTs for just one RS. This is compared to about 5,000 LUTs when   
   >> there were just four operands. What gets implemented is considerably   
   >> less as most functional units do not need all the operands.   
   >   
   > Ok, you found one way NOT to DO IT.   
   >   
   >> It may be resource efficient to use multiple reservation stations as   
   >> opposed to more operands in a single station. But then the operands need   
   >> to be linked together between stations. It may be possible using a hash   
   >> of the PC value and ROB entry number.   
   >   
   > Allow me to dissuade you from this.   
   >   
   Whew! After several tries I think I found a much better way of doing   
   things. Each 128-bit instruction is simply translated into two (or   
   more) 64-bit micro-ops at the micro-op translation stage, so there is   
   no messing around with reservation stations or extra operands. The   
   cost is that performance is potentially cut in half, but for a much   
   smaller implementation it is worth it. Micro-op translation is only a   
   few hundred LUTs.   
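      
   A rough software model of the idea, for a 128-bit add. This is only a   
   sketch under assumptions: the micro-op names ("add", "adc"), the   
   convention that a 128-bit value lives in registers r (low half) and   
   r+1 (high half), and the single carry bit passed from the first   
   micro-op to the second are all made up for illustration, not how   
   Qupls actually encodes things.   

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass
class MicroOp:
    op: str    # "add" (no carry in) or "adc" (add with carry in); hypothetical names
    dst: int   # 64-bit destination register number
    src1: int
    src2: int


def crack_add128(dst, src1, src2):
    """Translate one 128-bit ADD into two 64-bit micro-ops.

    Assumption: register r holds the low half and r + 1 the high half.
    """
    return [
        MicroOp("add", dst, src1, src2),              # low halves
        MicroOp("adc", dst + 1, src1 + 1, src2 + 1),  # high halves plus carry
    ]


def execute(uops, regs):
    """Run micro-ops in order on a list of 64-bit register values."""
    carry = 0
    for u in uops:
        carry_in = carry if u.op == "adc" else 0
        total = regs[u.src1] + regs[u.src2] + carry_in
        regs[u.dst] = total & MASK64
        carry = total >> 64  # carry out of this 64-bit add
    return regs


if __name__ == "__main__":
    regs = [0] * 8
    regs[2], regs[3] = MASK64, 0   # a = 2**64 - 1
    regs[4], regs[5] = 1, 0        # b = 1
    execute(crack_add128(0, 2, 4), regs)
    print(hex(regs[1]), hex(regs[0]))  # high half, low half of a + b
```

   The two micro-ops issue like any other 64-bit op, which is why the   
   reservation stations do not need to know about register pairs. The   
   same chain extends to wider multi-precision adds by appending more   
   "adc" micro-ops.   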
      
   >> Qupls seems to have an implementation four or five times the size of the   
   >> FPGA again. Back to the drawing board.   
   >   
   > Live within your means.   
      