
comp.arch

EricP to Thomas Koenig

Re: A new method for OoO

11 Sep 25 13:22:44

   From: ThatWouldBeTelling@thevillage.com   

   Thomas Koenig wrote:   
   > https://old.chipsandcheese.com/2025/08/29/condors-cuzco-risc-v-core-at-hot-chips-2025/
   > has an interesting take on how to do OoO (quite patented,
   > apparently).  Apparently, they predict how many cycles their   
   > instructions are going to take, and replay if that doesn't work   
   > (for example in case of an L1 cache miss).   
   >   
   > Sounds interesting, I wonder what people here think of it.   

   I searched for "processor" "schedule" "time resource matrix" and got   
   a hit on a different company's patent for what looks like the same idea.   

   Time-resource matrix for a microprocessor with time counter   
   for statically dispatching instructions   
   https://patents.google.com/patent/US11829762B2   

   It basically puts all the schedule in one HW matrix of time_slots * resources   
   and scans forward looking for empty slots to allocate to each instruction.   
   Scheduling is done at Rename, and a time slot is assigned for each resource
   needed: source operand read ports, FUs, result buses.
   If a load later misses L1, it triggers a replay of all younger instructions.
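
   The forward scan described above can be sketched roughly as follows. This
   is only an illustration, not taken from the patent: the class name, the
   resource names ("read_port", "result_bus"), the single read port per
   instruction and the fixed horizon are all assumptions, and replay on an
   L1 miss is not modeled.

```python
# Hypothetical sketch of a time-resource matrix (all names are assumptions).
class TimeResourceMatrix:
    def __init__(self, horizon, capacity):
        # capacity maps a resource name to the units available in each cycle.
        self.horizon = horizon
        self.capacity = dict(capacity)
        # used[t][r] = units of resource r already booked in cycle t.
        self.used = [{r: 0 for r in capacity} for _ in range(horizon)]

    def _free(self, t, r):
        return self.used[t][r] < self.capacity[r]

    def schedule(self, earliest, fu, latency):
        """Scan forward from `earliest` (the predicted cycle at which the
        source operands are available) for the first issue cycle t with a
        free read port and FU at t and a free result bus at t + latency.
        Book all three and return t, or None if the window is full."""
        for t in range(earliest, self.horizon - latency):
            if (self._free(t, "read_port") and self._free(t, fu)
                    and self._free(t + latency, "result_bus")):
                self.used[t]["read_port"] += 1
                self.used[t][fu] += 1
                self.used[t + latency]["result_bus"] += 1
                return t
        return None


m = TimeResourceMatrix(8, {"read_port": 2, "alu": 1, "mul": 1, "result_bus": 1})
first = m.schedule(0, "alu", 1)   # ALU and result bus are free right away
second = m.schedule(0, "alu", 1)  # ALU is taken in cycle 0, so it slips a cycle
third = m.schedule(0, "mul", 2)   # multiplier is free, but the result bus is not
```

   Note that the third instruction is delayed only by the result bus, even
   though its FU is idle, which is the kind of cross-resource coupling a
   single matrix has to resolve in one place.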

   They claim it is simpler but I question that.   
   Putting all the schedule info in one matrix means that scaling it
   requires adding more ports to the matrix. Also, different resources
   can require different allocation and scheduling algorithms.
   Doing all this in one place at the same time gets complicated quickly.   

   My simulated design intentionally distributed schedulers to each FU's bank   
   of reservation stations so they all schedule concurrently and each scheduler   
   algorithm is optimized for its FU.   

   Also a wake-up matrix is not that complicated. I used the write of the   
   destination Physical Register Number (PRN) as the wake-up signal.   
   Each PRN has a wire that runs to all RS, and each operand waiting for
   that PRN watches that wire for a pulse indicating that the result value
   will be forwarded in the next cycle on a dynamically assigned result bus.
   The RS operand can either save a copy of the value or launch execution   
   immediately if all resources are available.   
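
   A minimal software sketch of that PRN wake-up, for illustration only.
   RSEntry and broadcast are hypothetical names, not code from the simulator
   mentioned above. The per-PRN wire is modeled as a function call, and the
   value forwarding on the result bus in the following cycle is left out.

```python
# Hypothetical model of PRN-based wake-up (names are assumptions).
class RSEntry:
    def __init__(self, name, src_prns):
        self.name = name
        # Physical register numbers this entry is still waiting on.
        self.waiting = set(src_prns)


def broadcast(entries, prn):
    """Pulse the wake-up wire for `prn`. Every entry watching that PRN
    clears it, and entries with nothing left to wait for are returned
    as ready to launch."""
    woken = []
    for entry in entries:
        if prn in entry.waiting:
            entry.waiting.discard(prn)
            if not entry.waiting:
                woken.append(entry.name)
    return woken


entries = [RSEntry("add", [5, 7]), RSEntry("sub", [7]), RSEntry("mul", [9])]
after_p7 = broadcast(entries, 7)  # "sub" is ready; "add" still needs PRN 5
after_p5 = broadcast(entries, 5)  # now "add" is ready as well
```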

   My design appears to be similar to the issue logic of the
   RISC-V Berkeley Out-of-Order Machine (BOOM). As they note, schedulers
   are simple and different kinds can be used for different FUs.
   My ALU used simple round-robin, whereas the Branch Unit (BRU) is age-ordered.
   This is simple to do as each scheduler only looks at its own RS bank.
   https://docs.boom-core.org/en/latest/sections/issue-units.html   
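
   To illustrate why per-bank schedulers stay small, here is a rough sketch
   of the two select policies named above. It is an assumption-laden model,
   not BOOM's or the simulator's actual logic: the function names, the
   `ready` bit list per RS slot and the `age` sequence numbers are all
   hypothetical.

```python
# Hypothetical per-bank select policies (names and data layout are assumptions).
def select_round_robin(ready, last):
    """ready[slot] is True if that RS slot can issue; `last` is the slot
    granted previously. Return the next ready slot after `last`, wrapping
    around the bank, or None if nothing is ready."""
    n = len(ready)
    for step in range(1, n + 1):
        slot = (last + step) % n
        if ready[slot]:
            return slot
    return None


def select_oldest(ready, age):
    """age[slot] is a sequence number assigned at allocation, smaller
    meaning older. Return the oldest ready slot, or None."""
    candidates = [slot for slot in range(len(ready)) if ready[slot]]
    if not candidates:
        return None
    return min(candidates, key=lambda slot: age[slot])


alu_pick = select_round_robin([True, False, True, False], last=2)
bru_pick = select_oldest([True, False, True, True], age=[9, 1, 4, 6])
```

   Each function only ever sees the ready bits of its own bank, so swapping
   the policy for one FU does not touch any other scheduler.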
