From: user5857@newsgrouper.org.invalid   
      
   EricP posted:   
      
   > MitchAlsup wrote:   
   > > EricP posted:   
   > >   
   > >> John Savard wrote:   
   > >>> On Sun, 21 Dec 2025 20:32:44 +0000, MitchAlsup wrote:   
   > >>>>> On Thu, 18 Dec 2025 21:29:00 +0000, MitchAlsup wrote:   
   > >>>>>> Or in other words, if you can decode K-instructions per cycle, you'd   
   > >>>>>> better be able to execute K-instructions per cycle--or you have a   
   > >>>>>> serious blockage in your pipeline.   
   > >>>> Not a typo--the part of the pipeline which is narrowest is   
   > >>>> the part that limits performance. I suggest strongly that you should not   
   > >>>> make/allow the decoder to play that part.   
   > >>> I agree - and strongly, too - that the decoder ought not to be the part   
   > >>> that limits performance.   
   > >>>   
   > >>> But what I quoted says that the execution unit ought not to be the part   
   > >>> that limits performance, with the implication that it's OK if the decoder   
   > >>> does instead. That's why I said it must be a typo.   
   > >>>   
   > >>> So I think you need to look a second time at what you wrote; it's natural   
   > >>> for people to see what they expect to see, and so I think you looked at   
   > >>> it, and didn't see the typo that was there.   
   > >>>   
   > >>> John Savard   
   > >> There are two kinds of stalls:   
   > >> stalls in the serial front end I-cache, Fetch or Decode stages because   
   > >> of *too little work* (starvation due to input latency),   
   > >> and stalls in the back end Execute or Writeback stages because   
   > >> of *too much work* (resource exhaustion).   
   > >   
   > > DECODE latency increases when:   
   > > a) there is no instruction(s) to decode   
   > > b) there is no address from which to fetch   
   > > c) when there is no translation of the fetch address   
   > >   
   > > a) is a cache miss   
   > > b) is an indirect control transfer   
   > > c) is a TLB miss   
   > >   
   > > And there may be additional cases of instruction buffer hiccups.   
   >   
   > Yes. Also Decode generated stalls - pipeline drain.   
   > Rename stall for new dest register pool exhaustion.   
   >   
   > >> The front end stalls inject bubbles into the pipeline,   
   > >> whereas back end stalls can allow younger bubbles to be compressed out.   
   > >   
   > > How In-Order your thinking is. GBOoO machines do not inject bubbles.   
   >   
   > You get bubbles if you overload their resources no matter how GB it is.   
   >   
   > For example, if all the reservation stations for a FU are in use then   
   > Dispatch has to stall, which stalls the whole front end.   
      
   These are not "bubbles";   
   these are "window full" stalls.   
      
   > A compacting pipeline in the front end can compress out those bubbles   
   > but it eventually stalls too.   
      
   DECODE still has no place to put the instructions.   
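
To make the "window full" case concrete, here is a toy model, not a description of any real core. The names (simulate, decode_width, execute_width, window_size) are invented for illustration, and the model assumes the back end drains before the front end fills on each clock. It counts the clocks in which DECODE could not place everything it had.

```python
def simulate(decode_width, execute_width, window_size, cycles):
    """Toy model: return (instructions retired, clocks with a window-full stall)."""
    occupancy = 0        # instructions currently held in the window
    retired = 0
    stalled_cycles = 0
    for _ in range(cycles):
        # Back end first: up to execute_width instructions leave the window.
        done = min(occupancy, execute_width)
        occupancy -= done
        retired += done
        # Front end next: decode can only fill the space that is free.
        free = window_size - occupancy
        accepted = min(decode_width, free)
        if accepted < decode_width:
            stalled_cycles += 1   # decode had work but no place to put it
        occupancy += accepted
    return retired, stalled_cycles


if __name__ == "__main__":
    # Decode wider than execute: the window fills, then decode stalls every clock.
    print(simulate(decode_width=4, execute_width=2, window_size=16, cycles=100))
    # Matched widths: the window never fills.
    print(simulate(decode_width=2, execute_width=2, window_size=16, cycles=100))
```

In this model the retired count is the same in both runs, because the narrower execute side sets the sustained rate. The extra decode width only changes how soon the window fills.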
      
   > Dependency stalls - all the uOps in reservation stations are waiting   
   > on other results. Serialization stalls.   
      
   Latency stalls, or you can call them RAW stalls.   
      
   > If a design is doing dynamic register file read port assignment and   
   > runs out of read ports. Resource exhaustion stalls.   
      
   Yes, that is why I don't like reorder buffers so much.   
   Renamers are an issue as you generally have more rename port   
   requirements than Read requirements--BECAUSE DECODE has wider   
   BW than the execution machinery.   
      
   > Multiple uOps are ready but only one can launch. Scheduling stalls.   
   >   
   > >> If I have to stall, I want it in the back end.   
   > >   
   > > If I have to stall I want it based on "realized" latency.   
   > >   
   > >> It has to do with catching up after a stall.   
   > >   
   > > Which is why you do not inject bubbles...   
   >   
   > It's not me doing it. I blame the speed of light.   
      
   It seems our terminology is not aligned.   
      
   > >> If a core stalls for 3 clocks, then in order to average 1 IPC   
   > >> it must retire 2 instructions per clock for the next 3 clocks.   
   > >> And it can only do that if it has a backlog of work ready to execute.   
   >   
   >   
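
The catch-up arithmetic quoted above can be checked with a small sketch. The function names and parameters are invented for illustration, and target_ipc is assumed to be an integer.

```python
from fractions import Fraction


def catch_up_rate(stall_cycles, recovery_cycles, target_ipc=1):
    """Retire rate needed during recovery to hit target_ipc over the whole span."""
    # Nothing retires during the stall, so all of the work owed for
    # (stall + recovery) clocks must retire in the recovery clocks alone.
    total_cycles = stall_cycles + recovery_cycles
    return Fraction(target_ipc * total_cycles, recovery_cycles)


def work_needed(stall_cycles, recovery_cycles, target_ipc=1):
    """Instructions that must be ready to execute during the recovery clocks."""
    return target_ipc * (stall_cycles + recovery_cycles)


if __name__ == "__main__":
    print(catch_up_rate(3, 3))   # 3-clock stall, 3 clocks to recover
    print(work_needed(3, 3))
    print(catch_up_rate(3, 6))   # a longer recovery needs a lower rate
```

For a 3-clock stall and a 3-clock recovery this gives a rate of 2 per clock and 6 instructions that must be available. That is the backlog point: the back end can only run at the higher rate if that much work is already waiting.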
      