From: anton@mips.complang.tuwien.ac.at   
      
   John Savard writes:   
   >On Thu, 18 Dec 2025 21:29:00 +0000, MitchAlsup wrote:   
   >> Or in other words, if you can decode K-instructions per cycle, you'd   
   >> better be able to execute K-instructions per cycle--or you have a   
   >> serious blockage in your pipeline.   
   >   
   >No.   
   >   
   >If you flipped "decode" and "execute" in that sentence above, I would 100%   
   >agree. And maybe this _is_ just a typo.   
   >   
   >But if you actually did mean that sentence exactly as written, I would   
   >disagree. This is why: I regard executing instructions as 'doing the   
   >actual work' and decoding instructions as... some unfortunate trivial   
   >overhead that can't be avoided.   
      
   It does not matter what "the actual work" is and what isn't. What   
   matters is how expensive it is to make a particular part wider, and   
   how paying that cost benefits the IPC. At every step you add width to   
   the part with the best benefit/cost ratio.   
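
   As a rough illustration of that rule, here is a minimal sketch of
   greedy widening. Everything in it is an assumption made for the
   example: the toy IPC model, the stage names, and the cost figures are
   hypothetical and are not measurements of any real core.

```python
def toy_ipc(widths):
    """Hypothetical throughput model with diminishing returns per stage.

    Not a real microarchitecture model; it only gives the greedy loop
    something to optimise.
    """
    return 1.0 / sum(1.0 / w for w in widths.values())


def widen_greedily(widths, cost_per_unit, steps):
    """Add one unit of width per step to the stage with the best
    (IPC gain) / (cost) ratio. Returns the picks and the final widths."""
    widths = dict(widths)  # do not modify the caller's dict
    picks = []
    for _ in range(steps):
        base = toy_ipc(widths)

        def ratio(stage):
            trial = dict(widths)
            trial[stage] += 1
            return (toy_ipc(trial) - base) / cost_per_unit[stage]

        best = max(widths, key=ratio)
        widths[best] += 1
        picks.append(best)
    return picks, widths


# Made-up starting point: equal widths, renamer assumed most expensive to widen.
start = {"decode": 4, "rename": 4, "execute": 4}
cost = {"decode": 1.0, "rename": 3.0, "execute": 1.5}
picks, final = widen_greedily(start, cost, steps=4)
print(picks, final)
```

   With costs like these, the expensive part stays narrow while the
   cheaper parts grow around it, which is the pattern the numbers below
   show.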
      
   And looking at recent cores, we see that, e.g., Skymont can decode
   3x3=9 instructions per cycle and rename 8 per cycle, and has 26 ports
   to functional units (i.e., it can execute 26 uops in one cycle); I
   don't know how many instructions it can retire per cycle, but I
   expect that it is more than 8 per cycle.
      
   So the renamer is the bottleneck, and that's also the idea behind
   top-down microarchitecture analysis (TMA) for determining how software
   interacts with the microarchitecture. That idea comes out of Intel,
   but if Intel finds it harder to make renamers wider than to widen
   other parts, I expect that the rest of the industry also finds that
   hard, especially for architectures where decoding is cheaper and
   where (looking at ARM A64) some instructions place more demands on
   the renamer.
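
   The bottleneck argument can be checked with a few lines of Python: a
   small sketch that takes the per-cycle widths quoted above and reports
   the narrowest stage. It treats instructions and uops as
   interchangeable, which is a simplification, and it leaves out the
   retire width because that number is not known here.

```python
# Per-cycle widths as quoted in this post for Skymont.
# Simplification: decode counts instructions, execute counts uops;
# the two are compared directly here.
skymont_widths = {"decode": 9, "rename": 8, "execute": 26}


def bottleneck(widths):
    """Return (stage, width) for the narrowest stage, which caps the
    sustained per-cycle throughput of the whole pipeline."""
    stage = min(widths, key=widths.get)
    return stage, widths[stage]


print(bottleneck(skymont_widths))  # -> ('rename', 8)
```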
      
   Concerning the question of what is doing "the actual work", it's
   obviously committing the instruction in the ROB. Up to that point,
   the instruction is speculative; only with the commit does it become
   architectural.
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
    Mitch Alsup,    
      