comp.arch
Message 129,723 of 131,241
EricP to Anton Ertl
Re: CISCs, uOps, and books (1/2)
19 Sep 25 17:48:52
   From: ThatWouldBeTelling@thevillage.com   
      
   Anton Ertl wrote:   
   > EricP  writes:   
   >> Anton Ertl wrote:   
   >>> Thomas Koenig  writes:   
   >>>> BGB  schrieb:   
   >>>>   
   >>>>> Still sometimes it seems like it is only a matter of time until Intel or   
   >>>>> AMD releases a new CPU that just sort of jettisons x86 entirely at the   
   >>>>> hardware level, but then pretends to still be an x86 chip by running   
   >>>>> *everything* in a firmware level emulator via dynamic translation.   
   >>>> For AMD, that has happened already a few decades ago; they translate
   >>>> x86 code into RISC-like microops.   
   >>> That's nonsense; regulars of this group should know better, at least
   >>> this nonsense has been corrected often enough.  E.g., I wrote in   
   >>> <2015Dec6.152525@mips.complang.tuwien.ac.at>:   
   >>>   
   >>> |Not even if the microcode the Intel and AMD chips used was really
   >>> |RISC-like, which it was not (IIRC the P6 uses micro-instructions with
   >>> |around 100bits, and the K7 has a read-write Rop (with the "R" of "Rop"
   >>> |standing for "RISC")).
   >> I don't know what you are objecting to   
   >   
   > I am objecting to the claim that uops are RISC-like, and that there is   
   > a translation to RISC occurring inside the CPU, and (not present here,
   > but often also claimed) that therefore there is no longer a difference   
   > between RISC and non-RISC.   
      
   Ok. I disagree with this because I have a different view of the   
   changes in moving from CISC to RISC (which I'll describe below).   
      
   > One can discuss the details, but at the end of the day, uops are some   
   > implementation-specific internals of the microarchitecture, whereas a   
   > RISC architecture is an architecture.   
   >   
   >> The number of bits has nothing to do with what it is called.   
   >> If this uOp was for a ROB style design where all the knowledge about   
   >> each instruction including register ids, immediate data,   
   >> scheduling info, result data, status, is stored in a single ROB entry,   
   >> then 100 bits sounds pretty small so I'm guessing that was a 32-bit cpu.   
   >   
   > Yes, P6 is the code name for the Pentium Pro, which has a ROB, and,   
   > more importantly valued reservation stations, and yes, the 118 or   
   > whatever bits include the operands.  I have no idea how the P6 handles   
   > its 80-bit FP with valued RSs; maybe it has bigger uops in its FP part   
   > (but I think it has a unified scheduler, so that would not work out,   
   > or maybe I miss something).   
   >   
   > But concerning the discussion at hand: Containing the data is a   
   > significant deviation from RISC instruction sets, and RISC   
   > instructions are typically only 32 bits or 16 bits wide.   
      
   Yes, and those 32-bit external ISA instructions are mapped into uOps   
   internally. All that is different here is the difficulty for decode.   
      
   I see the difference between CISC and RISC as lying in the
   micro-architecture: a change from a single sequential state machine view
   to a multiple concurrent machines view, and from Clocks Per Instruction
   to Instructions Per Clock.
      
   The monolithic microcoded machine, which covers 360, 370, PDP-11, VAX,   
   386, 486 and Pentium, is like a single threaded program which   
   operates sequentially on a single global set of state variables.   
   While there is some variation and fuzziness around the edges,
   the heart of each of these is a single sequential execution engine.
      
   An important consequence of the sequential design is that   
   most of this machine is sitting idle most of the time.   
      
   One can take an Alpha ISA and implement it with a microcoded sequencer   
   but that should not be called RISC so the distinction must lie elsewhere.   
      
   RISC changes that design to one like a multi-threaded program whose
   threads pass messages, called uOps, between them. The dynamic state
   of each instruction is mostly carried with the uOp message,
   and each thread does something very simple and passes the uOp on.
   Where global resources are required, they are temporarily dynamically   
   allocated to the uOp by the various threads, carried with the uOp,   
   and returned later when the uOp message is passed to the Retire thread.   
   The Retire thread is the only one which updates the visible global state.   
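
A minimal sketch of the message-passing view described above, written as a toy Python model rather than anything resembling real hardware. The names `UOp`, `ToyPipeline`, and `run_demo`, the three-stage split (rename, execute, retire), and the two-opcode ALU are all assumptions made for illustration. The point is only that the uOp carries its own state, a physical register is borrowed from a global pool and returned at retire, and retire alone writes the visible registers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class UOp:
    """Message that carries one instruction's dynamic state between stages."""
    op: str                       # "add" or "sub" (hypothetical toy opcodes)
    dest: str                     # architectural destination register
    a: int                        # source values, assumed already read
    b: int
    phys: Optional[int] = None    # physical register, allocated at rename
    result: Optional[int] = None  # filled in by execute


class ToyPipeline:
    def __init__(self, num_phys: int = 4):
        self.free_list = deque(range(num_phys))  # global resource pool
        self.arch_regs = {}                      # visible global state
        self.q_rename = deque()
        self.q_exec = deque()
        self.q_retire = deque()

    def rename(self):
        # Allocate a global resource to the uOp and pass the uOp on.
        if self.q_rename and self.free_list:
            u = self.q_rename.popleft()
            u.phys = self.free_list.popleft()
            self.q_exec.append(u)

    def execute(self):
        # Do one simple thing; the result travels with the uOp.
        if self.q_exec:
            u = self.q_exec.popleft()
            u.result = u.a + u.b if u.op == "add" else u.a - u.b
            self.q_retire.append(u)

    def retire(self):
        # The only stage that updates visible state; returns the resource.
        if self.q_retire:
            u = self.q_retire.popleft()
            self.arch_regs[u.dest] = u.result
            self.free_list.append(u.phys)

    def cycle(self):
        # Run later stages first so a uOp advances one stage per cycle.
        self.retire()
        self.execute()
        self.rename()


def run_demo() -> ToyPipeline:
    p = ToyPipeline()
    p.q_rename.extend([UOp("add", "r1", 1, 2), UOp("sub", "r2", 5, 3)])
    for _ in range(4):
        p.cycle()
    return p
```

In this toy, all three stages can hold a different uOp in the same cycle, which is the concurrency the sequential microcoded engine lacks.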
      
   As I see it, this Multiple Simple Thread Message Passing Architecture
   (MST-MPA) is the essence of the change RISC introduced, and any
   micro-architecture that follows it is in the risc design style.
      
   The RISC design guidelines described in various papers are not
   go/no-go decisions. They are mostly engineering compromises about things
   that would make an MST-MPA more expensive to implement or would
   otherwise interfere with maximizing the active concurrency of all threads.
   Whether the register file has 8, 16, or 32 entries affects the frequency   
   of stalls but doesn't change whether it is implemented as MST-MPA and   
   therefore entitled to be called "RISC".   
      
   This is why I think it would have been possible to build a risc-style
   PDP-11 in 1975 TTL, or a VAX if they had just kept the instructions at
   the same complexity as the PDP-11 ISA (53 opcodes, max one immediate,
   max one mem op per instruction).
      
   >>> Another difference is that the OoO engine that sees the uOps performs   
   >>> only a very small part of the functionality of branches, with the   
   >>> majority performed by the front end.  I.e., there is no branching in   
   >>> the OoO engine that sees the uOps, at the most it confirms the branch   
   >>> prediction, or diagnoses a misprediction, at which point the OoO   
   >>> engine is out of a job and has to wait for the front end; possibly   
   >>> only the ROB (which deals with instructions again) resolves the   
   >>> misprediction and kicks the front end into action, however.   
   >> And a uOp triggers that action sequence.   
   >> I don't see the distinction you are trying to make.   
   >   
   > The major point is that the OoO engine (the part that deals with uops)   
   > sees a linear sequence of uops it has to process, with nearly all   
   > actual branch processing (which an architecture has to do) done in a   
   > part that does not deal with uops.  With the advent of uop caches that   
   > has changed a bit, but many of the CPUs for which the uop=RISC claim   
   > has been made do not have an uop cache.   
      
   There are multiple places that can generate next RIP addresses:   
   - The incremented RIP for the current instruction   
   - Branch Prediction can redirect Fetch   
   - Decode can pick off unconditional branches and immediately redirect Fetch.   
   - Decode also could notice if the branch predictor made an erroneous
      decision and redirect Fetch.
   - Register Read might forward a "JMP reg" address directly to Fetch.   
   - The Branch Unit BRU has a uOp scheduler to wait for in-flight registers
      or condition codes and then processes all branch & jump uOps and
      possibly redirects Fetch, and updates Branch Prediction.
   - uOp Retire detects exceptions and can force a Fetch redirect.   
   - Interrupts can redirect Fetch.   
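
A rough sketch of how Fetch might choose among the sources listed above when several of them produce an address in the same cycle. The priority order used here (sources later in the pipeline override earlier ones, because they act on older instructions and better information) is an assumption for illustration and is not stated in the post. The source names and the `next_fetch_address` function are likewise hypothetical.

```python
from typing import Dict, Optional

# Assumed priority, highest first. This ordering is an illustrative guess,
# not something a particular CPU is known to implement.
PRIORITY = [
    "interrupt",
    "retire_exception",
    "branch_unit",
    "register_read",
    "decode",
    "branch_predictor",
]


def next_fetch_address(rip: int, insn_len: int,
                       redirects: Dict[str, Optional[int]]) -> int:
    """Pick the next fetch address from whichever sources fired this cycle."""
    for source in PRIORITY:
        target = redirects.get(source)
        if target is not None:   # address 0 is a valid target, so test None
            return target
    # No redirect at all: fall through to the incremented RIP.
    return rip + insn_len
```

For example, if the branch predictor proposes one target while the Branch Unit resolves an older branch to a different one in the same cycle, this sketch lets the Branch Unit win.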
      