comp.arch

Message 129,528 of 131,241

EricP to EricP

Re: VAX

28 Aug 25 13:39:54

   From: ThatWouldBeTelling@thevillage.com   
      
   EricP wrote:   
   > Thomas Koenig wrote:   
   >> MitchAlsup  schrieb:   
   >>   
   >>> One must remember that VAX was a 5-cycle per instruction machine !!!   
   >>> (200ns : 1 MIP)   
   >>   
   >> 10.5 on a characteristic mix, actually.   
   >>   
   >> See "A Characterization of Processor Performance in the VAX-11/780"   
   >> by Emer and Clark, their Table 8.   
   >   
   > Going through the VAX 780 hardware schematics and various performance   
   > papers, near as I can tell it took *at least* 1 clock per instruction byte   
   > for decode, plus any I&D cache miss and execute time, as it appears to   
   > use microcode to pull bytes from the 8-byte instruction buffer (IB)   
   > *one at a time*.   
   >   
   > So far I have not found any parallel pathway that could pull a multi-byte   
   > immediate operand from the IB in 1 clock.   
   >   
   > And I say "at least" 1 C/IB as I am not including any micro-pipeline   
   > stalls. The microsequencer has some pipelining, overlap read of the   
   > next uWord with execute of current, which would introduce a branch   
   > delay slot into the microcode. As it uses the opcode and operand bytes   
   > to do N-way jump/call to uSubroutines, each of those dispatches might   
   > have a branch delay slot too.   
   >   
   > (Similar issues appear in the MV-8000 uSequencer except it appears to   
   > have 2 or maybe 3 microcode branch delay slots).   
      
   I found a description of the 780 instruction buffer parser   
   in the Data Path description on bitsavers, and   
   it does in fact pull one operand specifier from the IB per clock.   
   There is a mux network to handle the various immediate formats in parallel.   
      
   There are conflicting descriptions of exactly how it handles the   
   first operand: whether it is pulled with the opcode or in a separate clock.   
   The IB shifter can only do 1 to 5 byte shifts, but an opcode with   
   a first operand with a 32-bit displacement would be 6 bytes.   
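      
   To make that byte count concrete, here is a small sketch (mine, not   
   from the DEC documentation). It assumes a one-byte opcode and a one-byte   
   operand specifier followed by a 0, 1, 2 or 4 byte displacement; the names   
   MAX_IB_SHIFT, leading_bytes and fits_one_shift are made up for illustration.   

```python
# Hypothetical sketch: names are illustrative, not taken from DEC documentation.
# Assumes a one-byte opcode and a one-byte operand specifier.
MAX_IB_SHIFT = 5  # largest shift the IB shifter can do, per the description above


def leading_bytes(disp_bytes: int) -> int:
    """Opcode byte + first operand specifier byte + its displacement bytes."""
    return 1 + 1 + disp_bytes


def fits_one_shift(disp_bytes: int) -> bool:
    """True if the opcode and the whole first operand fit in a single IB shift."""
    return leading_bytes(disp_bytes) <= MAX_IB_SHIFT


if __name__ == "__main__":
    # Register mode (0), then byte, word and longword displacements.
    for disp in (0, 1, 2, 4):
        print(disp, leading_bytes(disp), fits_one_shift(disp))
```

   Only the longword-displacement case exceeds the 5-byte limit, which is   
   why the first operand is the one place where the descriptions matter.   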
      
   But basically it takes 1 clock for the opcode byte and the first operand   
   specifier byte, a second clock if the first opspec has an immediate,   
   then 1 clock for each subsequent operand specifier.   
   If a subsequent operand has an immediate, it is extracted in parallel   
   with its opspec.   
      
   If that is correct, a MOV rs,rd or ADD rs,rd would take 2 clocks to decode,   
   and a MOV offset(rs),rd would take 3 clocks to decode.   
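      
   The counting rule above can be written down as a tiny model. This is   
   only a sketch of my reading of it, not the actual 780 microcode: the   
   OpSpec class and decode_clocks function are hypothetical names, it   
   ignores IB refill, cache misses and microsequencer stalls, and the   
   1-clock figure for an instruction with no operands is my assumption.   

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OpSpec:
    """One operand specifier (hypothetical model type).

    extra_bytes counts the immediate/displacement bytes that follow the
    specifier byte (0 for plain register mode).
    """
    extra_bytes: int = 0


def decode_clocks(opspecs: Sequence[OpSpec]) -> int:
    """Decode clocks under the rule described above, ignoring all stalls."""
    if not opspecs:
        return 1  # assumption: opcode alone still costs one clock
    clocks = 1  # opcode byte + first operand specifier byte
    if opspecs[0].extra_bytes > 0:
        clocks += 1  # first opspec's immediate needs a second clock
    clocks += len(opspecs) - 1  # one clock per subsequent opspec
    return clocks


if __name__ == "__main__":
    print(decode_clocks([OpSpec(), OpSpec()]))               # MOV rs,rd
    print(decode_clocks([OpSpec(extra_bytes=4), OpSpec()]))  # MOV offset(rs),rd
```

   With two register operands this gives 2 clocks, and with a displacement   
   on the first operand it gives 3, matching the figures above.   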
      