... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"

comp.arch

Apparently more than just beeps & boops

131,241 messages


Message 129,605 of 131,241

MitchAlsup

Re: Concertina III May Be Returning

06 Sep 25 16:29:36

   From: user5857@newsgrouper.org.invalid   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
      
   > scott@slp53.sl.home (Scott Lurndal) writes:   
   > >An interesting note in the aforementioned analysis is why   
   > >the call instruction was so expensive in time - the 780 cache   
   > >was write-through, so the multiple stores would be limited   
   > >to DRAM speeds.   
   >   
   > But do you need fewer stores if you use simpler instructions?  Did the   
   > C compiler that used BSR etc. to implement a call store less?  How so?   
   >   
   > Also, the DRAM speed is three cycles.   
      
   Is this a serial 3-cycles::   
      
        | route | DRAM  | route |   
                                | route | DRAM  | route |   
      
   or a pipelineable 3-cycles::   
      
        | route | DRAM  | route |   
                | route | DRAM  | route |   
                        | route | DRAM  | route |   
      
   It makes a big difference.   
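To make the two diagrams above concrete, here is a minimal sketch that lists the start and finish cycle of each store under both readings. It assumes a 3-cycle store and, for the pipelined case, that a new store can enter every cycle; the function names and the one-cycle issue interval are illustrative assumptions, not documented VAX 11/780 behavior.

```python
def serial_schedule(n_stores, latency=3):
    """Each store starts only after the previous one has finished."""
    return [(i * latency, (i + 1) * latency) for i in range(n_stores)]


def pipelined_schedule(n_stores, latency=3, issue_interval=1):
    """A new store enters every issue_interval cycles (assumed to be 1 here)."""
    return [(i * issue_interval, i * issue_interval + latency)
            for i in range(n_stores)]


if __name__ == "__main__":
    # Two serial stores, as in the first diagram
    print(serial_schedule(2))
    # Three overlapped stores, as in the second diagram
    print(pipelined_schedule(3))
```

In the serial reading the finish time grows by the full latency per store; in the pipelined reading it grows by only the issue interval.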
      
   >                                        CALL/RET took an average 45   
   > cycles.   
      
   15 registers × 3 cycles = 45 cycles   
      
   >          RET does not store.  So if most of the cost is storing and   
   > loading, and, say, each instruction has 10 cycles overhead (which   
   > would already be a lot), that's 90 cycles for a call and a ret, and 70   
   > cycles of that for n stores and n loads.  With stores taking 3 cycles   
   > and loads taking 1 (the stack stuff is usually in the cache),   
   > n=17.5. But VAX has only 16 registers (including PC), and not every   
   > one of them is saved on every call.  So there were additional   
   > overheads.   
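The arithmetic in the quoted paragraph can be checked with a short sketch. It uses only the figures quoted there (45 cycles average per CALL/RET, 10 cycles of assumed overhead per instruction, 3-cycle stores, 1-cycle loads); the function name is a hypothetical helper for illustration.

```python
from fractions import Fraction


def implied_register_count(avg_cycles_per_insn, overhead_per_insn,
                           store_cycles, load_cycles):
    """Solve 2*avg = 2*overhead + n*store + n*load for n."""
    total = 2 * avg_cycles_per_insn            # one CALL plus one RET
    budget = total - 2 * overhead_per_insn     # cycles left for n stores + n loads
    return Fraction(budget, store_cycles + load_cycles)


if __name__ == "__main__":
    n = implied_register_count(45, 10, 3, 1)
    print(float(n))  # 17.5
```

Since n comes out above the 16 architectural registers, the quoted conclusion follows: the stores and loads alone cannot account for the measured time under these assumptions.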
      
   It seems to me that pipelining the DRAM writes would have helped dramatically.   
   A write-back cache would also have helped immensely.   
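As a rough sketch of how much each option might buy for the 15-register case, the following compares store time under three policies. The numbers are assumptions for illustration only: 3 cycles per DRAM write, one new write per cycle when pipelined, and 1 cycle per store for a write-back cache in which every store hits and eviction cost is ignored.

```python
DRAM_WRITE_CYCLES = 3   # from the thread: "the DRAM speed is three cycles"
CACHE_HIT_CYCLES = 1    # assumption: every store hits a write-back cache


def store_cycles(n_regs, policy):
    """Cycles spent on n_regs stores under an assumed, simplified policy model."""
    if n_regs == 0:
        return 0
    if policy == "write-through, serial DRAM":
        return n_regs * DRAM_WRITE_CYCLES
    if policy == "write-through, pipelined DRAM":
        return DRAM_WRITE_CYCLES + (n_regs - 1)
    if policy == "write-back":
        return n_regs * CACHE_HIT_CYCLES
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("write-through, serial DRAM",
                   "write-through, pipelined DRAM",
                   "write-back"):
        print(f"{policy}: {store_cycles(15, policy)} cycles")
```

Under these assumptions the serial case reproduces the 45-cycle figure, while either alternative brings the store time close to one cycle per register.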
      
   > With good support for making full use of the cache read bandwidth, the   
   > loading part could be sped up to two loads per cycle.  But I expect   
   > that the VAX 11/780 did not do that.   
   >   
   > - anton   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
