From: user5857@newsgrouper.org.invalid   
      
   EricP posted:   
      
   > Robert Finch wrote:   
   > > I hard-coded an IRQ delay down-count in the Qupls4 core. The down-count   
   > > delays accepting interrupts for ten clock cycles or about 40   
   > > instructions if an interrupt got deferred. The interrupt being deferred   
   > > because interrupts got disabled by an instruction in the pipeline. I   
   > > guessed 40 instructions would likely be enough for many cases where IRQs   
   > > are disabled then enabled again.   
   > >   
   > > The issue is the pipeline is full of ISR instructions that should not be   
   > > committed because the IRQs got disabled in the meantime. If the CPU were   
   > > allowed to accept another IRQ right away, it could get stuck in a loop   
   > > flushing the pipeline and reloading with the ISR routine code instead of   
   > > progressing through the code where IRQs were disabled.   
   > >   
   > > I could create a control register for this count and allow it to be   
   > > programmable. But I think that may not be necessary.   
   > >   
   > > It is possible that 40 instructions is not enough. In that case the CPU   
   > > would advance in 40 instruction burps. Alternating between fetching ISR   
   > > instructions and the desired instruction stream. On the other hand, a   
   > > larger down-count starts to impact the IRQ latency.   
   > >   
   > > Tradeoffs…   
   > >   
   > > I suppose I could have the CPU increase the down-count if it is looping   
   > > around fetching ISR instructions. The down-count would be reset to the   
   > > minimum again once an interrupt enable instruction is executed.   
   > >   
   > > Complex…   
   > >   
   >   
   > You are using this timer to predict the delay for draining the pipeline.   
   > It would only take a read of a slow IO device register to exceed it.   
      
   Yes, exactly:
      
   Consider a GBOoO processor that performs a LD R9,[deviceCR].   
      
   a) all earlier memory references have to be seen globally   
   ...before this LD can be seen globally. {dozens of cycles}   
   b) this LD has to arrive at HostBridge. {dozens of cycles}   
   c) HostBridge sends request down PCIe {hundreds of cycles}
   d) device responds to LD {handful of cycles}   
   e) PCIe transports response to HB {hundreds of cycles}   
   f) HB transfers response to requestor {dozens of cycles}   
   g) CPU is allowed to re-enter OoO {handful of cycles}   
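
   To put rough numbers on steps a) to g), here is a minimal sketch.
   The per-step values are assumptions picked only to match the
   "handful / dozens / hundreds" wording above (5, 24 and 200 cycles),
   not measurements of any real system; the 10-cycle down-count is the
   one from Robert's post.

```python
# Rough cycle budget for one device-register load, steps a) to g).
# HANDFUL, DOZENS and HUNDREDS are illustrative assumptions, not measurements.
HANDFUL, DOZENS, HUNDREDS = 5, 24, 200

STEPS = [
    ("a) earlier references seen globally", DOZENS),
    ("b) LD arrives at HostBridge", DOZENS),
    ("c) request travels down PCIe", HUNDREDS),
    ("d) device responds", HANDFUL),
    ("e) response travels back up PCIe", HUNDREDS),
    ("f) HostBridge returns response to requestor", DOZENS),
    ("g) CPU re-enters OoO", HANDFUL),
]

IRQ_DELAY_CYCLES = 10  # hard-coded down-count from the quoted post


def total_cycles(steps):
    """Sum the assumed latency of every step."""
    return sum(cycles for _, cycles in steps)


def expired_downcounts(steps, delay=IRQ_DELAY_CYCLES):
    """How many times the down-count would expire while the LD is outstanding."""
    return total_cycles(steps) // delay


if __name__ == "__main__":
    print("total cycles:", total_cycles(STEPS))
    print("down-counts expired:", expired_downcounts(STEPS))
```

   With these assumed values the two PCIe crossings dominate, and the
   down-count would expire dozens of times before the LD completes.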
      
   Accesses to devices need to have most of the properties of   
   "Sequential Consistency" as defined by Lamport.   
      
   Now, several LDs [DeviceCRs] can be seen globally, and in order,
   before the first response (or all of the responses) comes back, so
   they can overlap. You are still going to see all that latency in the
   pipeline, though, and OoO memory requests are not among the things
   allowed.
      
   > I was thinking a simple and cheap way would be to use a variation of the   
   > single-step mechanism. An interrupt request would cause Decode to emit a   
   > special uOp with the single-step flag set and then stall, to allow the   
   > pipeline to drain the old stream before accepting the interrupt and   
   > redirecting Fetch to its handler. That way if there are any interrupt
   > enable or disable instructions, or branch mispredicts, or pending exceptions   
   > in-flight they all are allowed to finish and the state to settle down.   
   >   
   > Pipelining interrupt delivery looks possible but gets complicated and   
   > expensive real quick.   
   >   
   >   
   >   
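
   A toy model of the drain-then-accept idea quoted above, as I read it.
   This is only a sketch under stated assumptions: the uop names ("op",
   "irq_disable", "irq_enable", "drain_marker") and the function
   deliver_irq are made up for illustration, retirement is strictly in
   order, and nothing here models timing, mispredicts or exceptions.

```python
from collections import deque


def deliver_irq(in_flight, irq_enabled=True):
    """Model Decode emitting a drain marker and stalling until it retires.

    in_flight: older uops already in the pipeline, oldest first. Each is
    "op", "irq_disable" or "irq_enable" (hypothetical names).
    Returns (accepted, retired_count), where retired_count is the number of
    older uops that retired before the interrupt decision was made.
    """
    pipe = deque(in_flight)
    pipe.append("drain_marker")  # special uop emitted on the interrupt request
    retired = 0
    while pipe:
        uop = pipe.popleft()  # retire strictly in program order
        if uop == "drain_marker":
            # The old stream has drained, so the enable state has settled.
            return irq_enabled, retired
        if uop == "irq_disable":
            irq_enabled = False
        elif uop == "irq_enable":
            irq_enabled = True
        retired += 1
    return False, retired  # not reached: the marker is always in the pipe


if __name__ == "__main__":
    # An in-flight disable retires first, so the interrupt stays pending.
    print(deliver_irq(["op", "irq_disable", "op"]))
    # Disable then enable both retire, so the interrupt is accepted.
    print(deliver_irq(["op", "irq_disable", "op", "irq_enable"]))
```

   The point of the model: the accept decision is made once, against
   settled state, so no ISR instructions are fetched and then thrown
   away, and no down-count has to guess how long the drain takes.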
      