

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,451 of 131,241   
   EricP to Anton Ertl   
   Re: VAX   
   20 Aug 25 16:41:39   
   
   From: ThatWouldBeTelling@thevillage.com   
      
   Anton Ertl wrote:   
   > EricP  writes:   
   >> There were a number of proposals around then, the paper I linked to   
   >> also suggested injecting the miss routine into the ROB.   
   >> My idea back then was a HW thread.   
   >>   
   >> All of these are attempts to fix inherent drawbacks and limitations   
   >> in the SW-miss approach, and all of them run counter to the only   
   >> advantage SW-miss had: its simplicity.   
   >   
   > Another advantage is the flexibility: you can implement any   
   > translation scheme you want: hierarchical page tables, inverted page   
   > tables, search trees, ....  However, given that hierarchical page   
   > tables have won, this is no longer an advantage anyone cares for.   
   >   
   >> The SW approach is inherently synchronous and serial -   
   >> it can only handle one TLB miss at a time, one PTE read at a time.   
   >   
   > On an OoO engine, I don't see that.  The table walker software is   
   > called in its special context and the instructions in the table walker   
   > are then run through the front end and the OoO engine.  Another table   
   > walk could be started at any time (even when the first table walk has   
   > not yet finished feeding its instructions to the front end), and once   
   > inside the OoO engine, the execution is OoO and concurrent anyway. It   
   > would be useful to avoid two searches for the same page at the same   
   > time, but hardware walkers have the same problem.   
      
   Hmmm... I don't think that is possible, or if it is, then it's really hairy.
   The miss handler needs to LD the in-memory PTEs, which can happen OoO.
   But it also needs to do things like writing control registers   
   (e.g. the TLB) or setting the Accessed or Dirty bits on the in-memory PTE,   
   things that usually only occur at retire. But those handler instructions   
   can't get to retire because the older instructions that triggered the   
   miss are stalled.   
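A toy Python model of that retire-order problem, as a sketch only. The instruction names and "kinds" are made up, the handler's PTE load is simply treated as already complete, and the rule that the handler's TLB write takes effect only when it retires stands in for the control-register and A/D writes described above. With the handler placed after the missing load in program order, retirement stalls at the ROB head indefinitely; placed ahead of it (as a separate context or HW thread would allow), everything retires.

```python
from collections import deque

def simulate_retire(program, max_cycles=20):
    """Toy in-order retirement. Each entry is (name, kind):
      'ready'     - executed, may retire whenever it reaches the ROB head
      'load_miss' - cannot complete until its translation is in the TLB
      'tlb_write' - handler's TLB write; takes effect only when it retires
    Returns (retired names, names still stuck in the ROB)."""
    rob = deque(program)
    tlb_filled = False
    retired = []
    for _ in range(max_cycles):
        if not rob:
            break
        name, kind = rob[0]
        if kind == "load_miss" and not tlb_filled:
            continue           # head is stalled: nothing younger may retire past it
        rob.popleft()
        retired.append(name)
        if kind == "tlb_write":
            tlb_filled = True  # side effect becomes visible only at retire
    return retired, [name for name, _ in rob]

# Handler instructions inserted *after* the missing load in program order.
print(simulate_retire([
    ("add r1", "ready"),
    ("ld r2,[r1]", "load_miss"),
    ("ld pte", "ready"),
    ("mtlb", "tlb_write"),
]))
# (['add r1'], ['ld r2,[r1]', 'ld pte', 'mtlb'])  -> stuck

# Handler running ahead of the load (e.g. in its own context or HW thread).
print(simulate_retire([
    ("ld pte", "ready"),
    ("mtlb", "tlb_write"),
    ("ld r2,[r1]", "load_miss"),
]))
# (['ld pte', 'mtlb', 'ld r2,[r1]'], [])
```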
      
   The miss handler needs general registers, so it has to stash their
   current contents somewhere, and it can't use memory for that.
   Then add a nested miss handler on top of that.
      
   >> While HW walkers are serial for translating one VA,   
   >> the translations are inherently concurrent provided one can   
   >> implement an atomic RMW for the Accessed and Modified bits.   
   >   
   > It's always a one-way street (towards accessed and towards modified,   
   > never the other direction), so it's not clear to me why one would want   
   > atomicity there.   
      
   As Scott said, to avoid race conditions with software clearing those bits.   
   Plus, an OS might modify other PTE fields concurrently without first
   acquiring the normal mutexes and doing a TLB shootdown of the PTE on all
   the other cores, provided those updates are done atomically so that one
   core's updates don't clobber another's.
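A scripted interleaving illustrates the clobbering case. This is a Python sketch: the x86-style bit positions and the choice of a write-protect as the concurrent OS update are assumptions for illustration. Even though the walker only ever sets bits, a plain load/modify/store of the whole PTE writes back a stale copy and undoes the OS change, while an atomic OR of just the Accessed bit leaves it intact.

```python
PRESENT, WRITABLE, ACCESSED = 1 << 0, 1 << 1, 1 << 5  # x86-style bit positions, assumed

class Pte:
    """One in-memory PTE word. The interleaving below is scripted step by step,
    so each method call stands for one indivisible memory operation."""
    def __init__(self, value):
        self.value = value
    def load(self):
        return self.value
    def store(self, value):
        self.value = value
    def fetch_and(self, mask):      # atomic AND, as the OS might use
        old, self.value = self.value, self.value & mask
        return old
    def fetch_or(self, bits):       # atomic OR, as a walker could use
        old, self.value = self.value, self.value | bits
        return old

def walker_plain_rmw():
    pte = Pte(PRESENT | WRITABLE)
    seen = pte.load()               # walker reads PTE, sees Accessed clear
    pte.fetch_and(~WRITABLE)        # OS code on another core write-protects the page
    pte.store(seen | ACCESSED)      # walker writes back its stale copy plus Accessed
    return pte.value

def walker_atomic_or():
    pte = Pte(PRESENT | WRITABLE)
    pte.load()                      # walker reads PTE, sees Accessed clear
    pte.fetch_and(~WRITABLE)        # OS code on another core write-protects the page
    pte.fetch_or(ACCESSED)          # walker sets Accessed with an atomic OR
    return pte.value

print(bool(walker_plain_rmw() & WRITABLE))   # True: the write-protect was undone
print(bool(walker_atomic_or() & WRITABLE))   # False: the OS update survives
```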
      
   >> Each PTE read can cache miss and stall that walker.   
   >> As most OoO caches support multiple pending misses and hit-under-miss,   
   >> you can create as many HW walkers as you can afford.   
   >   
   > Which poses the question: is it cheaper to implement n table walkers,   
   > or to add some resources and mechanism that allows doing SW table   
   > walks until the OoO engine runs out of resources, and a recovery   
   > mechanism in that case.   
      
   A HW walker looks simple to me.
   It has only a few bits of state (a state number) and a couple of registers.
   It needs to detect memory read errors if they occur and abort.
   Otherwise it checks each TLB level in backwards order, from the leaf level
   toward the root, using the appropriate VA bits (starting at the root if
   nothing hits). From the deepest hit it walks back down the tree, reading
   the PTE for each level and adding it to that level's TLB, checking that it
   is marked present, and performing an atomic OR to set the Accessed and
   (for a write) Dirty flags if they are clear.
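A Python sketch of such a walker, under stated assumptions: an x86-64-like 4-level radix tree with 9 index bits per level and 4 KiB pages, x86-style Present/Accessed/Dirty bit positions, per-level caches keyed by the VA bits above each level, and a `Memory.fetch_or` standing in for the atomic OR. All class and function names are hypothetical; it does not check write permission or model several walkers running at once.

```python
PRESENT, WRITABLE, ACCESSED, DIRTY = 1 << 0, 1 << 1, 1 << 5, 1 << 6  # assumed x86-style bits
ADDR_MASK = ~0xFFF                      # assumed: next-table/frame base in bits 12 and up
LEVELS, BITS, PAGE_SHIFT = 4, 9, 12     # assumed 4-level radix tree, 4 KiB pages

class PageFault(Exception):
    pass

class Memory:
    def __init__(self):
        self.words = {}                 # physical address -> PTE
        self.reads = 0
    def read(self, pa):
        self.reads += 1
        return self.words[pa]           # KeyError stands in for a memory read error
    def fetch_or(self, pa, bits):       # stands in for an atomic fetch-and-OR
        old = self.words[pa]
        self.words[pa] = old | bits
        return old

def index(va, level):
    """Table index used at `level` (0 = root, LEVELS-1 = leaf)."""
    return (va >> (PAGE_SHIFT + BITS * (LEVELS - 1 - level))) & ((1 << BITS) - 1)

def prefix(va, level):
    """VA bits consumed by the levels above `level`; names one level-`level` table."""
    return va >> (PAGE_SHIFT + BITS * (LEVELS - level))

class Walker:
    def __init__(self, mem, root):
        self.mem, self.root = mem, root
        self.pwc = [{} for _ in range(LEVELS)]  # per-level cache: prefix -> table base (index 0 unused)
        self.tlb = {}                           # leaf TLB: VPN -> frame base

    def translate(self, va, is_write=False):
        vpn = va >> PAGE_SHIFT
        if vpn in self.tlb:
            return self.tlb[vpn] | (va & 0xFFF)
        # Probe the per-level caches from the leaf end back toward the root.
        start, base = 0, self.root
        for level in range(LEVELS - 1, 0, -1):
            hit = self.pwc[level].get(prefix(va, level))
            if hit is not None:
                start, base = level, hit
                break
        # Walk back down the tree from the deepest hit.
        for level in range(start, LEVELS):
            addr = base + 8 * index(va, level)
            pte = self.mem.read(addr)
            if not (pte & PRESENT):
                raise PageFault(hex(va))
            leaf = level == LEVELS - 1
            want = ACCESSED | (DIRTY if leaf and is_write else 0)
            if (pte & want) != want:            # only issue the atomic OR if a bit is clear
                pte = self.mem.fetch_or(addr, want) | want
            base = pte & ADDR_MASK
            if not leaf:
                self.pwc[level + 1][prefix(va, level + 1)] = base
        self.tlb[vpn] = base
        return base | (va & 0xFFF)

# Build one path through a hypothetical 4-level tree and translate through it.
ROOT, T1, T2, T3, FRAME = 0x1000, 0x2000, 0x3000, 0x4000, 0x9000
mem = Memory()
mem.words[ROOT + 8 * 1] = T1 | PRESENT
mem.words[T1 + 8 * 2] = T2 | PRESENT
mem.words[T2 + 8 * 3] = T3 | PRESENT
mem.words[T3 + 8 * 4] = FRAME | PRESENT | WRITABLE
mem.words[T3 + 8 * 5] = 0xA000 | PRESENT | WRITABLE
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
w = Walker(mem, ROOT)
print(hex(w.translate(va, is_write=True)))   # 0x9123, after 4 PTE reads
print(hex(w.translate(va + 0x1000)))         # 0xa123, 1 more read via the leaf-level cache
```

The second translation hits the leaf-level table cache, so it reads one PTE instead of four; only the first walk sets Dirty, because only it is a write.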
      
   The HW walker is even simpler if the atomic OR is implemented directly   
   in the cache controller as part of the Atomic Fetch And OP series.   
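One way to read "even simpler": with a fetch-and-OR performed at the cache controller, the walker issues one request and gets the old PTE back, whereas a walker that had only compare-and-swap would need extra states for a read/CAS/retry loop. The CAS-only walker is an assumption made for comparison (a walker could also lock the line in its own cache instead). The Python toy below models both; the controller class, bit values, PTE address and the `interfere` hook are all hypothetical.

```python
import operator

class CacheController:
    """Toy controller holding PTE lines. Each method call is modeled as one
    indivisible step performed by the controller that owns the line."""
    def __init__(self):
        self.mem = {}
    def fetch_and_op(self, addr, op, operand):
        old = self.mem[addr]
        self.mem[addr] = op(old, operand)
        return old
    def compare_and_swap(self, addr, expected, new):
        old = self.mem[addr]
        if old == expected:
            self.mem[addr] = new
        return old

def set_bits_with_cas(ctl, addr, bits, interfere=None):
    """Walker without fetch-and-OR: read, CAS, and retry on interference.
    `interfere` is a hypothetical hook run once between the read and the CAS."""
    attempts = 0
    while True:
        attempts += 1
        seen = ctl.mem[addr]
        if interfere is not None:
            interfere()
            interfere = None
        if ctl.compare_and_swap(addr, seen, seen | bits) == seen:
            return seen, attempts

def set_bits_with_fetch_or(ctl, addr, bits):
    """Walker with fetch-and-OR: one request, old value returned, no retry state."""
    return ctl.fetch_and_op(addr, operator.or_, bits)

PRESENT, WRITABLE, ACCESSED = 0x1, 0x2, 0x20   # assumed x86-style bit values
PTE_ADDR = 0x4020                               # hypothetical PTE address

ctl = CacheController()

def clear_writable():
    """Concurrent OS update: atomically write-protect the page."""
    ctl.fetch_and_op(PTE_ADDR, operator.and_, ~WRITABLE)

ctl.mem[PTE_ADDR] = PRESENT | WRITABLE
print(set_bits_with_cas(ctl, PTE_ADDR, ACCESSED, interfere=clear_writable))  # (1, 2)

ctl.mem[PTE_ADDR] = PRESENT | WRITABLE
clear_writable()                                 # same OS update, then one atomic OR
print(set_bits_with_fetch_or(ctl, PTE_ADDR, ACCESSED))  # 1
print(hex(ctl.mem[PTE_ADDR]))                           # 0x21
```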
      
   > I see other performance and conceptual disadvantages for the envisioned   
   > SW walkers, however:   
   >   
   > 1) The SW walker is inserted at the front end and there may be many   
   > ready instructions ahead of it before the instructions of the SW   
   > walker get their turn.  By contrast, a hardware walker sits in the   
   > load/store unit and can do its own loads and stores with priority over   
   > the program-level loads and stores.  However, it's not clear that   
   > giving priority to table walking is really a performance advantage.   
   >   
   > 2) Some decisions will have to be implemented as branches, resulting   
   > in branch misses, which cost time and lead to all kinds of complexity   
   > if you want to avoid resetting the whole pipeline (which is the normal   
   > reaction to a branch misprediction).   
   >   
   > 3) The reorder buffer processes instructions in architectural order.   
   > If the table walker's instructions get their sequence numbers from   
   > where they are inserted into the instruction stream, they will not   
   > retire until after the memory access that waits for the table walker   
   > is retired.  Deadlock!   
   >   
   > It may be possible to solve these problems (your idea of doing it with   
   > something like hardware threads may point in the right direction), but   
   > it's probably easier to stay with hardware walkers.   
   >   
   > - anton   
      
   Yes, and it seems to me that one would spend a lot more time trying to
   fix the SW walker than building the simple HW walker that just works.
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca