From: user5857@newsgrouper.org.invalid   
      
   BGB posted:   
      
   > On 2/10/2026 3:56 PM, MitchAlsup wrote:   
   > >   
   > > BGB posted:   
   > >   
   > >> On 2/10/2026 1:04 PM, MitchAlsup wrote:   
   > >>>   
   > >>> Robert Finch posted:   
   > >>>   
   > >>>> On 2026-02-09 5:27 p.m., BGB wrote:   
   > >>>>> On 2/9/2026 2:44 PM, EricP wrote:   
   > >>> ------------------------------------   
   > >>>>   
   > >>>> I found a slide presentation on the idea.   
   > >>>>   
   > >>>> I am wondering how a LRU system would work with the skewed entries.   
   > >>>> I assume the entries would be shifting between the ways for LRU. So, if   
   > >>>> there is a different size page in one of the ways would it still work?   
   > >>>   
   > >>> One would use the not-recently-used variant of LRU.   
   > >>> One accumulates used bits on a per-set basis.
   > >>> When all used-bits have been set, one clears   
   > >>> all used bits.   
   > >>>   
   > >>> In practice it is easier to implement and almost as good.   
   > >>   
   > >> FWIW:   
   > >> I ended up not using an MRU or LRU for the TLB.   
   > >   
   > > My NRU is comprised of a single S-R flip-flop per PTE, with all of them   
   > > wired to a (high-fan-in) AND gate.
   > >   
   > > When all NRU-bits are set, AND gate asserts, and all reset synchronously.   
   > >   
   > > When TLB takes a miss, Find-First logic chooses a unary index for the new
   > > PTE. Writing (or reading) a PTE sets its S-R FF.   
   > >   
   > > Simple and effective.   
   > >   
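   A rough software model of the NRU scheme quoted above, as a sketch only.
   The names (NruSet, touch, pick_victim) are made up for illustration, and a
   Python list stands in for the per-PTE S-R flip-flops. It assumes the reset
   clears every bit, including the one that was just set.

```python
class NruSet:
    """Not-recently-used replacement state for one group of TLB entries."""

    def __init__(self, n_entries):
        # One "used" bit per entry (the S-R flip-flop in the hardware version).
        self.used = [False] * n_entries

    def touch(self, index):
        """Reading or writing an entry sets its used bit."""
        self.used[index] = True
        # Models the high-fan-in AND gate: once every bit is set, clear them all.
        if all(self.used):
            self.used = [False] * len(self.used)

    def pick_victim(self):
        """Find-first: lowest-numbered entry whose used bit is clear."""
        return self.used.index(False)


nru = NruSet(4)
nru.touch(0)
nru.touch(2)
victim = nru.pick_victim()   # entry 1 is the first one not recently used
```

   Because touch() clears all the bits as soon as the last one is set, at
   least one bit is always clear and pick_victim() always finds a slot.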
   >   
   >   
   > All pose a similar issue though in that they introduce behavior that is   
   > not visible to the miss handler, and thus the exact state of the TLB   
   > can't be modeled in software. Well, unless maybe there were some good   
   > way to estimate this in SW and with a LDTLB which encoded which way to   
   > load the TLBE into.   
   >   
   > LDTLB: Load register pair into TLB, push-back by 1.   
   > LDTLB{1/2/3/4}: Load into 1st/2nd/3rd/4th way of TLB.
      
   MIPS (the company, not Stanford) had the TLB logic find the entry to be
   written (find first), so SW just hashed a big table, checked for a match
   and, if so, inserted the new PTE where HW said.
      
   IF you are going to do a SW TLB reloader, this is the way.   
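   In software terms the flow is roughly the following. This is a sketch
   under assumptions, not the actual MIPS handler: the table size, the
   low-bits hash, the (vpn, pte) tuple layout and the hw_pick_slot callback
   (standing in for the index the hardware supplies) are all hypothetical.

```python
TABLE_SIZE = 1024  # hypothetical: big hashed table, power-of-two size


def reload_on_miss(vpn, hashed_table, tlb, hw_pick_slot):
    """Software TLB reload: hash, check the match, insert where HW says.

    Returns True if the TLB was refilled, False if the slow path is needed.
    """
    entry = hashed_table[vpn & (TABLE_SIZE - 1)]   # hash = low VPN bits (assumed)
    if entry is None or entry[0] != vpn:
        return False                               # no match: slow path / fault
    tlb[hw_pick_slot()] = entry                    # hardware chose the victim slot
    return True


table = [None] * TABLE_SIZE
table[5] = (5, 0x1234)          # (vpn, pte) pair, made-up values
tlb = [None] * 4
refilled = reload_on_miss(5, table, tlb, lambda: 2)
```

   The handler never needs to know the replacement state; it only supplies
   the PTE.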
      
   > Then again, in this case, NRU state would be 1024 bits, and could   
   > potentially be made accessible via a special mechanism (say, an LDNRU   
   > instruction). Would still be too bulky in this case to route it through   
   > CPUID, and wouldn't want to get the MMIO mechanism involved.   
   >   
   >   
   > Well, maybe could special case-it:   
   > MMIO requests to the TLB need not make it all the way to the MMIO bus,   
   > the TLB could just see the request passing by and be like "hey, that   
   > one's for me" and handle it directly as it passes by (a similar
   > mechanism having already been used for LDTLB requests, which sort of
   > reach the TLB like specially malformed memory stores).
      
   In wide super-scalar machines all of the memory 'paths' need this logic,   
   and sometimes ordering of MMI/O becomes a "little tricky".   
   ---------------------   
   > > MMI/O is potentially SLOW.   
   > >   
   >   
   > Beyond this... If we are already paying the cost of an exception handler
   > for TLB Miss, then some extra MMIO isn't likely a huge issue.   
      
   The MIPS TLB reloader was 17 cycles (when its instructions were in cache).
   I have witnessed MMI/O accesses that took longer than 1µs !! Also, the
   control register bus in Athlon/Opteron was SLOW, too, so you can't always
   use control registers in lieu of MMI/O ...
   -----------------   
   > >   
   > > True LRU is BigO( n^3 ) in logic gates; whereas NRU is BigO( k ).   
   > >   
   >   
   > OK.
   >   
   > I guess I can note that my use of LRU/MRU was most often based on STF,   
   > in a naive case:   
   > Access 3: Swap 3 and 2   
   > Access 2: Swap 2 and 1   
   > Access 1: Swap 1 and 0   
   > Access 0: No Change   
      
   SWAP moves a whole PTE's worth of bits !! whereas NRU sets 1 bit.
   Swapping would never be allowed in a low-power implementation.
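   To make the cost difference concrete, here is a small sketch of the
   swap-toward-front scheme from the quoted list. The function name and the
   "entries written" return value are illustrative assumptions, not anyone's
   actual design.

```python
def access_swap(ways, i):
    """Swap-toward-front: an access to way i exchanges it with way i-1.

    Returns how many full entries had to be rewritten.
    """
    if i == 0:
        return 0                                   # Access 0: no change
    ways[i - 1], ways[i] = ways[i], ways[i - 1]    # two whole entries move
    return 2


ways = ["pteA", "pteB", "pteC", "pteD"]
entries_written = access_swap(ways, 3)   # swaps ways 3 and 2
```

   Every hit outside way 0 rewrites two complete entries, where an NRU
   update on the same hit only sets a single bit.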
   ----------------   
   > Say, for example, in my current 3D engine for things like height-map:   
   > Fill a lookup table with random numbers generated from a PRNG (256 entries);   
   > Hash the X/Y or X/Y/Z coords ("(((X*P1+Y*P2+Z*P3)*P4)>>SH)&255");   
   > Select an index from the lookup table, and repeat with adjusted indices   
   > for adjacent entries;   
   > Interpolate between these.   
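   The quoted steps can be sketched as below for the 2D height-map case. The
   constants P1..P4 and SH, the PRNG seed and the bilinear blend are
   placeholder assumptions, since the post does not give the real values or
   the interpolation used.

```python
import math
import random

# Hypothetical constants; the post does not say what P1..P4 and SH are.
P1, P2, P3, P4, SH = 73856093, 19349663, 83492791, 2654435761, 16

_rng = random.Random(1234)                       # fixed seed for repeatability
TABLE = [_rng.random() for _ in range(256)]      # 256-entry random lookup table


def lattice(x, y, z=0):
    """Hash integer coordinates into the table, as in the quoted expression."""
    return TABLE[(((x * P1 + y * P2 + z * P3) * P4) >> SH) & 255]


def height(fx, fy):
    """Interpolate between the four lattice values around (fx, fy)."""
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0
    top = lattice(x0, y0) * (1 - tx) + lattice(x0 + 1, y0) * tx
    bot = lattice(x0, y0 + 1) * (1 - tx) + lattice(x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bot * ty
```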
      
   One of the cool things about HW hashing is bit reversal of ½ the fields in   
   the hash--goes a long way to 'whiten' the noise signature.   
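   A minimal sketch of the bit-reversal trick, assuming two equal-width
   fields combined with XOR (the XOR, the 8-bit width and the function names
   are illustrative choices, not a specific hardware design):

```python
def reverse_bits(value, width):
    """Return the low `width` bits of value in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def hash_index(x, y, width=8):
    """XOR one field with the bit-reversed other field."""
    mask = (1 << width) - 1
    return (x & mask) ^ reverse_bits(y & mask, width)
```

   With one field reversed, the fast-changing low bits of y land on the high
   bits of the index, so they no longer line up with the low bits of x.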
      