

   alt.os.development      Operating system development chatter      4,255 messages   


   Message 4,066 of 4,255   
   BGB to James Harris   
   Re: This newsgroup. (1/2)   
   14 Dec 23 22:38:28   
   
   From: cr88192@gmail.com   
      
   On 12/14/2023 2:01 PM, James Harris wrote:   
   > On 12/12/2023 23:56, Scott Lurndal wrote:   
   >> James Harris  writes:   
   >>> On 23/03/2023 19:49, Dan Cross wrote:   
   >>>> In article ,   
   >>>> Scott Lurndal  wrote:   
   >>>>> cross@spitfire.i.gajendra.net (Dan Cross) writes:   
   >>>   
   >>> ...   
   >>>   
   >>>>>>    It was never clear to me   
   >>>>>> how a hypervisor could, in general, know the format of the guest   
   >>>>>> page tables.  I know the Disco folks had to make some changes to   
   >>>>>> Irix to get it to work.   
   >>>>>   
   >>>>> When I was working on IRIX, I was not fond of either the software   
   >>>>> managed TLB, coloring or the Kseg stuff; the MIPS project I worked   
   >>>>> on was called   
   >>>>> Teak and was a distributed version of Irix (eventually cancelled)   
   >>>>> for networks of R10k boxes.   
   >>>>   
   >>>> I get it from a hardware perspective: fewer transistors with a   
   >>>> software-managed TLB, but man...so many drawbacks.   
   >>>   
   >>> Handling a software-managed TLB may be more work, in a sense, but it   
   >>> gives an OS developer more control, more feedback, more freedom, and   
   >>> perhaps better opportunities for performance gains - as long as the TLB   
   >>> is large enough.   
   >>>   
   >>> Having the hardware carry out a walk of page tables (the only option if   
   >>> the TLB is updated by hardware) has long seemed to me like a bad
   >>> idea, and it doesn't scale very well as addresses get wider.   
   >>   
   >> Having worked extensively with both models (SW: MIPS, HW: pretty much   
   >> every other single mass-produced microprocessor), there is, hands down,   
   >> no benefit to software table walks.   Zero.  Zilch.  Don't even bother.   
   >   
   > On the contrary!   
   >   
   > Brendan (BGB) has already covered hardware issues and I respect his   
   > views and experience on that. The /software/ problem with forward page   
   > tables is that while they are OK (but still unnecessary) for small   
   > numbers of processes and concomitant address spaces they don't work so   
   > well for larger systems.   
   >   
      
   It depends...   
      
   But, thanks anyways for the confidence here.   
      
      
   > Forward page tables take up a significant amount of RAM - as a rule of   
   > thumb let's say up to 1/1000th of the address space, e.g. up to 4M for a   
   > 4G address space on x32. While that's no problem for a small number of   
   > processes and the impact can be reduced in many cases it's still   
   > unavoidably true that each time you add a new process you have to add a   
   > new set of page tables. Have 5,000 processes and you need 5,000 sets of   
   > forward page tables. The whole idea is, to use a technical term,   
   > bonkers. As I said, the approach does not scale.   
   >   
   > By contrast, with reverse page tables you need just ONE table per   
   > machine, no matter how many processes you want to support. Instead of   
   > taking 1/1000th of the address space for every process one would use,   
   > say, 1/500th of the RAM for all processes combined.   
   >   
      
   Something like an inverted page-table makes more sense as a cache for
   some other page-table-like structure, rather than as a full
   replacement for it.
      
   One limiting factor is associativity. Say, for example, if the inverted   
   page-table is 8-way associative, then one may still have frequent   
   "misses" if more than 8 commonly used pages happen to land on the same   
   location in the table (depending on address space layouts, this can   
   potentially still be a serious issue for performance).   
      
      
      
   One may or may not implement an inverted page table in hardware,
   depending mostly on cost tradeoffs:
   Like a page-table walker, it has the drawback of needing its own
   mechanism for accessing memory;
   But, it can reduce overhead by cutting the number of TLB misses that
   need to be handled in software.
      
      
      
   But, as for page-table alternatives, I had "sorta OK" results with
   B-Trees, which can be more space-efficient; however, they have the
   drawback of being significantly slower.
      
   The B-Tree variants mostly make sense for "very large" (and very sparse)   
   address spaces, whereas for an address space that is 48 bits or less,   
   page tables work well enough.   
      
      
   A semi-practical alternative is hybrid B-Trees, where the last level is   
   a page-table, but all of the upper levels are B-Tree.   
      
   These can get much of the same advantages for large sparse address
   spaces, with less of a performance impact than a pure B-Tree.
      
   It is also possible to use a hash-table to cache lookups from the B-Tree   
   part of the table.   
      
      
   For a 48-bit address space with 16K pages, I am admittedly mostly just   
   using 3 level conventional page-tables.   
      
   I had considered B-Trees mostly for an extended 96-bit address space;   
   mostly because the upper 5 levels of page-table would be "extremely   
   sparse" (and a hybrid B-Tree could generally collapse everything back   
   down to 3 levels).   
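   For concreteness: with 8-byte PTEs, a 16K table holds 2048 entries,
   so three levels give 14 + 3*11 = 47 bits of offset-plus-index, and
   the index extraction is just a shift and mask per level (the field
   widths here are the obvious split, slightly simplified):

```c
/* Index extraction for a 3-level page table with 16K pages.
 * Assumes 8-byte PTEs => 2048 entries (11 index bits) per level. */
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   14                       /* 16K pages              */
#define LEVEL_BITS   11                       /* 2048 entries per table */
#define LEVEL_MASK   ((1u << LEVEL_BITS) - 1)

/* Index into the table at 'level' (0 = last level) for address 'va'. */
static unsigned pt_index(uint64_t va, int level) {
    return (unsigned)((va >> (PAGE_SHIFT + level * LEVEL_BITS)) & LEVEL_MASK);
}

/* Byte offset within the 16K page. */
static uint64_t page_offset(uint64_t va) {
    return va & ((1u << PAGE_SHIFT) - 1);
}
```

   A 96-bit space done the same way would need several more levels of
   shifts like these, nearly all of which would index into almost-empty
   tables; that sparsity is what the B-Tree levels would collapse.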
      
      
      
   > I acknowledge that each process will need forward mappings but can   
   > arrange to reserve them in runs of (first, last) addresses for much   
   > greater space efficiency than you'd get with forward page tables.   
   >   
      
   This can happen in some scenarios, but sadly not really in cases where   
   one is using a pagefile (where page numbers may end up being assigned   
   "basically at random").   
      
      
   >>   
   >> Hardware translation table walks scale rather well, and in modern   
   >> incarnations (e.g. ARMv8) are very flexible, supporting multiple   
   >> fundamental unit of translation sizes (e.g. 4k, 16k, 64k) and   
   >> higher level "huge pages".   Add in the second level of walks   
   >> required for hardware VM guest page table walkers[*] and the software   
   >> walker becomes fragile and slow.      The hardware walkers have   
   >> things like content addressible memory and intermediate translation   
   >> walk caches that software cannot do as effectively or efficiently.   
   >   
   > If one handles address-space allocation and mapping to physical   
   > addresses in software then one is not limited to 4k, 16k, 4M etc ranges.   
   >   
      
   Technically, yes.   
      
   You can represent a logical page-size that is any desired multiple of   
   the page-size used by the TLB.   
      
   The main tradeoff then is mostly the relation between page-size and TLB   
   capacity.   
      
   Smaller pages: more TLB misses; bigger pages: fewer TLB misses.
      
   In my testing, 16K seemed like the local optimum.   
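   The tradeoff can be made concrete in terms of TLB "reach" (entry
   count times page size); the entry counts here are just example
   numbers, not any specific core:

```c
/* TLB reach vs. page size. */
#include <assert.h>
#include <stdint.h>

/* Bytes mapped by the TLB without taking a single miss. */
static uint64_t tlb_reach(unsigned entries, uint64_t page_size) {
    return (uint64_t)entries * page_size;
}

/* Compulsory TLB misses taken when touching 'bytes' of fresh memory. */
static uint64_t cold_misses(uint64_t bytes, uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}
```

   E.g., a 64-entry TLB covers 256K with 4K pages but 1M with 16K
   pages, a 4x difference in miss rate for the same hardware budget.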
      
      
   For a set-associative TLB design, generally only one page size can be
   used at a time. Multiple page sizes are possible with a
   fully-associative TLB, but a fully-associative design becomes rather
   expensive at any non-trivial capacity (so, a 4- or 8-way
   set-associative design seems like a reasonable limit).
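   The set-associative limitation comes from the index bits: the set is
   selected by the low bits of the virtual page number, and *which* bits
   those are depends on the page size (set count here is a placeholder):

```c
/* Why set-associative TLBs want a single page size: the set index
 * moves when the page size changes. */
#include <assert.h>
#include <stdint.h>

#define TLB_SET_BITS 6                      /* 64 sets, say */

/* Set selected for 'va' given a page size of (1 << page_shift). */
static unsigned tlb_set(uint64_t va, unsigned page_shift) {
    return (unsigned)((va >> page_shift) & ((1u << TLB_SET_BITS) - 1));
}
```

   The same address lands in different sets under a 4K split versus a
   16K split, so the hardware would have to know (or guess) the page
   size before it could even probe the TLB.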
      
      
   At least on Xilinx FPGAs, there are magic numbers for array sizes:   
      16 entries: OK (LUTRAM)   
        FPGA uses half of a 32-entry LUTRAM.   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca