Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 129,807 of 131,241    |
|    BGB to Stefan Monnier    |
|    Re: SASOS and virtually tagged caches    |
|    03 Oct 25 17:42:19    |
From: cr88192@gmail.com

On 10/3/2025 10:26 AM, Stefan Monnier wrote:
>>> | - virtually tagged caches
>>> | You can't really claim to be worst-of-the-worst without virtually
>>> |tagged caches.
>>> | Tears of joy as you debug cache alias issues and of flushing caches
>>> |on context switches.
>> That is only true if one insists on OS with Multiple Address Spaces.
>> Virtually tagged caches are fine for Single Address Space (SAS) OS.
>
> AFAIK, the main problem with SASOS is "backward compatibility", most
> importantly with `fork`. The Mill people proposed a possible solution,
> which seemed workable, but it's far from clear to me whether it would
> work well enough if you want to port, say, Debian to such
> an architecture.
>

You can just sort of not support full "fork()", or support it in a way
similar to how it works on uClinux and Cygwin: namely, you can use it,
but trying to use it for anything more than a fork immediately followed
by an "exec*" call (or similar) is probably going to break something.

Well, anything that depends on full "fork()" semantics isn't going to
work, and the preferable way to spawn new process instances is something
along the lines of a "CreateProcessEx()" style mechanism.

As can be noted, I had designed my ABIs with the assumption of a single
address space.

Generally, it ended up as 48-bit since, even within the limits of an
FPGA with only 128MB or so of actual RAM, a 32-bit VAS can get a bit
cramped (32 bits is only really enough for a single program in an
address space, if that).
My "break glass" feature for 48 bits being insufficient for a single
address space was expanding the VAS to 96 bits, though even this was a
bit wonky:
 Low 32 bits: real address bits;
 Next 24 bits: just sorta mash all the HOBs together and hope it doesn't
 break.

Where, say, extending the L1 cache tags by 8 bits is a lot cheaper than
extending them by 48 bits, and offers a sufficiently low probability of
aliasing.

So, in the 96-bit mode:
 0000_00000000-0000_00000000..0000_00000000-7FFF_FFFFFFFF:
  Preserved exactly if no higher addresses used.
 Anything else: YMMV.

There is a non-zero risk of random 4GB regions aliasing based on the
whims of the XOR, as actually storing full 96-bit addresses is too
steep. The page tables and TLB could support full-width 96-bit
addresses, so the main problem area would be trying to use two addresses
at the same time that map to the same location in the L1 cache.

However, if one assumes a scenario where each program is confined to a
slice of the bigger 96-bit space, then the XORs all even out and the
address space is consistent (the risk mostly appearing when using
addresses not within the same 48-bit "quadrant").

Theoretically, the OS's ASLR could keep track of this and not assign
address ranges that would alias with previously used address ranges (via
a lookup table).

Kinda similar crap to the "PE loader may not load a PE to an address
that crosses a 4GB boundary" rule, because it adds cost to have direct
branches and PC increment deal with more than 4GB.
Well, sorta:
 PC increment still has a 4GB window;
 Branches have either a 16MB window (via the branch predictor),
 or +/- 8GB via normal address calc
 (the branch predictor detecting carry-out and not handling the branch).
It was 4GB originally, but the above trick allowed going cheaper here.
However, crossing a 16MB barrier has a performance penalty;
statistically, there is a low probability of ".text" crossing such a
barrier.

Arguably, all still kinda crap though...

For now, 48 bits is plenty for my uses.

I considered possible options for 64-bit VAS support (within the 96-bit
mode), but annoyingly, if done in an affordable way, it would likely not
allow program code outside the low 48 bits, or arrays crossing a 48-bit
boundary (so, still slightly janky).

Though, IMHO, still better than what MIPS did, IIRC:
 PC1[63:28] = PC0[63:28]
 PC1[27: 2] = JAL_Addr[25:0]
 PC1[ 1: 0] = 0

Or, say, you have a 256MB barrier that may not be crossed, and the
loader would need to rebase within said 256MB.

Information is inconsistent for conditional branches, where some
information implies it is simply adding the displacement (scaled by 4),
and other info implies:
 Copy high bits unchanged;
 Add low-order bits;
 Address may wrap if it crosses some ill-defined address barrier.

They seemingly missed an opportunity to go cheaper for Bcc here, say:
 PC1[63:20] = PC0[63:20]
 PC1[19:14] = PC0[19:14] + SExt(Bcc_Addr[15:12])
 PC1[13: 2] = Bcc_Addr[11:0]
 PC1[ 1: 0] = 0
Then, say, one only needs a 6-bit addition for the conditional branch
instruction.

Trying to rebase a program at load time is "there be dragons"
territory.

...

>
> Stefan

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca