Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 129,807 of 131,241    |
|    BGB to Stefan Monnier    |
|    Re: SASOS and virtually tagged caches    |
|    03 Oct 25 17:42:19    |
From: cr88192@gmail.com

On 10/3/2025 10:26 AM, Stefan Monnier wrote:
>>> | - virtually tagged caches
>>> | You can't really claim to be worst-of-the-worst without virtually
>>> |tagged caches.
>>> | Tears of joy as you debug cache alias issues and of flushing caches
>>> |on context switches.
>> That is only true if one insists on OS with Multiple Address Spaces.
>> Virtually tagged caches are fine for Single Address Space (SAS) OS.
>
> AFAIK, the main problem with SASOS is "backward compatibility", most
> importantly with `fork`. The Mill people proposed a possible solution,
> which seemed workable, but it's far from clear to me whether it would
> work well enough if you want to port, say, Debian to such
> an architecture.
>

You can just sort of not support full "fork()", or support it in a way
similar to how it works on uClinux and Cygwin: namely, you can use it,
but trying to use it for anything more than a fork immediately followed
by an "exec*" call (or similar) is probably going to break something.

Well, anything that depends on full "fork()" semantics isn't going to
work, and the preferable way to spawn new process instances is something
along the lines of a "CreateProcessEx()" style mechanism.

As can be noted, I had designed my ABIs with the assumption of a single
address space.

Generally, it ended up as 48-bit since, even within the limits of an
FPGA with only 128MB or so of actual RAM, a 32-bit VAS can get a bit
cramped (32 bits is only really enough for a single program in an
address space, if that).
My "break glass" feature for 48 bits being insufficient for a single
address space was expanding the VAS to 96 bits, though even this was a
bit wonky:
 Low 32 bits: real address bits;
 Next 24 bits: just sorta mash all the HOBs together and hope it doesn't
 break.

Where, say, extending the L1 cache tags by 8 bits is a lot cheaper than
extending them by 48 bits, and offers a sufficiently low probability of
aliasing.

So, in the 96-bit mode:
 0000_00000000-0000_00000000..0000_00000000-7FFF_FFFFFFFF:
  Preserved exactly if no higher addresses used.
 Anything else: YMMV.

There is a non-zero risk of random 4GB regions aliasing based on the
whims of the XOR, as actually storing full 96-bit addresses is too
steep. The page tables and TLB could support full-width 96-bit
addresses, so the main problem area would be trying to use two addresses
at the same time that map to the same location in the L1 cache.

However, if one assumes a scenario where each program is confined to a
slice of the bigger 96-bit space, then the XORs all even out and the
address space is consistent (the risk mostly appearing when using
addresses not within the same 48-bit "quadrant").

Theoretically, the OS's ASLR could keep track of this and not assign
address ranges that would alias with previously used address ranges (via
a lookup table).

Kinda similar crap to the "PE loader may not load a PE to an address
that crosses a 4GB boundary" rule, because it adds cost to have direct
branches and PC increment deal with more than 4GB.
Well, sorta:
 PC increment still has a 4GB window;
 Branches have either a 16MB window (via the branch predictor),
 or +/- 8GB via normal address calc
 (the branch predictor detecting carry-out and not handling the branch).
It was 4GB originally, but the above trick allowed going cheaper here.
However, crossing a 16MB barrier has a performance penalty;
statistically, there is a low probability of ".text" crossing such a
barrier.

Arguably, all still kinda crap though...

For now, 48 bits is plenty for my uses.

I considered possible options for 64-bit VAS support (within the 96-bit
mode), but annoyingly, if done in an affordable way, it would likely not
allow program code outside the low 48 bits, or arrays crossing a 48-bit
boundary (so, still slightly janky).

Though, IMHO, still better than what MIPS did, IIRC:
 PC1[63:28] = PC0[63:28]
 PC1[27: 2] = JAL_Addr[25:0]
 PC1[ 1: 0] = 0

Or, say, you have a 256MB barrier that may not be crossed, and the
loader would need to rebase within said 256MB.

Information is inconsistent for conditional branches, where some
information implies it is simply adding the displacement (scaled by 4),
and other info implies:
 Copy high bits unchanged;
 Add low-order bits;
 Address may wrap if it crosses some ill-defined address barrier.

They seemingly missed an opportunity to go cheaper for Bcc here, say:
 PC1[63:20] = PC0[63:20]
 PC1[19:14] = PC0[19:14] + SExt(Bcc_Addr[15:12])
 PC1[13: 2] = Bcc_Addr[11:0]
 PC1[ 1: 0] = 0
Then, say, one only needs a 6-bit addition for the conditional branch
instruction.

Trying to rebase a program at load time is "there be dragons"
territory.

...

>
> Stefan

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca