

   alt.os.development      Operating system development chatter      4,255 messages   


   Message 3,820 of 4,255   
   BGB to Scott Lurndal   
   Re: x86-S   
   25 May 23 04:20:26   
   
   From: cr88192@gmail.com   
      
   On 5/24/2023 1:31 PM, Scott Lurndal wrote:   
   > BGB  writes:   
   >> On 5/24/2023 6:33 AM, Dan Cross wrote:   
   >>> In article , BGB   wrote:   
   >   
   >>>   
   >>>> But, in any case, SuperH (along with PA-RISC, MIPS, SPARC, etc) got   
   >>>> along reasonably well with software-managed TLB.   
   >   
   > Having much experience with MIPS (both at SGI and Cavium) I would   
   > dispute that characterization.   
   >   
      
   Possibly.   
      
   If it were that bad though, why would IA-64 have gone that route?...   
   Or, say, the people still using the Power ISA?...   
      
      
   >>>   
   >>> In a very different time, with very different demands on the   
   >>> architecture.   
   >>>   
   >>   
   >> Depends on what one wants.   
   >>   
   >>   
   >> I am mostly imagining an architecture for embedded-systems style   
   >> use-cases (but, more DSP-like than microcontroller-like).   
   >   
   > So, computer games like Doom are not very good benchmark choices.   
   >   
      
   Doom is sort of notable in that it is a poor fit to the ISA design, but   
   reflects a lot of "generic" coding practices (along with Quake and ROTT).   
      
      
   But, despite being a poor fit, in 320x200 mode, Doom can pull off around   
   20fps at 50MHz. Performance is a bit worse if trying to run it in a   
   640x400 or 800x600 GUI mode though (my attempt at a GUI not doing quite   
   so well in terms of performance).   
      
      
      
   If I only test the stuff that works well, this is only part of the picture:   
   I can run real-time software-rasterized OpenGL on the thing;   
   It is also "pretty formidable" at running neural nets expressed as SIMD   
   ops and similar;   
   ...   
      
   But, Software Quake, for example, pulls off an epic 2-4 FPS...
   As-is, it is faster to run a modified GLQuake port on the thing.   
      
      
   I am generally getting a Dhrystone score of around 75000, which is   
   seemingly on-par with a lot of "retro" stats (comparable to 90s era   
   PowerPC machines relative to clock speed).   
      
   But, scores get a bit suspect if comparing against scores generated by   
   GCC or Clang, which seem to give "unreasonably fast" Dhrystone numbers   
   by default.   
      
   Stuff seems "less bad" if I compare against Dhrystone built with MSVC   
   though.   
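   
   For reference, a raw Dhrystones-per-second score is usually normalized
   against the VAX 11/780 score of 1757 to get "DMIPS", and then divided by
   the clock to compare across machines. A minimal sketch of that
   arithmetic follows; the function names are made up for illustration, and
   the 50MHz clock in the note below is an assumption carried over from the
   Doom figure, not something stated for the Dhrystone run:

```c
#include <assert.h>

/* Dhrystones/second of the VAX 11/780, the conventional "1 MIPS" reference. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Hypothetical helper: raw Dhrystone score -> DMIPS. */
double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Hypothetical helper: DMIPS normalized by clock speed in MHz. */
double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return dmips(dhrystones_per_sec) / clock_mhz;
}
```

   With a score of 75000 and an assumed 50MHz clock, this works out to
   roughly 42.7 DMIPS, or about 0.85 DMIPS/MHz.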
      
      
   I have not ported any standardized floating-point benchmarks though.
      
   There is a risk they would get caught up on the "atrociously bad" FDIV   
   performance though (there is, sort-of, an instruction for this, but it   
   is generally faster to do Newton-Raphson iteration in software).   
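   
   A minimal sketch of what Newton-Raphson division in software can look
   like, assuming a normal, finite, non-zero divisor. This is only an
   illustration of the technique, not the actual runtime routine;
   nr_divide is a hypothetical name, and a real implementation would
   likely pull the exponent out of the float bits directly rather than
   looping:

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: approximate a / b without a hardware divide,
 * using Newton-Raphson to refine an estimate of 1/b.
 * Assumes b is a normal, finite, non-zero double. */
double nr_divide(double a, double b)
{
    double m = (b < 0.0) ? -b : b;   /* |b| */
    double scale = 1.0;
    double x;
    int i;

    assert(m >= DBL_MIN && m <= DBL_MAX);

    /* Range reduction: bring m into [0.5, 1) using only multiplies,
     * tracking the power of two so that 1/|b| == (1/m) * scale. */
    while (m >= 1.0) { m *= 0.5; scale *= 0.5; }
    while (m <  0.5) { m *= 2.0; scale *= 2.0; }

    /* Linear initial estimate of 1/m: 48/17 - (32/17)*m,
     * which is within about 6% on [0.5, 1). */
    x = 2.8235294117647059 - 1.8823529411764706 * m;

    /* Each step roughly squares the relative error:
     * x_next = x * (2 - m * x). Four steps is enough for double here. */
    for (i = 0; i < 4; i++)
        x = x * (2.0 - m * x);

    x *= scale;                      /* undo the range reduction */
    return (b < 0.0) ? -(a * x) : a * x;
}
```

   The result is a * (1/b), so it is not guaranteed to be correctly
   rounded the way a true FDIV would be, but it only needs multiplies and
   subtracts.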
      
      
   There is a similar issue for integer divide (hardware integer divide was
   required for the RISC-V mode to support the 'M' extension's
   instructions, but it is not particularly fast).
      
   Originally I had planned on skipping integer divide (and only
   providing MUL and similar in hardware).
      
   Though, my C compiler will (by default) use a C runtime call in the case   
   of integer divide or similar (with the integer divide being handled in   
   software).   
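   
   As an illustration of what such a runtime call might do, here is a
   plain shift-and-subtract divide. This is a sketch under assumptions,
   not the actual C runtime: the soft_udiv32 / soft_urem32 / soft_sdiv32
   names are hypothetical, and a tuned version would typically add fast
   paths for small divisors or process more than one bit per step:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime helper: unsigned 32-bit divide, one quotient bit
 * per iteration. Optionally returns the remainder through rem. */
uint32_t soft_udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    int i;

    assert(d != 0);                  /* divide-by-zero is not handled here */

    for (i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit */
        if (r >= d) {                     /* divisor fits: subtract, set bit */
            r -= d;
            q |= (1u << i);
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}

/* Hypothetical helper: unsigned remainder, built on the divide above. */
uint32_t soft_urem32(uint32_t n, uint32_t d)
{
    uint32_t r;
    (void)soft_udiv32(n, d, &r);
    return r;
}

/* Hypothetical helper: signed divide, truncating toward zero.
 * INT32_MIN / -1 wraps back to INT32_MIN on two's complement targets. */
int32_t soft_sdiv32(int32_t n, int32_t d)
{
    uint32_t un = (n < 0) ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = (d < 0) ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t uq = soft_udiv32(un, ud, NULL);

    return ((n < 0) != (d < 0)) ? (int32_t)(0u - uq) : (int32_t)uq;
}
```

   The loop always runs 32 iterations, which gives a feel for why a
   software divide (or a simple iterative hardware one) ends up slow
   relative to MUL.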
      
   But, neither is quite bad enough to fall into "boat anchor" territory.   
      
      
   >>   
   >> Say, something that does real-time audio/video processing and can run   
   >> neural nets.   
   >   
   >> So, for example, the design is an in-order VLIW, since it seems like   
   >> optimizing for OoO will become less attractive once Moore's Law ends   
   >> (say, if one wants more performance in less die area and less watts,   
   >> rather than maximum performance but throwing lots of die area and watts   
   >> at it).   
   >   
   > I suspect that parallelism is the answer to the purported end of   
   > Moore's law.   Note that the largest AMD supercomputer now has over
   > 8 million cores.   
   >   
      
   There will be a limit to how many cores one can fit into a die with a   
   given power budget and a given micro-architecture.   
      
   x86 and OoO in general would likely become unfavorable:   
      x86 needs OoO to not perform like crap;   
      OoO needs a lot of die-space and power.   
      
      
   A simple RISC could fare a little better, but in-order superscalar is   
   fairly limited.   
      
   A VLIW can push a little closer to OoO performance, while still having   
   the "cheapness" of an in-order design. Cost being that one needs a more   
   complicated compiler, and that the compiler needs to be aware of the   
   pipeline width and behavior (and which combinations of features are   
   allowed on a given CPU core).   
      
      
   >   
   >>   
   >> As for virtual-address spaces having the same addresses:   
   >>    You can use ASIDs.   
   >   
   > Actually, you need both VMIDs (virtual machine ID) and ASIDs (address
   > space ID).
   >
   > All user-mode applications running under all virtual machines may be
   > using identical virtual addresses.  Which means you need to tag the TLB
   > entries with both VMID and ASID (now you're up to 32 bits of tag if
   > both are 16-bits).
   >   
   > And even 16-bits of ASID are insufficient on multiprocessor machines and
   > the OS needs a mechanism to invalidate all ASIDs and assign new ones   
   > when unassigned processes are subsequently scheduled.   
   >   
      
   ASIDs could be locally assigned in many cases.   
      
   Adding a VMID probably shouldn't be needed if the VMs use an ASID   
   remapping table or similar.   
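   
   A rough sketch of the remapping idea, assuming a 16-bit hardware ASID
   and a hypervisor that hands out host ASIDs lazily. All names, table
   sizes, and the tag layout here are hypothetical, chosen only to show
   that two guests using the same guest ASID and the same virtual address
   end up with different TLB tags without a separate VMID field:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VMS      4      /* hypothetical limits for the sketch */
#define GUEST_ASIDS  256
#define HOST_ASIDS   65536  /* assumes a 16-bit hardware ASID */

/* Per-VM table: guest ASID -> host ASID (0 means "not assigned yet"). */
static uint16_t asid_remap[MAX_VMS][GUEST_ASIDS];
static uint32_t next_host_asid = 1;   /* host ASID 0 is kept reserved */

/* Look up (or lazily allocate) the host ASID for a guest's ASID. */
uint16_t host_asid_for(unsigned vm, uint8_t guest_asid)
{
    uint16_t h;

    assert(vm < MAX_VMS);
    h = asid_remap[vm][guest_asid];
    if (h == 0) {
        /* A real system would flush the TLB and recycle ASIDs here. */
        assert(next_host_asid < HOST_ASIDS);
        h = (uint16_t)next_host_asid++;
        asid_remap[vm][guest_asid] = h;
    }
    return h;
}

/* TLB tag: host ASID in the top 16 bits, virtual page number below. */
uint64_t tlb_tag(uint16_t host_asid, uint64_t vpn)
{
    return ((uint64_t)host_asid << 48) | (vpn & 0xFFFFFFFFFFFFull);
}
```

   The cost is that the host ASID space is shared between all guests, so
   the rollover case (flush and reassign) still has to be handled
   somewhere.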
      
   At least on moderately sized systems, it is likely one would run out of
   memory before they run out of ASIDs.   
      
      
   >>   
   >> Noting that the 96-bit space is far larger than the ASID space, and it   
   >> is unlikely that the guest will use all of it.   
   >   
   > You know what they say about assumptions.   
   >   
      
   At least in the near-term, there is unlikely to be either enough RAM or   
   HDD space to make full use of such an address space.   
      



(c) 1994,  bbs@darkrealms.ca