
Forums before death by AOL, social media and spammers... "We can't have nice things"

   alt.os.development      Operating system development chatter      4,255 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 3,825 of 4,255   
   BGB to Scott Lurndal   
   Re: x86-S (1/2)   
   26 May 23 04:44:09   
   
   From: cr88192@gmail.com   
      
   On 5/25/2023 8:24 AM, Scott Lurndal wrote:   
   > BGB  writes:   
   >> On 5/24/2023 1:31 PM, Scott Lurndal wrote:   
   >>> BGB  writes:   
   >>>> On 5/24/2023 6:33 AM, Dan Cross wrote:   
   >>>>> In article , BGB wrote:   
   >>>   
   >>>>>   
   >>>>>> But, in any case, SuperH (along with PA-RISC, MIPS, SPARC, etc) got   
   >>>>>> along reasonably well with software-managed TLB.   
   >>>   
   >>> Having much experience with MIPS (both at SGI and Cavium) I would   
   >>> dispute that characterization.   
   >>>   
   >>   
   >> Possibly.   
   >>   
   >> If it were that bad though, why would IA-64 have gone that route?...   
   >> Or, say, the people still using the Power ISA?...   
   >   
   > IA-64 is obsolete.   And it was designed almost three decades ago.   
   >   
   > Times change.   
   >   
      
   It is still newer than x86 and ARM...   
      
      
   But, yeah, IA-64's high point (in terms of popularity) was back when I   
   was in high school, which was sort of a while ago now.   
      
   The world ended up getting x86-64 instead.   
      
      
   However, the fundamentals of computing aren't really much different now   
   than they were 20 or 30 years ago.   
      
      
   The OSes we use now aren't *that* much different from the ones I was   
   using in high school; only now the CPUs are a little faster and we   
   have a lot more RAM...   
      
   At the time, IIRC, I was mostly dual-booting Win2K and Mandrake Linux;   
   WinXP had come out, but I didn't switch over until later (then got an   
   Athlon64 PC and switched to WinXP X64... and had "fun times with system   
   stability" for the next few years...).   
      
      
   Most programs are at least "sane" regarding memory use, apart from   
   Firefox always wanting to grow to consume all available RAM if given   
   enough time.   
      
      
      
   Ironically, if a hardware page-walker were used, and the page tables   
   weren't already in the L2 cache, one would possibly still be looking at   
   around 400-600 clock cycles just for the 3 memory accesses...   
      
   Meanwhile, the special scratch RAM for the ISRs (at 0000C000..0000DFFF)   
   lives in Block-RAM, so it never takes an L2 miss (which can slightly   
   help speed up the "spill and restore all the GPRs" step in the ISRs).   
      
      
   Likely, a "better" option might be a giant RAM-backed TLB, since then   
   all the accesses land in adjacent cache lines (and so could play better   
   with the interface between the L2 cache and DRAM).   
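   
   One possible shape for that (entry layout, sizes, and replacement   
   policy all invented for illustration): a big set-associative array in   
   RAM, indexed by the low VPN bits, so that all the ways of a set sit in   
   one predictable cache line:   

```c
#include <stdint.h>

/* Hypothetical RAM-backed second-level TLB: 16K sets x 4 ways,
 * indexed by low VPN bits.  All four ways of a set are adjacent in
 * memory, so a lookup touches a single, predictable region.
 * A pte of 0 doubles as "invalid entry" in this sketch. */
typedef struct { uint64_t vpn_tag; uint64_t pte; } TlbEntry;

#define L2TLB_SETS  16384
#define L2TLB_WAYS  4

static TlbEntry l2tlb[L2TLB_SETS][L2TLB_WAYS];

int l2tlb_lookup(uint64_t vpn, uint64_t *pte_out)
{
    TlbEntry *set = l2tlb[vpn & (L2TLB_SETS - 1)];
    for (int w = 0; w < L2TLB_WAYS; w++) {
        if (set[w].vpn_tag == vpn && set[w].pte != 0) {
            *pte_out = set[w].pte;   /* hit: one cache line touched */
            return 1;
        }
    }
    return 0;                        /* miss: fall back to a full walk */
}

void l2tlb_insert(uint64_t vpn, uint64_t pte)
{
    TlbEntry *set = l2tlb[vpn & (L2TLB_SETS - 1)];
    /* trivial replacement: shift the ways down, insert at way 0 */
    for (int w = L2TLB_WAYS - 1; w > 0; w--) set[w] = set[w - 1];
    set[0].vpn_tag = vpn;
    set[0].pte = pte;
}
```

   At 16 bytes per entry this particular layout is 1MB of RAM, which is   
   the sort of size where backing it with ordinary cached memory (rather   
   than Block-RAM) starts to make sense.   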
      
      
   >   
   >>   
   >> Doom is sort of notable in that it is a poor fit to the ISA design, but   
   >> reflects a lot of "generic" coding practices (along with Quake and ROTT).   
   >   
   > Does it?   
   >   
      
   Doom, Quake, ROTT, and similar games seem to be good examples of   
   "generic" programming practices.   
      
   They also have a lot of small/tight loops that my CPU does poorly on...   
      
   Also a lot of L2 cache misses (a fair chunk of performance is lost   
   mostly to waiting on the L2 cache; Doom would otherwise run at the   
   35 fps limiter in a scenario where the L2 cache always hits...).   
      
      
   On the current FPGA, I can use a 512K L2 cache, which at least   
   "partially" compensates for the relatively slow access to the external   
   RAM chip (it has a 16-bit interface running at 50MHz with a 5-cycle   
   minimum CAS latency, and pulls off an "epic" ~18MB/sec for memcpy).   
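   
   As a sanity check on that figure (using only the numbers quoted   
   above, and assuming reads and writes share the one bus):   

```c
/* Figures quoted above: 16-bit RAM interface at 50MHz, with memcpy
 * achieving ~18MB/sec. */
enum {
    BUS_BYTES   = 16 / 8,                  /* 2 bytes per bus transfer    */
    CLOCK_MHZ   = 50,
    PEAK_MB_S   = BUS_BYTES * CLOCK_MHZ,   /* 100 MB/sec theoretical peak */
    MEMCPY_MB_S = 18,
    /* memcpy both reads and writes, so it generates 2 bytes of bus
     * traffic per byte copied; even counted that way, only ~36% of the
     * raw peak is reached, the rest going to CAS latency, turnaround,
     * and other overheads. */
    TRAFFIC_PCT = 100 * 2 * MEMCPY_MB_S / PEAK_MB_S,
};
```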
      
   The XC7A200T isn't quite big enough for me to have a 1MB L2 cache though.   
      
      
      
      
   Bulky highly-unrolled loops tend to work better, but are not really   
   standard coding practice.   
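   
   For a concrete (made-up) example of the difference, a generic tight   
   copy loop vs. a hand-unrolled one:   

```c
#include <stddef.h>
#include <stdint.h>

/* Typical "generic" tight loop: short body, one branch per element,
 * little for a wide in-order core to do each iteration. */
void copy_tight(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Bulky unrolled variant: four independent loads and stores per
 * branch, which tends to schedule better on a wide in-order machine
 * (at the cost of code size, and of not being how most code is
 * actually written). */
void copy_unrolled(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t a = src[i + 0], b = src[i + 1];
        uint32_t c = src[i + 2], d = src[i + 3];
        dst[i + 0] = a; dst[i + 1] = b;
        dst[i + 2] = c; dst[i + 3] = d;
    }
    for (; i < n; i++)          /* leftover tail elements */
        dst[i] = src[i];
}
```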
      
      
      
   An interesting edge case is with some of my neural net tests, which can   
   turn into large blocks of hundreds of kB of straight-line SIMD code   
   (with no real looping whatsoever).   
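   
   A toy flavor of what that generated code looks like (plain scalar C   
   standing in for the SIMD ops, and the layer sizes shrunk way down):   
   every multiply-accumulate becomes its own statement, with no loop or   
   branch anywhere, so code size scales linearly with the layer size:   

```c
/* Toy fully-flattened "no loop" code, as a code generator might emit
 * for a tiny 4-input, 2-output NN layer.  The real thing would be
 * hundreds of kB of SIMD ops rather than a handful of scalar FMAs. */
void tiny_layer(const float in[4], const float w[2][4],
                const float bias[2], float out[2])
{
    out[0] = bias[0]
           + in[0]*w[0][0] + in[1]*w[0][1]
           + in[2]*w[0][2] + in[3]*w[0][3];
    out[1] = bias[1]
           + in[0]*w[1][0] + in[1]*w[1][1]
           + in[2]*w[1][2] + in[3]*w[1][3];
}
```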
      
   It can generally be speed-competitive with an early-2000s laptop at   
   this task, though it does hit the L1 I-cache pretty hard...   
      
   This is likely more of a novelty than anything else though.   
      
   Otherwise, the laptop has 30x the clock speed and 10x the DRAM memcpy   
   bandwidth (but only around 3x for small L1-local copies).   
      
      
   >>> I suspect that parallelism is the answer to the purported end of   
   >>> Moore's law.   Note that the largest AMD supercomputer now has over   
   >>> 8 million cores.   
   >>>   
   >>   
   >> There will be a limit to how many cores one can fit into a die with a   
   >> given power budget and a given micro-architecture.   
   >>   
   >> x86 and OoO in general would likely become unfavorable:   
   >>    x86 needs OoO to not perform like crap;   
   >>    OoO needs a lot of die-space and power.   
   >   
   > Does it?   
   >   
      
   Are you claiming that x86 on an in-order CPU wouldn't suck?...   
   x86 made the jump to OoO pretty much before everyone else (e.g., the   
   "Pentium Pro"), with the 486 and early Pentium being the last of the   
   in-order chips (apart from a short run with early versions of Atom).   
      
   Meanwhile, some other architectures (such as ARM) hold up much better   
   with in-order CPUs (such as the seemingly ubiquitous Cortex-A53 used in   
   most of the cell-phones I have had in recent years).   
      
      
   >   
   >>>>   
   >>>> Noting that the 96-bit space is far larger than the ASID space, and it   
   >>>> is unlikely that the guest will use all of it.   
   >>>   
   >>> You know what they say about assumptions.   
   >>>   
   >>   
   >> At least in the near-term, there is unlikely to be either enough RAM or   
   >> HDD space to make full use of such an address space.   
   >   
   > If you're making a microcontroller, then you don't need all the   
   > fancy features.  If you're designing a general purpose processor,   
   > current processors are designed to support 52-bit VA and PA, and   
   > with CXL-Memory, the need for full 64 bits is only a few years away.   
   >   
      
   52- or 64-bit VAs are still pretty massive overkill at present.   
      
   Like, most PCs have maybe 32GB or 64GB of RAM at present.   
      
   A cellphone has maybe 2GB or 4GB.   
      
      
   48-bit is plenty; we are nowhere close to the 256TB mark yet...   
      
   48-bit also leaves some bits free for dynamic type tags and similar,   
   which is a much more useful feature at present (also for encoding the   
   processor's instruction-set mode in function pointers and link-register   
   pointers, to allow for function pointers between instruction-set modes).   
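   
   A minimal sketch of that kind of tagging in C (the 48+16 split and   
   the helper names here are illustrative, not BJX2's actual encoding):   

```c
#include <stdint.h>

/* Hypothetical tagged-pointer scheme: 48 address bits, 16 tag bits.
 * Illustrative layout only, not BJX2's actual encoding. */
#define ADDR_MASK ((1ull << 48) - 1)

static inline uint64_t tag_ptr(void *p, unsigned tag)
{
    /* pack: tag in the high 16 bits, address in the low 48 */
    return ((uint64_t)tag << 48) | ((uint64_t)(uintptr_t)p & ADDR_MASK);
}

static inline unsigned ptr_tag(uint64_t tp)
{
    return (unsigned)(tp >> 48);
}

static inline void *ptr_addr(uint64_t tp)
{
    /* mask the tag back off before dereferencing; real hardware might
     * also need to sign-extend bit 47 for canonical-address rules */
    return (void *)(uintptr_t)(tp & ADDR_MASK);
}
```

   This works on typical 64-bit targets where userland addresses fit in   
   48 bits; it is also basically the trick that breaks if the address   
   space ever grows into those upper bits.   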
      
   There was a feature under consideration to extend 64-bit pointers to 60   
   address bits (leaving only 4 bits for type tags), but I haven't done   
   this (it isn't useful at present).   
      
      
   As noted, the full "extended address space" in BJX2 is 96-bit, but this   
   would be stupid levels of overkill for any normal application. (I had   
   started work on an ABI that used 128-bit pointers, but shelved it for   
   now, as it was massive overkill and, at present, not really worth the   
   effort needed to debug it.)   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca