

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 130,050 of 131,241   
   BGB to Robert Finch   
   Re: Tonights Tradeoff (1/3)   
   29 Oct 25 04:29:15   
   
   From: cr88192@gmail.com   
      
   On 10/28/2025 10:52 PM, Robert Finch wrote:   
   > Started working on yet another CPU – Qupls4. Fixed 40-bit instructions,   
   > 64 GPRs. GPRs may be used in pairs for 128-bit ops. Registers are named   
   > as if there were 32 GPRs, A0 (arg 0 register is r1) and A0H (arg 0 high   
   > is r33). Same for other registers. GPRs may contain either integer or   
   > floating-point values.   
   >   
      
   OK.   
      
   I mostly stuck with 32-bit encodings. Going to 40 could maybe allow more
   encoding space, with the drawback of being non-power-of-2.
      
   But, yeah, occasionally dealing with 128-bit data is a major case for 64
   GPRs and paired registers.
      
      
   Well, that and when co-existing with RV64G, it gives somewhere to put   
   the FPRs. But, in turn this was initially motivated by me failing to   
   figure out how to get GCC configured to target Zfinx/Zdinx.   
      
      
   Had ended up going with the Even/Odd pairing scheme as it is less wonky   
   IMO to deal with R5:R4 than R36:R4.   
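
   As a rough illustration of the two pairing schemes being compared: with
   even/odd pairing the high half is the next register up (R5:R4), whereas
   with the high-bank naming from the quoted post (A0 = r1, A0H = r33) the
   high half sits 32 registers away (R36:R4). The helper names below are
   made up for this sketch and are not taken from either ISA.

```python
# Two ways to pick the high half of a 128-bit register pair.
# Helper names are hypothetical, for illustration only.

def pair_even_odd(lo):
    """Even/odd pairing: the pair is R(n+1):Rn, with n required to be even."""
    if lo % 2 != 0:
        raise ValueError("even/odd pairing needs an even base register")
    return (lo | 1, lo)

def pair_high_bank(lo, bank_size=32):
    """High-bank pairing: the pair is R(n+32):Rn, as in A0 = r1, A0H = r33."""
    if not 0 <= lo < bank_size:
        raise ValueError("base register must be in the low bank")
    return (lo + bank_size, lo)

print(pair_even_odd(4))    # (5, 4)   i.e. R5:R4
print(pair_high_bank(4))   # (36, 4)  i.e. R36:R4
print(pair_high_bank(1))   # (33, 1)  i.e. A0H:A0
```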
      
      
   > Going with a bit result vector in any GPR for compares, then a branch on   
   > bit-set/clear for conditional branches. Might also include branch true /   
   > false.   
   >   
      
   BT/BF works well. I otherwise also ended up using RISC-V style branches,   
   which I originally disliked due to higher implementation cost, but they   
   do technically allow for higher performance than just BT/BF or   
   Branch-Compare-with-Zero in 2-R cases.   
      
   So, it becomes harder to complain about a feature that does technically   
   help with performance.   
      
      
   > Using operand routing for immediate constants and an operation size for   
   > the instruction. Constants and operation size may be specified   
   > independently. With 40-bit instruction words, constants may be 10,50,90   
   > or 130 bits.   
   >   
      
   Hmm...   
      
   My case: 10/33/64.   
   No direct 128-bit constant, but can use two 64-bit constants whenever   
   128 bits is needed.   
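
   A minimal sketch of the two-constant approach: a 128-bit value splits
   into a low and a high 64-bit half (one per register of a pair) and joins
   back with a shift and an OR. The function names are hypothetical. The
   last lines only check the arithmetic on the quoted sizes, under the
   assumption (not stated in the quoted post) that they come from a 10-bit
   field plus 0 to 3 extra 40-bit words.

```python
MASK64 = (1 << 64) - 1

def split128(value):
    """Split a 128-bit unsigned value into (low64, high64)."""
    if not 0 <= value < (1 << 128):
        raise ValueError("value does not fit in 128 bits")
    return value & MASK64, value >> 64

def join128(lo, hi):
    """Rebuild the 128-bit value from its two 64-bit halves."""
    return ((hi & MASK64) << 64) | (lo & MASK64)

value = 0x0123456789ABCDEF_FEDCBA9876543210
lo, hi = split128(value)
assert join128(lo, hi) == value

# Assumed layout: 10 bits in the base word, plus N extra 40-bit words.
sizes = [10 + 40 * extra_words for extra_words in range(4)]
print(sizes)  # [10, 50, 90, 130]
```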
      
      
      
   Otherwise, goings on in my land:   
   ISA development is slow, and had mostly turned into bug hunting;   
   There are some unresolved bugs, but I haven't been able to fully hunt   
   them down. A lot was in relation to RISC-V's C extension, but at least   
   it seems like at this point the C extension is likely fully working.   
      
   Haven't been many features that can usefully increase general-case   
   performance. So, it is starting to seem like XG2 and XG3 may be fairly   
   stable at this point.   
      
   The longer term future is uncertain.   
      
      
   My ISAs can beat RISC-V in terms of code-density and performance, but
   when RISC-V is extended with similar features, it is harder to make
   a case that it is "enough".
      
   Doesn't seem like (within the ISA) there are many obvious ways left to   
   grab large general-case performance gains over what I have done already.   
      
   Some code benefits from lots of GPRs, but harder to make the case that   
   it reflects the general case.   
      
      
      
   Recently got a new very-cheap laptop (a Dell Latitude 7490, for around   
   $240), made some curious observations:   
   It seems to slightly outperform my main PC in single-threaded performance;   
   Its RAM timings don't seem to match the expected values.   
      
   My main PC still wins at multi-threaded performance, and has the   
   advantage of 7x more RAM.   
      
   Had noted in Cinebench that my main PC is actually performing a little   
   slower than is typical for the 2700X, but then again, it is effectively   
   a 2700X running with DDR4-2133 rather than DDR4-2933, but partly this   
   was a case of the RAM I have was unstable if run all that fast (and in   
   this case; more RAM but slightly slower seemed preferable to less RAM   
   but slightly faster, or running it slightly faster but having the   
   computer be crash-prone).   
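
   For a rough sense of scale: a DDR4 channel is 64 bits (8 bytes) wide, so
   the theoretical peak per channel is the transfer rate times 8 bytes. A
   quick sketch of that arithmetic follows; the function name is just
   illustrative, and real-world throughput will be lower than these peaks.

```python
def ddr4_peak_gb_per_s(mt_per_s, channels=1, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for a DDR4 setup."""
    return mt_per_s * 1_000_000 * bus_bytes * channels / 1e9

for rate in (2133, 2933):
    single = ddr4_peak_gb_per_s(rate)
    dual = ddr4_peak_gb_per_s(rate, channels=2)
    print(f"DDR4-{rate}: {single:.1f} GB/s per channel, {dual:.1f} GB/s dual-channel")

# Relative peak bandwidth of DDR4-2133 compared with DDR4-2933.
print(f"{2133 / 2933:.2f}")
```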
      
   They sold the RAM with its on-the-box speed being the XMP2 settings
   rather than the baseline settings, but the RAM in question didn't run   
   reliably at the XMP or XMP2 settings (and wasn't inclined to spend more;   
   more so when there was already the annoyance that my MOBO chipset   
   apparently doesn't deal with a full 128GB, but can tolerate 112GB, but   
   maybe not an ideal setup for perf).   
      
   So, yeah, it seems that I have a setup where the 2700X is getting worse   
   single-threaded performance than the i7 8650U in the laptop.   
      
   Apparently, going by Cinebench scores, my PC's single threaded   
   performance is mostly hanging out with a bunch of Xeons (getting a score   
   in R23 of around 700 vs 950).   
      
   Well, could be addressed, in theory, but would need some RAM that   
   actually runs reliably at 2933 or 3200 MT/s and is also cheap...   
      
      
   In both cases, they are CPUs originally released in 2018.   
      
   Had noted, in a few tests:
      LZ4 benchmark (same file):   
        Main PC: 3.3 GB/s   
        Laptop : 3.9 GB/s
      memcpy (single threaded):   
        Main PC: 3.8 GB/s   
        Laptop : 5.6 GB/s   
      memcpy (all threads):   
        Main PC: ~ 15 GB/s   
        Laptop : ~ 24 GB/s   
          ( Like, what; thing only has 1 stick of RAM... *1 )   
      
   *1: Also, how is a laptop with 1 stick of RAM matching a dual-socket   
   Xeon E5410 with like 8 sticks of RAM...   
      
   Or, maybe it was just weak that my main PC was failing to beat the Xeon
   at this?... My main PC does at least beat the Xeon at single-threaded   
   performance (was less true of my older Piledriver based PC).   
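
   For reference, a rough sketch of how a single-threaded copy-bandwidth
   number of this sort can be measured. This is not the benchmark used for
   the figures above (that code is not shown here); the buffer size, repeat
   count, and function name are assumptions, and it counts bytes copied
   rather than bytes read plus written.

```python
import time

def copy_bandwidth_gb_per_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Best-of-N bulk copy rate between two equal-sized buffers, in GB/s."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one bulk copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    return size_bytes / best / 1e9

print(f"{copy_bandwidth_gb_per_s():.1f} GB/s")
```

   The buffer is kept well above typical cache sizes so that the result
   mostly reflects RAM rather than cache, and taking the best run avoids
   counting the page faults from first touching the buffers.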
      
      
   Granted, then again, I am using (almost) the cheapest MOBO I could find   
   at the time (that had an OK number of RAM slots and SATA connectors).   
   Can't quite identify the MOBO or chipset as I lost the box (and not   
   clearly labeled on the MOBO itself); except that it is a   
   something-or-another ASUS board.   
      
   Like, at the time, IIRC:   
      Went on Newegg;
      Picked mostly the cheapest parts on the site;
        Say, a Zen+ CPU being a lot cheaper than Zen 2,   
          or pretty much anything from Intel.   
      ...   
      
      
   Did get a slightly fancy/beefy case, but partly this was because I was
   annoyed with the late-90s-era beige tower case I had been using. I had
   ended up hot gluing a bunch of extra PC fans into the thing in an
   attempt to keep airflow good enough so that it didn't melt, and
   under-clocking the CPU so that it could run reliably.
      
   Like, 4 GHz Piledriver ran too hot and was unreliable, but was far more
   stable at 3.4 GHz. Was technically faster than a Phenom II underclocked   
   to 2.8 GHz (for similar reasons).   
      
   Where, at least the Zen+ doesn't overheat at stock settings (but, they   
   also supplied the thing with a comparably much bigger stock CPU cooler).   
      
   The case I got is slightly more traditional, with 5.25" bays and similar
   and mostly sheet-steel construction, vs the "new" trend of mostly
   glass-covered-box PC cases. Sadly, it seems like companies have mostly
   stopped selling the traditional sheet-steel PC cases with open 5.25"
   bays. Like, where exactly is someone supposed to put their DVD-RW drive,
   or hot-swap HDD trays?...
      
   Well, in the past we also had floppy drives, but the MOBO's removed the   
      
   [continued in next message]   
      


