|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
|    Message 130,050 of 131,241    |
|    BGB to Robert Finch    |
|    Re: Tonights Tradeoff (1/3)    |
|    29 Oct 25 04:29:15    |
From: cr88192@gmail.com

On 10/28/2025 10:52 PM, Robert Finch wrote:
> Started working on yet another CPU – Qupls4. Fixed 40-bit instructions,
> 64 GPRs. GPRs may be used in pairs for 128-bit ops. Registers are named
> as if there were 32 GPRs, A0 (arg 0 register is r1) and A0H (arg 0 high
> is r33). Same for other registers. GPRs may contain either integer or
> floating-point values.
>

OK.

I mostly stuck with 32-bit encodings. 40 bits could maybe allow more
encoding space, but has the drawback of being non-power-of-2.

But, yeah, occasionally dealing with 128-bit data is a major case for 64
GPRs and paired registers.

Well, that and when co-existing with RV64G, it gives somewhere to put
the FPRs. But, in turn, this was initially motivated by me failing to
figure out how to get GCC configured to target Zfinx/Zdinx.

Had ended up going with the Even/Odd pairing scheme, as it is less wonky
IMO to deal with R5:R4 than R36:R4.

> Going with a bit result vector in any GPR for compares, then a branch on
> bit-set/clear for conditional branches. Might also include branch true /
> false.
>

BT/BF works well. I otherwise also ended up using RISC-V style branches,
which I originally disliked due to higher implementation cost, but they
do technically allow for higher performance than just BT/BF or
Branch-Compare-with-Zero in 2-R cases.

So, it becomes harder to complain about a feature that does technically
help with performance.

> Using operand routing for immediate constants and an operation size for
> the instruction. Constants and operation size may be specified
> independently. With 40-bit instruction words, constants may be 10, 50, 90
> or 130 bits.
>

Hmm...
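The quoted constant sizes (10, 50, 90, 130 bits) fit a simple pattern: a 10-bit field plus a whole number of extra 40-bit words. A minimal sketch of that arithmetic follows. The reading that each larger constant appends one more 40-bit instruction word is an assumption, not something the quoted text states, and the names `INSN_BITS`, `BASE_CONST_BITS` and `constant_bits` are hypothetical.

```python
# Assumption (not stated in the post): the smallest constant is a 10-bit
# field in the base instruction, and each larger size appends one more
# 40-bit instruction word to it.
INSN_BITS = 40        # fixed instruction width quoted for Qupls4
BASE_CONST_BITS = 10  # smallest constant size quoted in the post


def constant_bits(extra_words: int) -> int:
    """Constant width when `extra_words` additional 40-bit words are used."""
    return BASE_CONST_BITS + INSN_BITS * extra_words


sizes = [constant_bits(n) for n in range(4)]
print(sizes)  # [10, 50, 90, 130]
```

Under this reading, the 130-bit case is the first one wide enough to hold a full 128-bit value.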
My case: constants of 10/33/64 bits.
No direct 128-bit constant, but can use two 64-bit constants whenever
128 bits is needed.


Otherwise, goings on in my land:
ISA development is slow, and had mostly turned into bug hunting;
There are some unresolved bugs, but I haven't been able to fully hunt
them down. A lot was in relation to RISC-V's C extension, but at least
it seems like at this point the C extension is likely fully working.

Haven't been many features that can usefully increase general-case
performance. So, it is starting to seem like XG2 and XG3 may be fairly
stable at this point.

The longer term future is uncertain.

My ISAs can beat RISC-V in terms of code-density and performance, but
when RISC-V is extended with similar features, it is harder to make
a case that it is "enough".

Doesn't seem like (within the ISA) there are many obvious ways left to
grab large general-case performance gains over what I have done already.

Some code benefits from lots of GPRs, but it is harder to make the case
that it reflects the general case.


Recently got a new very-cheap laptop (a Dell Latitude 7490, for around
$240), made some curious observations:
  It seems to slightly outperform my main PC in single-threaded performance;
  Its RAM timings don't seem to match the expected values.

My main PC still wins at multi-threaded performance, and has the
advantage of 7x more RAM.

Had noted in Cinebench that my main PC is actually performing a little
slower than is typical for the 2700X, but then again, it is effectively
a 2700X running with DDR4-2133 rather than DDR4-2933. Partly this was
because the RAM I have was unstable if run that fast (and in this case,
more RAM but slightly slower seemed preferable to less RAM but slightly
faster, or to running it slightly faster but having the computer be
crash-prone).

They sold the RAM with its on-the-box speed being the XMP2 settings
rather than the baseline settings, but the RAM in question didn't run
reliably at the XMP or XMP2 settings (and I wasn't inclined to spend more;
more so when there was already the annoyance that my MOBO chipset
apparently doesn't deal with a full 128GB, but can tolerate 112GB, though
maybe not an ideal setup for perf).

So, yeah, it seems that I have a setup where the 2700X is getting worse
single-threaded performance than the i7 8650U in the laptop.

Apparently, going by Cinebench scores, my PC's single-threaded
performance is mostly hanging out with a bunch of Xeons (getting a score
in R23 of around 700 vs 950).

Well, could be addressed, in theory, but would need some RAM that
actually runs reliably at 2933 or 3200 MT/s and is also cheap...


In both cases, they are CPUs originally released in 2018.

Had noted, in a few tests:
  LZ4 benchmark (same file):
    Main PC: 3.3 GB/s
    Laptop : 3.9 GB/s
  memcpy (single threaded):
    Main PC: 3.8 GB/s
    Laptop : 5.6 GB/s
  memcpy (all threads):
    Main PC: ~ 15 GB/s
    Laptop : ~ 24 GB/s
    ( Like, what; thing only has 1 stick of RAM... *1 )

*1: Also, how is a laptop with 1 stick of RAM matching a dual-socket
Xeon E5410 with like 8 sticks of RAM...
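For context on the DDR4-2133 vs DDR4-2933 comparison and the memcpy figures above, here is a minimal sketch of theoretical peak DRAM bandwidth. It assumes standard 64-bit (8-byte) DDR4 channels and nominal transfer rates. The channel counts are illustrative guesses, since the post does not state how either machine is populated, and the name `peak_gb_per_s` is hypothetical. The post also does not say whether the memcpy numbers count bytes copied or total read-plus-write traffic, so these peaks are only a rough upper bound to compare against.

```python
# Theoretical peak = transfers per second * bytes per transfer * channels.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide.
BYTES_PER_TRANSFER = 8


def peak_gb_per_s(mega_transfers_per_s: int, channels: int) -> float:
    """Peak bandwidth in decimal GB/s for a given DDR rate and channel count."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


# Channel counts below are assumptions, not facts from the post.
for name, rate, channels in [
    ("DDR4-2133, 1 channel", 2133, 1),
    ("DDR4-2133, 2 channels", 2133, 2),
    ("DDR4-2933, 2 channels", 2933, 2),
]:
    print(f"{name}: {peak_gb_per_s(rate, channels):.1f} GB/s")
```

This prints roughly 17.1, 34.1 and 46.9 GB/s for the three configurations.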

Or, maybe it was just weak that my main PC was failing to beat the Xeon
at this?... My main PC does at least beat the Xeon at single-threaded
performance (was less true of my older Piledriver based PC).


Granted, then again, I am using (almost) the cheapest MOBO I could find
at the time (that had an OK number of RAM slots and SATA connectors).
Can't quite identify the MOBO or chipset as I lost the box (and it is not
clearly labeled on the MOBO itself); except that it is a
something-or-another ASUS board.

Like, at the time, IIRC:
  Went on Newegg;
  Picked mostly the cheapest parts on the site;
  Say, a Zen+ CPU being a lot cheaper than Zen 2,
    or pretty much anything from Intel.
  ...


Did get a slightly fancy/beefy case, but partly this was because I was
annoyed with the late-90s-era beige tower case I had been using. I had
ended up hot gluing a bunch of extra PC fans into the thing in an
attempt to keep airflow good enough so that it didn't melt. And
under-clocking the CPU so that it could run reliably.

Like, the 4 GHz Piledriver ran too hot and was unreliable, but was far
more stable at 3.4 GHz. Was technically faster than a Phenom II
underclocked to 2.8 GHz (for similar reasons).

Where, at least the Zen+ doesn't overheat at stock settings (but, they
also supplied the thing with a comparably much bigger stock CPU cooler).

The case I got is slightly more traditional, with 5.25" bays and similar
and mostly sheet-steel construction, vs the "new" trend of mostly
glass-covered-box PC cases. Sadly, it seems like companies have mostly
stopped selling the traditional sheet-steel PC cases with open 5.25"
bays. Like, where exactly is someone supposed to put their DVD-RW drive,
or hot-swap HDD trays?...

Well, in the past we also had floppy drives, but the MOBOs removed the

[continued in next message]