comp.arch: Apparently more than just beeps & boops (131,241 messages)
Message 130,676 of 131,241
BGB to Stefan Monnier
Re: Impact of code size
30 Dec 25 01:55:21
From: cr88192@gmail.com

On 12/29/2025 2:14 PM, Stefan Monnier wrote:
>>> I wonder if there have been other studies to explore other impacts
>>> such as run time, or cache miss rate.
>> The difficulty there is standardising the input data, and normalising
>> processor performance, memory bandwidth and latency, etc.
>
> I was thinking of those "compressed" variants of ISAs, such as Thumb,
> Thumb2, MIPS16e, microMIPS, or the "C" option of RISC-V, where you can
> compare with/without on the very same machine since all the half-size
> instructions are also available in full-size.
>

Yep.


>> Code segment size is much easier to measure.
>
> Yes, but!
>

Code size conflates several desirable properties:
  Space saving, reducing instruction counts, etc.

But, in so doing, it loses distinctiveness:
Is the binary smaller due to a smaller number of bigger instructions, or
a larger number of smaller instructions?...

A smaller binary can be good, but a larger number of smaller instructions
is less so.


Like, say:
  Doom compiled to XG3 vs Doom compiled to SH-4...
  The size of ".text" isn't that much different.
  But, the SH-4 version has around 230% as many instructions.
  So, it would perform significantly worse.
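To make the "fewer bigger vs more smaller instructions" point concrete, a .text size can be split into instruction count times average instruction length. The sketch below (function names are hypothetical) uses only two figures from the post: SH-4's fixed 16-bit encoding and the roughly 230% instruction-count ratio. It also assumes the two .text sizes are exactly equal, which the post only says is roughly true.

```python
"""Sketch: split code size into instruction count x average length.

Inputs taken from the post: SH-4 instructions are all 16 bits (2 bytes),
and the SH-4 build has ~230% as many instructions as the XG3 build.
Assumption: the two .text sizes are treated as equal (size_ratio = 1.0).
"""


def average_insn_bytes(text_bytes: float, insn_count: float) -> float:
    """Average encoded length of one instruction, in bytes."""
    if insn_count <= 0:
        raise ValueError("insn_count must be positive")
    return text_bytes / insn_count


def implied_average_bytes(ref_avg_bytes: float,
                          count_ratio: float,
                          size_ratio: float = 1.0) -> float:
    """Average instruction length of ISA B implied by a reference ISA A.

    ref_avg_bytes: average instruction length of A (2.0 for SH-4)
    count_ratio:   N_A / N_B, instruction-count ratio between the builds
    size_ratio:    text_B / text_A (1.0 if the .text sizes are equal)
    """
    # text_A = N_A * ref_avg and text_B = size_ratio * text_A, so
    # avg_B = text_B / N_B = size_ratio * ref_avg * (N_A / N_B)
    return size_ratio * ref_avg_bytes * count_ratio


if __name__ == "__main__":
    sh4_avg = 2.0   # SH-4: every instruction is 16 bits
    ratio = 2.30    # SH-4 build has ~230% as many instructions as XG3
    xg3_avg = implied_average_bytes(sh4_avg, ratio)
    print(f"implied XG3 average: {xg3_avg:.1f} bytes/instruction")
```

With those inputs, the implied XG3 average is about 4.6 bytes per instruction. Roughly the same number of bytes is spread over fewer, larger instructions, which is exactly the difference a size-only comparison hides. If .text also carries data (for example, constant pools), the estimate is skewed accordingly.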


Actually, it's kind of funny, the path I took (nearly a decade thus far):
  Started out with 16-bit instructions and 16 registers (and SH-4);
  Then went to 16/32 (BJX1-32);
  Then went 64-bit (at which point, the first attempt was a horrible mess);
  Then "simplified" it (BJX1-64C)
    (clean-ups and dropping stuff to free encoding space);
  Then created a minimalist version of the 64-bit ISA (B64V),
    back to fixed-length 16-bit instructions;
  Then made a 32-bit version (B32V),
    and reworked the encoding (BTSR1);
  Then made it 64-bit again (BJX2);
    Re-added 32-bit instructions, but different this time;
    16-bit encodings: R0..R15; 32-bit encodings: R0..R31.
  Then added 48-bit encodings;
  Added explicit parallelism (WEX) and Predication;
    These were encoded by 2 bits in each instruction;
  Then ended up making 32-bit encodings primary, rather than 16-bit;
  Then added jumbo prefixes, dropping the 48-bit encodings;
  Then added SIMD;
  Then started expanding to 64 GPRs (XGPR);
  Then added RISC-V decoder support;
  Created an ISA variant that goes fully 64 GPR (XG2),
    at the expense of 16-bit ops;
    In basic cases, 32-bit encodings are common with its predecessor.
    But, some bit-twiddly dog-chew.
  Started to note that RISC-V and GCC are not a "silver bullet";
    Seemingly RV+GCC doing well being Dhrystone;
    Ported a few RV features to my ISA, to solidly regain the perf lead.
    But, RV still has some merits, even if not the best perf.
  Made my compiler target RISC-V as well;
    Experimented with some extensions, improving perf.
  Tried gluing a lot of features from my ISA onto RISC-V;
    Excluding predication, no real way to make this work as-is.
  Made a new ISA variant that glues both ISAs together (XG3),
    in the same encoding space, sacrificing WEX.
    Predication can still be encoded, but is demoted to optional.
      It depends on arch state that doesn't formally exist in RV.


Then, say, XG3 is pretty much unrecognizable when compared with SH-4.

Say:
  SH-4:
    16-bit instructions, 16 registers, 32-bit word size;
  XG3:
    32/64/96-bit instructions;
    64 registers;
    64-bit word size.

Register Space:
  SH-4:
    R0..R3: Scratch
    R4..R7: Arg1..Arg4 / Scratch
    R8..R14: Callee Save
    R15: SP
    Prototypical instruction: zzzz-nnnn-mmmm-zzzz
  XG2:
    R0 / R1: Dedicated Stomp Regs
    R2 / R3: Scratch
    R4..R7: Arg1..Arg4 / Scratch
    R8..R14: Callee Save
    R15: SP
    R16..R19: Scratch
    R20..R23: Arg5..Arg8 / Scratch
    R24..R31: Callee Save
    R32..R35: Scratch
    R36..R39: Arg9..Arg12 / Scratch
    R40..R47: Callee Save
    R48..R51: Scratch
    R52..R55: Arg13..Arg16 / Scratch
    R56..R63: Callee Save
    Prototypical instruction: NMOP-xwxx-nnnn-mmmm,yyyy-qnmo-oooo-zzzz
  XG3 (and RV):
    R0: ZR / Zero
    R1: LR / RA
    R2: SP
    R3: GBR / GP
    R4: TP
    R5..R7: De-Facto Stomp
    R8/R9: Callee Save
    R10..R17: Arg1..Arg8 / Scratch
    R18..R27: Callee Save
    R28..R31: Scratch
    R32..R63 == F0..F31
      F0..F3: Stomp or Scratch (Stomp for RV)
      F4..F7: Scratch or Callee Save (ABI)
      F8/F9: Callee Save
      F10..F17: Scratch or Arg9..Arg16 (ABI)
      F18..F27: Callee Save
      F28..F31: Scratch
    Prototypical instruction:
      XG3: zzzzoooooommmmmmyyyynnnnnnqxxx10
      RV : zzzzzzzooooommmmmyyynnnnnxxxxx11


The stomp regs are functionally scratch registers, but they may not be
used by the main part of the compiler to hold live values, as they are
reserved for the assembler stage, so that it can stomp them without
warning when synthesizing pseudo-instructions.
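The two prototypical 32-bit templates for XG3 and RV can be read as bit-field layouts. The leftmost character is bit 31, each letter names a field, and the trailing 10/11 are fixed bits. A small pattern-driven decoder makes the field widths visible: XG3's n, m, and o fields are 6 bits wide (enough for 64 registers), while RV's are 5 bits wide.

This is a Python sketch with hypothetical function names. The letters are kept as opaque field names, with no claim about what each field means beyond its width. `which_layout` assumes, as the templates suggest, that the low two bits alone tell the two formats apart.

```python
"""Sketch: decode 32-bit words against the letter templates in the post.

Layout strings are copied from the post; the leftmost character is bit 31.
Letters are opaque field names; '0'/'1' positions are fixed bits.
"""
from typing import Optional

XG3_LAYOUT = "zzzzoooooommmmmmyyyynnnnnnqxxx10"
RV_LAYOUT = "zzzzzzzooooommmmmyyynnnnnxxxxx11"


def field_widths(layout: str) -> dict[str, int]:
    """Count how many bits each named field occupies in a layout."""
    if len(layout) != 32:
        raise ValueError("layout must describe exactly 32 bits")
    widths: dict[str, int] = {}
    for ch in layout:
        if ch not in "01":
            widths[ch] = widths.get(ch, 0) + 1
    return widths


def decode_fields(word: int, layout: str) -> dict[str, int]:
    """Split a 32-bit word into the fields named by `layout`."""
    field_widths(layout)  # validates the layout length
    fields: dict[str, int] = {}
    for i, ch in enumerate(layout):
        bit = (word >> (31 - i)) & 1
        if ch in "01":
            if bit != int(ch):
                raise ValueError(f"bit {31 - i} should be {ch} for this layout")
        else:
            # Repeated letters are concatenated MSB-first into one field.
            fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields


def encode_fields(fields: dict[str, int], layout: str) -> int:
    """Inverse of decode_fields; unspecified fields default to 0."""
    widths = field_widths(layout)
    for name, value in fields.items():
        if name not in widths or not 0 <= value < (1 << widths[name]):
            raise ValueError(f"field {name!r}={value} does not fit layout")
    remaining = dict(widths)
    word = 0
    for i, ch in enumerate(layout):
        if ch in "01":
            bit = int(ch)
        else:
            remaining[ch] -= 1  # next bit of this field, MSB first
            bit = (fields.get(ch, 0) >> remaining[ch]) & 1
        word |= bit << (31 - i)
    return word


def which_layout(word: int) -> Optional[str]:
    """Assumption: the low two bits alone select between the templates."""
    low = word & 0b11
    if low == 0b11:
        return RV_LAYOUT
    if low == 0b10:
        return XG3_LAYOUT
    return None  # other low-bit patterns are not covered by this sketch


if __name__ == "__main__":
    word = 0x00C58533  # standard RISC-V encoding of "add a0, a1, a2"
    print(decode_fields(word, RV_LAYOUT))
    print(field_widths(XG3_LAYOUT))
    print(field_widths(RV_LAYOUT))
```

Decoding the standard RISC-V word for `add a0, a1, a2` with the RV template yields n=10, m=11, and o=12. These are rd, rs1, and rs2 in RISC-V terms. The same pattern-string approach then exposes the wider 6-bit register fields of the XG3 template without any format-specific code.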
In the transition from SH-4 to what became BJX2, the functionality of
MACL/MACH and PTEL/PTEH and similar was all folded into R0 and R1, which
were given the names DLR and DHR.

I also renamed PR to LR (basically the same as RV's RA).
Functionally, PR/RA/LR are all treated as aliases for the same register.
I used LR because pretty much everything calls it a "Link Register", so
there was no obvious reason IMO not to call it LR.


Despite looking very different, XG3 instructions are mostly the same as
XG2 instructions, but with a lot of the bits moved around and some other
fairly modest tweaks to the decoding rules (mostly to make the encoding
rules appear consistent across instruction types). The new layout was
also made to look more cohesive with RV's layout, even if the fields are
in different places and of different sizes.

Can note that XG2 and XG3 have fewer opcode bits than RISC-V, but
seemingly I didn't burn through encoding space at quite the same rate.

Though, unlike RISC-V, I also have a big pile of 2R instructions (which
use comparatively less encoding space).

...


>
> Stefan

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca