comp.arch
Message 130,676 of 131,241
BGB to Stefan Monnier
Re: Impact of code size
30 Dec 25 01:55:21
   From: cr88192@gmail.com   
      
   On 12/29/2025 2:14 PM, Stefan Monnier wrote:   
   >>> I wonder if there have been other studies to explore other impacts   
   >>> such as run time, or cache miss rate.   
   >> The difficulty there is standardising the input data, and normalising   
   >> processor performance, memory bandwidth and latency, etc.   
   >   
   > I was thinking of those "compressed" variants of ISAs, such as Thumb,   
   > Thumb2, MIPS16e, microMIPS, or the "C" option of RISC-V, where you can   
   > compare with/without on the very same machine since all the half-size   
   > instructions are also available in full-size.   
   >   
      
   Yep.   
      
      
   >> Code segment size is much easier to measure.   
   >   
   > Yes, but!   
   >   
      
   Code-size conflates several desirable properties:   
      Space saving, reducing instruction counts, etc.   
      
   But, in so doing, loses distinctiveness:   
   Is the binary smaller due to a smaller number of bigger instructions, or   
   a larger number of smaller instructions?...   
      
   A smaller binary can be good, but getting there with a larger number of
   smaller instructions is less so.
      
      
   Like, say:   
      Doom compiled to XG3 vs Doom compiled to SH-4...   
        The size of ".text" isn't that much different.   
        But, the SH-4 version has around 230% as many instructions.   
          So, would perform significantly worse.   
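
   A rough back-of-envelope for the comparison above. This is only a
   sketch: it assumes the two ".text" sections are exactly equal in size
   and that all of SH-4's ".text" is 16-bit instructions (ignoring literal
   pools and other non-instruction bytes). The 2.3 ratio is the figure
   quoted above; the byte count is a made-up placeholder that cancels out.

```python
# Back-of-envelope: equal .text size, SH-4 has ~2.3x as many instructions.
SH4_INSN_BYTES = 2   # SH-4 uses fixed-length 16-bit instructions
INSN_RATIO = 2.3     # SH-4 instruction count / XG3 instruction count


def avg_insn_bytes_other(text_bytes: int) -> float:
    """Average bytes per instruction for the other ISA at equal .text size."""
    sh4_count = text_bytes / SH4_INSN_BYTES
    other_count = sh4_count / INSN_RATIO
    return text_bytes / other_count


# 300_000 is a hypothetical .text size; the result does not depend on it.
avg = avg_insn_bytes_other(300_000)
print(f"{avg:.1f} bytes/instruction")
```

   Under those assumptions the other ISA averages about 4.6 bytes per
   instruction, so similar code size here hides a large difference in
   instruction count.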
      
      
   Actually, kind of funny the path I took (nearly a decade thus far):
      Started out with 16-bit instructions and 16 registers (and SH-4);   
      Then went to 16/32 (BJX1-32);   
      Then went 64-bit (at which point the first attempt was a horrible mess);
      Then "simplified" it (BJX1-64C)   
         (clean-ups and dropping stuff to free encoding space);   
      Then created a minimalist version of the 64-bit ISA (B64V)
        back to fixed-length 16-bit instructions;   
      Then made a 32-bit version (B32V),   
        and reworked the encoding (BTSR1);   
      Then made it 64-bit again (BJX2);   
      Re-added 32-bit instructions, but different this time;   
        16-bit encodings use R0..R15, 32-bit encodings use R0..R31;
      Then added 48 bit encodings;   
      Added explicit parallelism (WEX) and Predication;   
        Was encoded by 2 bits in each instruction;   
      Then ended up making 32-bit encodings primary, rather than 16-bit;   
      Then added jumbo prefixes, dropping the 48 bit encodings;   
      Then added SIMD;   
      Then started expanding to 64 GPRs (XGPR);   
      Then added RISC-V decoder support;   
      Created an ISA variant that goes fully 64 GPR (XG2),   
        at expense of 16-bit ops;   
        In basic cases, 32-bit encodings are common with its predecessor.   
          But, some bit-twiddly dog-chew.   
      Started to note RISC-V and GCC are not a "silver bullet";
        Seemingly RV+GCC does well mainly on Dhrystone;
        Ported a few RV features to my ISA, to solidly regain perf lead.   
        But, RV still has some merits, even if not the best perf.   
      Made my compiler target RISC-V as well;
        Experiments with some extensions, improving perf.   
        Tried gluing a lot of features from my ISA onto RISC-V;   
          Excluding predication, no real way to make this work as-is.   
      Made a new ISA variant that glues both ISAs together (XG3),
        in the same encoding space, sacrificing WEX.   
        Predication can still be encoded, but demoted to optional.   
          Depends on arch state that doesn't formally exist in RV.   
      
      
   Then, say, XG3 is pretty much unrecognizable if compared with SH-4.   
      
   Say:   
      SH-4:   
        16-bit instructions, 16 registers, 32-bit word size;   
      XG3:   
        32/64/96 bit instructions;   
        64 registers;   
        64-bit word size.   
      
   Register Space:   
      SH-4:   
        R0..R3: Scratch   
        R4..R7: Arg1..Arg4 / Scratch   
        R8..R14: Callee Save   
        R15: SP   
        Prototypical instruction: zzzz-nnnn-mmmm-zzzz   
      XG2:   
        R0 / R1: Dedicated Stomp Regs   
        R2 / R3: Scratch   
        R4..R7: Arg1..Arg4 / Scratch   
        R8..R14: Callee Save   
        R15: SP   
        R16..R19: Scratch   
        R20..R23: Arg5..Arg8 / Scratch   
        R24..R31: Callee Save   
        R32..R35: Scratch   
        R36..R39: Arg9..Arg12 / Scratch   
        R40..R47: Callee Save   
        R48..R51: Scratch   
        R52..R55: Arg13..Arg16 / Scratch   
        R56..R63: Callee Save   
        Prototypical instruction: NMOP-xwxx-nnnn-mmmm,yyyy-qnmo-oooo-zzzz   
      XG3 (and RV):   
        R0: ZR  / Zero   
        R1: LR  / RA   
        R2: SP   
        R3: GBR / GP   
        R4: TP   
        R5..R7: De-Facto Stomp   
        R8/R9: Callee Save   
        R10..R17: Arg1..Arg8 / Scratch   
        R18..R27: Callee Save   
        R28..R31: Scratch   
        R32..R63 == F0..F31   
          F0..F3: Stomp or Scratch (Stomp for RV)   
          F4..F7: Scratch or Callee Save (ABI)   
          F8/F9: Callee Save   
          F10..F17: Scratch or Arg9..Arg16 (ABI)   
          F18..F27: Callee Save   
          F28..F31: Scratch   
        Prototypical instruction:   
          XG3: zzzzoooooommmmmmyyyynnnnnnqxxx10   
          RV : zzzzzzzooooommmmmyyynnnnnxxxxx11   
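
   The two 32-bit layouts above can be pulled apart mechanically. A
   minimal sketch, assuming the pattern strings are written MSB-first
   (this matches the RV line, which is the standard R-type layout:
   funct7, rs2, rs1, funct3, rd, opcode; for the XG3 line it is an
   assumption). The helper name split_fields is hypothetical, and the
   letters are kept as-is rather than given field names.

```python
def split_fields(word: int, pattern: str) -> dict:
    """Split a 32-bit word into fields named by the letters of `pattern`.

    The pattern is read MSB-first; literal '0'/'1' characters must match.
    """
    if len(pattern) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    fields = {}
    for i, ch in enumerate(pattern):
        bit = (word >> (31 - i)) & 1
        if ch in "01":
            if bit != int(ch):
                raise ValueError(f"bit {31 - i} does not match pattern")
        else:
            # Accumulate bits of the same letter, most significant first.
            fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields


XG3_PATTERN = "zzzzoooooommmmmmyyyynnnnnnqxxx10"
RV_PATTERN = "zzzzzzzooooommmmmyyynnnnnxxxxx11"

# RISC-V "add x3, x1, x2" encodes as 0x002081B3.
rv = split_fields(0x002081B3, RV_PATTERN)
print(rv)

# Field widths per letter, e.g. 6-bit o/m/n in XG3 vs 5-bit in RV.
xg3_widths = {ch: XG3_PATTERN.count(ch) for ch in "zomynqx"}
print(xg3_widths)
```

   The low two bits are what separate the two encodings in this scheme
   ("10" vs "11"), so feeding an RV word to the XG3 pattern raises an
   error rather than mis-decoding.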
      
      
   The stomp regs are functionally scratch registers, but may not be used   
   by the main part of the compiler to hold live values, as they are   
   reserved for the assembler stage to be able to stomp them without   
   warning when synthesizing pseudo-instructions.   
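
   To make the stomp-register rule concrete, here is a small sketch using
   the R0..R31 roles from the XG3 (and RV) list above. The class names
   ("fixed" and so on) and the function allocatable are hypothetical
   labels for illustration, not taken from the actual compiler.

```python
# Register roles for R0..R31 as listed for XG3/RV above.
XG3_CLASSES = {
    "fixed": [0, 1, 2, 3, 4],                    # ZR, LR, SP, GBR/GP, TP
    "stomp": [5, 6, 7],                          # reserved for the assembler
    "callee_save": [8, 9] + list(range(18, 28)),
    "arg_scratch": list(range(10, 18)),          # Arg1..Arg8 / Scratch
    "scratch": list(range(28, 32)),
}


def allocatable(classes: dict) -> list:
    """Registers the main compiler may use to hold live values.

    Stomp registers are left out: the assembler may clobber them without
    warning when it expands a pseudo-instruction.
    """
    usable = classes["callee_save"] + classes["arg_scratch"] + classes["scratch"]
    return sorted(usable)


regs = allocatable(XG3_CLASSES)
print(len(regs), regs)
```

   With these roles, 24 of the 32 integer registers remain available to
   the register allocator.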
      
   In the transition from SH-4 to what became BJX2, the functionality of   
   MACL/MACH and PTEL/PTEH and similar was all folded into R0 and R1, which   
   were given the names DLR and DHR.   
      
   Also renamed PR to LR (basically the same as RV's RA).   
   Functionally, PR/RA/LR are all treated as aliases for the same register.   
   I used LR as pretty much everything calls it a "Link Register" so no   
   obvious reason IMO to not call it LR.   
      
      
   Despite looking very different, XG3 instructions are mostly the same as   
   XG2 instructions but with a lot of the bits moved around and some other   
   fairly modest tweaks to the decoding rules (mostly to make encoding   
   rules appear consistent across instruction types). The new layout was   
   also made to look more cohesive with RV's layout, even if the fields are   
   in different places and different sizes.   
      
   Can note that XG2 and XG3 have fewer opcode bits than RISC-V, but   
   seemingly I didn't burn through encoding space at quite the same rate.   
      
   Though, unlike RISC-V, I also have a big pile of 2R instructions (which
   use comparatively less encoding space).
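
   As a rough illustration of why 2R forms are cheap: assuming 6-bit
   register fields (as with 64 GPRs), each register field an opcode does
   not need frees up 6 bits worth of code points. This sketch only counts
   code points and ignores how the opcode space is actually allocated.

```python
REG_BITS = 6  # assumption: 6-bit register fields, as with 64 GPRs


def code_points(num_reg_fields: int) -> int:
    """Encodings consumed by one opcode with this many register fields."""
    return 1 << (REG_BITS * num_reg_fields)


# How many 2R opcodes fit in the space taken by a single 3R opcode.
ratio = code_points(3) // code_points(2)
print(ratio)
```

   So, under that assumption, one 3R opcode slot could hold 64 distinct
   2R opcodes.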
      
   ...   
      
      
   >   
   >          Stefan   
      