comp.arch | Apparently more than just beeps & boops | 131,241 messages
Message 129,364 of 131,241
BGB to John Savard
Re: Pseudo-Immediates as Part of the Ins
10 Aug 25 18:59:29
From: cr88192@gmail.com

On 8/10/2025 1:07 PM, John Savard wrote:
> On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:
>
>> That said, a lot of John's other ideas come off to me like straight up
>> absurdity. So, I wouldn't hold up much hope personally for it to turn
>> into much usable.
>
> While I think that not being able to be put to use isn't really one of the
> faults of the Concertina II ISA, the block structure, especially at its
> current level of complexity, is going to come across as quite weird to
> many, and I don't yet see any hope of achieving a drastic simplification
> in that area.
>

OK.

I judge things here by a few criteria:
  Could be affordably implemented in hardware;
  Would be usable and useful;
  Mostly makes sense in terms of relative cost/benefit tradeoffs.

I am a little more pessimistic about things that I don't feel satisfy
these constraints.

For comparison, RISC-V mostly satisfies the above, although:
  Many of the extensions are weaker on these points;
  Some of the encodings, and the 'C' extension in general,
  are badly dog-chewed.

Then again, my ISA has potentially ended up with an excess of niche-case
format-converter instructions and similar.

> Each of the sixteen block types serves one or another functionality which
> I see as necessary to give this ISA the breadth of application that I have
> as my goal.
>

Many ISAs make do with plain 32-bit or 16/32 encodings.

Granted, I have ended up with more: 16/32/64/96-bit, depending on ISA:
  XG1: 16/32/64/96
  XG2: 32/64/96
  XG3: 32/64/96 (32/64 for RV ops)
  RV:  16/32/(48)/64

Apparently, Huawei and similar have some 48-bit encodings defined for
RV64.
In my sensibilities, 48-bit only makes sense if one is already
committed to 16-bit ops. But, given how quickly they burnt through the
encoding space, practically the 48-bit space would just end up being a
space-saving subset of the 64-bit space (in my experimental attempt to
deal with the 48-bit encodings, they were unpacked temporarily into the
64-bit encoding space).

Basically, they burnt through most of the 48-bit encoding space with a
handful of Imm32 and a few Disp32 ops. If it were me, I would have gone
for Imm24 ops and had a little more encoding space left over.

Did experimentally mock up a 48-bit scheme that basically extended the
32-bit space to have Imm24 (adding 12 bits to each Imm/Disp for all the
Imm12/Disp12 ops), but it was a little dog-chewed. Could potentially
also yield alternate encodings for Imm32 constant-load and Disp32
branch (by adding 12 bits to LUI and JAL).

One can argue, though, which would they rather have:
  Pretty much all of the 32-bit immediate forms extended to 24 bits;
  Or, 32-bit immediate values,
  but only for a very limited range of ops.

Though, I suspect for general use, extending the whole ISA to 24 bits
might be "better" for average-case code density (with 64-bit encodings
for cases where one needs Imm32).

Then again, I am on the fence about 48-bit encodings in general:
  Helps code density;
  Hurts performance for a cheap core;
  Say, if one doesn't want to spend the cost of dealing with superscalar
  fetch of misaligned instructions and 16-bit ops (doing so would add
  significant resource cost).

I did experiment with adding the C extension to BGBCC, and RV64GC+Jumbo
can seemingly get decent code density.

Granted, both are mostly similar here, both using 5-bit register fields.
Though, XG1 16-bit ops mostly have access to 16 registers;
and RV-C ops mostly use a mix of 8 and 32 registers.

Did experiment with a pair encoding for XG3 (X3C), which doesn't match
either XG1 or RV64GC+Jumbo in terms of code density, but is not too
far off.

At the moment (Doom ".text" size, static-linked C library):
  XG1:          275K
  XG2:          290K
  RV64GC+Jumbo: 295K (vs 350K RV+Jumbo, or 370K RV64GC)
  XG3+X3C:      305K (vs 320K)

Granted, XG3 isn't designed for maximum code density, but rather for
performance and being able to merge with RV64G.

It is unclear if the improvement in code density (of X3C) would be
worth the added decoder cost (it doesn't fit in with the existing
decoder paths for XG1 or RVC, so would need something new/wacky to
deal with it).

Though, one could deal with it (in the core) in a similar way to how I
dealt with 48-bit ops, namely unpacking it to a 64-bit form (two
instructions) after fetch.

In theory, XG3 should be able to match XG2 code density, as there isn't
really anything that XG2 has and XG3 lacks that would significantly
affect code density. XG3 did drop the 2RI-Imm10 ops, but these had
largely become redundant. So, the main difference is likely related to
BGBCC itself, which mostly treats XG3 as an extension of its RV64G mode
(which "suffers" slightly by having fewer usable callee-saved registers
in the ABI, and fewer register arguments; though I had on/off considered
tweaking the ABI here).

Though, if XG3 did match XG2 code density, X3C could potentially also
reduce it to 275K.

But, I could just focus more on RV64GC here, as I sorta already needed
it, and recently found/fixed a bug in the decoder in my CPU core that
was preventing the 'C' extension from working (so now it seems to work).
Though, to recap (X3C):
  X3C packs a 13-bit and a 14-bit instruction together into a 32-bit
  word, which serves a similar purpose to RVC, though it only allows
  instruction pairs which can safely co-execute.

  Instructions encode:
    MOV/ADD                    Rm5, Rn5
    LI/ADD/ADDW                Imm5s, Rn5
    SUB/ADDW/ADDWU/AND/OR/XOR  Rm3, Rn3
    SLL/SRL/SRA                Rm3, Rn3
    SLLW/SRLW/SLAW/SRAW        Rm3, Rn3
    SLL/SRL/SRA                Imm3, Rn3
    SLLW/SRLW/SLAW/SRAW        Imm3, Rn3
  And, for the 14-bit case:
    LD/SD/LW/SW    Rn5, Disp5(SP)
    LD/SD/LW/SW    Rn3, Disp2(Rm3)
    LB/LBU/LH/LHU  Rn3, 0(Rm3)
    SB/SH          Rn3, 0(Rm3)

X3C was put into a hole in the encoding space that previously held the
PrWEX space (in XG1/XG2), but PrWEX is N/A in XG3. The WEX space is
also N/A (used for RV encodings, and the large-constant instruction was
replaced with XG3's Jumbo Prefix). Granted, the scope of X3C is more
limited than that of RV-C.

> But I have introduced "scaled displacements" back in, allowing the
> augmented short instruction mode instruction set to be more powerful.
>

OK.

Yeah, scaled displacements make sense.

Ironically, another one of my complaints about RVC is that, while they
saved bits in the displacements, rather than doing something sane like
changing the scale based on type, they bit-sliced the displacements
based on type in a way that means it effectively has unique
displacement encodings for:

[continued in next message]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca