comp.arch | Apparently more than just beeps & boops | 131,241 messages
Message 129,364 of 131,241
BGB to John Savard
Re: Pseudo-Immediates as Part of the Ins
10 Aug 25 18:59:29
From: cr88192@gmail.com

On 8/10/2025 1:07 PM, John Savard wrote:
> On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:
>
>> That said, a lot of John's other ideas come off to me like straight up
>> absurdity. So, I wouldn't hold up much hope personally for it to turn
>> into much usable.
>
> While I think that not being able to be put to use isn't really one of the
> faults of the Concertina II ISA, the block structure, especially at its
> current level of complexity, is going to come across as quite weird to
> many, and I don't yet see any hope of achieving a drastic simplification
> in that area.
>

OK.

I judge things here by a few criteria:
  Could be affordably implemented in hardware;
  Would be usable and useful;
  Mostly makes sense in terms of relative cost/benefit tradeoffs.

I am a little more pessimistic about things that I don't feel satisfy
these constraints.

For comparison, RISC-V mostly satisfies the above, although:
  Many of the extensions are weaker on these points;
  Some of the encodings, and the 'C' extension in general,
  are badly dog-chewed.

Then again, my ISA has potentially ended up with an excess of niche-case
format-converter instructions and similar.

> Each of the sixteen block types serves one or another functionality which
> I see as necessary to give this ISA the breadth of application that I have
> as my goal.
>

Many ISAs make do with plain 32-bit or 16/32 encodings.

Granted, I have ended up with more: 16/32/64/96-bit, depending on ISA:
  XG1: 16/32/64/96
  XG2: 32/64/96
  XG3: 32/64/96 (32/64 for RV ops)
  RV:  16/32/(48)/64

Apparently, Huawei and similar have some 48-bit encodings defined for
RV64.
In my sensibilities, 48-bit only makes sense if one is already
committed to 16-bit ops. But, given how quickly they burnt through the
encoding space, practically the 48-bit space would just end up being a
space-saving subset of the 64-bit space (in my experimental attempt to
deal with the 48-bit encodings, they were unpacked temporarily into the
64-bit encoding space).

Basically, they burnt through most of the 48-bit encoding space with a
handful of Imm32 and a few Disp32 ops. If it were me, I would have gone
for Imm24 ops and had a little more encoding space left over.

Did experimentally mock up a 48-bit scheme that basically extended the
32-bit space to have Imm24 (adding 12 bits to each Imm/Disp for all the
Imm12/Disp12 ops), but it was a little dog-chewed. Could potentially
also yield alternate encodings for Imm32 constant-load and Disp32
branch (by adding 12 bits to LUI and JAL).

One can argue, though, which would they rather have:
  Pretty much all of the 32-bit immediate forms extended to 24 bits;
  Or, 32-bit immediate values,
  but only for a very limited range of ops.

Though, I suspect for general use, extending the whole ISA to 24 bits
might be "better" for average-case code density (with 64-bit encodings
for cases where one needs Imm32).

Then again, I am on the fence about 48-bit encodings in general:
  Helps code density;
  Hurts performance for a cheap core;
  Say, if one doesn't want to spend the cost of dealing with superscalar
  fetch of misaligned instructions and 16-bit ops (doing so would add
  significant resource cost).

I did experiment with adding the C extension to BGBCC, and RV64GC+Jumbo
can seemingly get decent code density.

Granted, both are mostly similar here, both using 5-bit register fields.
Though, XG1 16-bit ops mostly have access to 16 registers;
and RV-C ops mostly use a mix of 8 and 32 registers.

Did experiment with a pair encoding for XG3 (X3C), which doesn't match
either XG1 or RV64GC+Jumbo in terms of code density, but is not too
far off.

At the moment (Doom ".text" size, static-linked C library):
  XG1:          275K
  XG2:          290K
  RV64GC+Jumbo: 295K (vs 350K RV+Jumbo, or 370K RV64GC)
  XG3+X3C:      305K (vs 320K)

Granted, XG3 isn't designed for maximum code density, but rather for
performance and being able to merge with RV64G.

It is unclear if the improvement in code density (of X3C) would be
worth the added decoder cost (it doesn't fit in with the existing
decoder paths for XG1 or RVC, so would need something new/wacky to
deal with it).

Though, one could deal with it (in the core) in a similar way to how I
dealt with 48-bit ops, namely unpacking it to a 64-bit form (two
instructions) after fetch.

In theory, XG3 should be able to match XG2 code density, as there isn't
really anything that XG2 has and XG3 lacks that would significantly
affect code density. XG3 did drop the 2RI-Imm10 ops, but these had
largely become redundant. So, the main difference is likely related to
BGBCC itself, which mostly treats XG3 as an extension of its RV64G mode
(which "suffers" slightly by having fewer usable callee-saved registers
in the ABI, and fewer register arguments; though I had on/off considered
tweaking the ABI here).

Though, if XG3 did match XG2 code density, X3C could potentially also
reduce it to 275K.

But, I could just focus more on RV64GC here, as I sorta already needed
it, and recently found/fixed a bug in the decoder in my CPU core that
was preventing the 'C' extension from working (so now it seems to work).
Though, to recap (X3C):
  X3C packs a 13-bit and a 14-bit instruction together into a 32-bit
  word, which serves a similar purpose to RVC, though it only allows
  instruction pairs which can safely co-execute.

  Instructions encode:
    MOV/ADD                    Rm5, Rn5
    LI/ADD/ADDW                Imm5s, Rn5
    SUB/ADDW/ADDWU/AND/OR/XOR  Rm3, Rn3
    SLL/SRL/SRA                Rm3, Rn3
    SLLW/SRLW/SLAW/SRAW        Rm3, Rn3
    SLL/SRL/SRA                Imm3, Rn3
    SLLW/SRLW/SLAW/SRAW        Imm3, Rn3
  And, for the 14-bit case:
    LD/SD/LW/SW    Rn5, Disp5(SP)
    LD/SD/LW/SW    Rn3, Disp2(Rm3)
    LB/LBU/LH/LHU  Rn3, 0(Rm3)
    SB/SH          Rn3, 0(Rm3)

X3C was put into a hole in the encoding space that previously held the
PrWEX space (in XG1/XG2), but PrWEX is N/A in XG3. The WEX space is
also N/A (used for RV encodings, and the large-constant instruction was
replaced with XG3's Jumbo Prefix). Granted, the scope of X3C is more
limited than that of RV-C.

> But I have introduced "scaled displacements" back in, allowing the
> augmented short instruction mode instruction set to be more powerful.
>

OK.

Yeah, scaled displacements make sense.

Ironically, another one of my complaints about RVC is that, while they
saved bits in the displacements, rather than doing something sane like
changing the scale based on type, they bit-sliced the displacements
based on type in a way that means it effectively has unique
displacement encodings for:

[continued in next message]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca