... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.arch
Apparently more than just beeps & boops
131,241 messages
Message 130,328 of 131,241
BGB to MitchAlsup
Re: Tonights Tradeoff
19 Nov 25 12:53:35
   From: cr88192@gmail.com   
      
   On 11/13/2025 9:59 PM, MitchAlsup wrote:   
   >   
   > BGB  posted:   
   >   
   >> On 11/13/2025 3:58 PM, Anton Ertl wrote:   
   >>> BGB  writes:   
   >>>> Can note that GCC seemingly doesn't support 128-bit integers on 64-bit   
   >>>> RISC-V.   
   >>>   
   >>> What makes you think so?  It has certainly worked every time I tried   
   >>> it.  E.g., Gforth's "configure" reports:   
   >>>   
   >>> checking size of __int128_t... 16   
   >>> checking size of __uint128_t... 16   
   >>> [...]   
   >>> checking for a C type for double-cells... __int128_t   
   >>> checking for a C type for unsigned double-cells... __uint128_t   
   >>>   
   >>> That's with gcc 10.3.1   
   >>>   
   >>   
   >> Hmm...   
   >>   
   >> Seems so.   
   >>   
   >> Testing again, it does appear to work; the error message I thought I   
   >> remembered seeing, instead applied to when trying to use the type in   
   >> MSVC. I had thought I remembered checking before and it failing, but it   
   >> seems not.   
   >>   
   >> But, yeah, good to know I guess.   
   >>   
   >>   
   >> As for MSVC:   
   >> tst_int128.c(5): error C4235: nonstandard extension used: '__int128'   
   >> keyword not supported on this architecture   
   >   
   > ERRRRRRR:: not supported by this compiler, the architecture has   
   > ISA level support for doing this, but the compiler does not allow   
   > you access.   
      
   More or less it seems.   
      
      
   This leaves, apparently:   
      MSVC: Maybe once had it for IA-64, but nowhere else;   
      GCC: Supported, but lacks a printf modifier for it in glibc.   
      Clang: Supported, but lacks support for 128-bit integer literals?...   
      BGBCC: Supported, with literals and 'I128' printf modifier.   
        Where, 'I128' is similar to 'I64' in MSVC,   
          as for a long time they also lacked the 'll' modifier and similar.   
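
   As a rough illustration of working around the two gaps noted above for
   GCC and Clang (no printf modifier, no 128-bit literals), one could build
   constants from two 64-bit halves and do the decimal conversion by hand.
   This is only a sketch; the names u128, u128_make and u128_to_dec are
   made up for the example and are not part of any library.

```c
#include <stdint.h>
#include <stddef.h>

/* GCC/Clang extension type; MSVC rejects it (error C4235 above). */
typedef unsigned __int128 u128;

/* Build a 128-bit constant from two 64-bit halves, as a workaround
   where 128-bit integer literals are not accepted. (Hypothetical helper.) */
static u128 u128_make(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* Format v as decimal into buf. The buffer must hold at least 40 bytes
   (up to 39 digits plus the terminating NUL). Returns buf.
   (Hypothetical helper standing in for a missing printf modifier.) */
static char *u128_to_dec(u128 v, char *buf, size_t len)
{
    char tmp[40];
    size_t n = 0, i = 0;

    if (len < 40) {
        if (len > 0)
            buf[0] = '\0';
        return buf;
    }
    /* Peel off digits least-significant first, then reverse them. */
    do {
        tmp[n++] = (char)('0' + (int)(v % 10));
        v /= 10;
    } while (v != 0);
    while (n > 0)
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return buf;
}
```

   Usage would be along the lines of
   printf("%s\n", u128_to_dec(u128_make(1, 0), buf, sizeof buf)),
   with buf being a char[40].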
      
   ISAs:   
   X64: Can build manually via register pairs (any two registers), ADD+ADC   
   allows for 128-bit in 2 instructions;   
   Many 128-bit ops can be built using flags bits;   
   ISA supports widening multiply and narrowing divide, though typically   
   with hardwired registers.   
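
   For reference, the register-pair approach can be sketched in portable C.
   The add mirrors the ADD+ADC split (low halves, then high halves plus
   carry), and the multiply shows a 64x64->128 widening multiply built from
   32-bit pieces. The type and function names here (pair128, add128,
   mul64x64) are invented for the example; a real compiler would just emit
   the native instructions.

```c
#include <stdint.h>

/* A 128-bit value held as a pair of 64-bit "registers". */
typedef struct { uint64_t lo, hi; } pair128;

/* 128-bit add: low halves first (the ADD), then the high halves plus
   the carry out of the low add (the ADC). */
static pair128 add128(pair128 a, pair128 b)
{
    pair128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo);   /* unsigned wrap-around means carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Widening multiply, 64 x 64 -> 128 bits, from four 32 x 32 products. */
static pair128 mul64x64(uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column: three 32-bit terms, so the sum fits in 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    pair128 r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```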
      
   XG1/XG2:   
      CLRT+ADDC+ADDC   
        Theoretically arbitrary, BGBCC only uses even pairs;   
        CLRT needed to clear the SR.T flag;   
          Normal ADD does not modify SR.T.   
          Could maybe be better if there were a 3R ADDC variant,   
          and maybe a carry-out only variant (so no CLRT was needed).   
      ADDX   
        Even pairs only, single instruction.   
      
   XG3:   
      Support for SR.T was demoted to optional,   
      half the encoding space goes unused if predication isn't used though.   
        Could fit a "better" RV-C in there (*1).   
      ALUX instructions could be used, also optional.   
        Otherwise, it is left in a similar situation to RISC-V here.   
      
   *1: Noted before that if one tweaks the design of RV-C some:   
      Makes Imm/Disp fields smaller;   
      Replaces Reg3 with Reg4 (X8..X27);   
      ...   
   It is possible to get a set of 16-bit ops that both use less encoding   
   space and get a better average hit rate than the existing RV-C ops   
   (mostly by not trying to do Imm6/Disp6 in said ops; and only using Reg5   
   on a few instructions).   
      
   However, IMO, it makes more sense to support RV-C for the sake of binary   
   compatibility than on the merits of its encoding scheme, which is   
   "kind of a turd".   
      
   However, "XG3 sub-variant that drops predicated encodings in favor of   
   re-adding a new/different set of 16-bit encodings" was not a   
   particularly attractive option.   
      
      
   For where it makes sense to use XG3 though, likely it makes sense to   
   allow/use SR.T and the predicated encodings, which can still offer a   
   small but non-zero performance benefit (even if debatable if it is   
   something that is worth spending half of the encoding space on).   
      
   I did also experiment with allowing a few blocks to be used for   
   pair-encoded ops. One other possibility could be some additional   
   unconditional-only instruction blocks (but, these would be N/E in XG1/XG2).   
      
      
      
   One possibility could also be an "XG3 Lite" subset:   
      Likely unconditional only, and also disallows RISC-V encodings.   
      
   Or, IOW:   
      ...xx00  Disallowed   
      ...xx01  Disallowed   
      ...xx10  Allowed   
      ...xx11  Disallowed   
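
   In decoder terms, the table above amounts to a check on two bits. A
   minimal sketch, assuming "...xx10" refers to the two least-significant
   bits of the instruction word (the function name is made up):

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if the instruction word would be accepted by the
   "XG3 Lite" subset sketched above: only the ...xx10 block is allowed.
   Assumes the pattern applies to the two low-order bits of the word. */
static bool xg3_lite_allows(uint32_t insn_word)
{
    return (insn_word & 0x3u) == 0x2u;
}
```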
      
   Could maybe make sense if I wanted a core on a smaller FPGA.   
      
   However, there isn't that much incentive to go for much smaller than the   
   XC7S50 with this, and for current use-cases that could involve an XC7S25   
   or XC7A35T, you kinda really want to try to maximize code density   
   (mostly because the currently available dev-boards with these FPGAs tend   
   to lack external RAM).   
      
   The Intel/Altera chips tend to always have integrated ARM cores;   
   Boards with Lattice FPGAs (probably ECP5 or similar in this case, *)   
   tend to be obscure and overpriced (even if theoretically the FPGAs   
   themselves are cheaper).   
      
   *: One is harder pressed to make a non-trivial CPU core that fits into   
   an ICE40.   
      
      
   Though, one other possibility would be to again try implementing dual-core   
   on an XC7A100T, but possibly sharing FPU and SIMD between the cores (may   
   or may not be viable).   
      
   In this case, there would be a mechanism such that inter-core interlocks   
   could trigger to disallow both cores trying to access the FPU or SIMD   
   unit on the same clock-cycle. Though unclear how this could interact   
   with pipeline stalls (would ideally want both cores to have independent   
   pipelines; but then one needs to arbitrate things such that both units   
   get their results at the expected clock cycle, ...).   
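
   As a toy cycle-level model of the interlock idea (not a description of
   any existing implementation), one could alternate grants whenever both
   cores request the shared unit on the same cycle and stall the loser.
   The names here are hypothetical, and the sketch ignores the harder
   part noted above, namely getting results back to each pipeline on the
   expected cycle.

```c
#include <stdbool.h>

/* Remembers which core (0 or 1) was granted the shared unit last. */
typedef struct { int last_winner; } fpu_arbiter;

/* Models one clock cycle. req[i] is true if core i wants the shared
   FPU/SIMD unit this cycle. On return, stall[i] is true for a core that
   must wait. Returns the granted core, or -1 if nobody asked. */
static int fpu_arbitrate(fpu_arbiter *arb, const bool req[2], bool stall[2])
{
    int grant = -1;

    stall[0] = false;
    stall[1] = false;
    if (req[0] && req[1]) {
        grant = 1 - arb->last_winner;   /* alternate on conflicts */
        stall[1 - grant] = true;        /* the other core waits */
    } else if (req[0]) {
        grant = 0;
    } else if (req[1]) {
        grant = 1;
    }
    if (grant >= 0)
        arb->last_winner = grant;
    return grant;
}
```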
      
   Though, to that end, may also make sense to consider going to a   
   dual-issue superscalar with 4R2W register file.   
      
   ...   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)