From: robfi680@gmail.com   
      
   On 2026-01-07 2:22 a.m., BGB wrote:   
   > On 1/6/2026 5:49 PM, MitchAlsup wrote:   
   >>   
   >> Robert Finch posted:   
   >>   
   >>>    
   >>>   
   >>>> One would argue that maybe prefixes are themselves wonky, but otherwise   
   >>>> one needs:   
   >>>> Instructions that can directly encode the presence of large immediate   
   >>>> values, etc;   
   >>   
   >> This is the direction of My 66000.   
   >>   
   >> The instruction stream is a linear stream of words.   
   >> The first word of each instruction encodes its total length.   
   >> What follows the instruction itself are merely constants used as   
   >> operands in the instruction itself. All constants are 1 or 2   
   >> words in length.   
   >>   
   >> I would not call this "prefixed" or "suffixed". Generally,
   >> prefixes and suffixes consume some bits of the prefix/suffix word
   >> itself, so that the constant (in my case) is not equal to the
   >> container size. This leads to wonky operand/displacement sizes not
   >> equal to 2^(3+k).
   >>   
   >   
   > OK.   
   >   
   > As can be noted:   
   > XG2/3: Prefix scheme, 1/2/3 x 32-bit   
   > The 96-bit cases are determined by two prefixes.   
   > Requires looking at 2 words to know total length.   
   > RV64+Jx:   
   > Total length is known from the first instruction word:   
   > Base op: 32 bits;   
   > J21I: 64 bits   
   > J52I: 96 bits.   
   > There was a J22+J22+LUI special case,   
   > but I now consider this as deprecated.   
   > J52I+ADDI is now considered preferable.   
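To contrast the two length-determination styles in the list above, here is a small Python sketch. The marker bits are hypothetical (the real XG2/XG3 and RV64+Jx encodings differ); what it shows is that the prefix scheme may need to inspect two words before the total length is known, while the jumbo scheme derives it from the first word alone.

```python
# Hypothetical marker bits; the real XG2/XG3 and RV64+Jx encodings differ.
PREFIX_TAG = 0xF                 # prefix scheme: top nibble 0xF marks a prefix word
JX_WORDS = {0: 1, 1: 2, 2: 3}    # first-word scheme: base op, J21I, J52I


def is_prefix(word: int) -> bool:
    return (word >> 28) == PREFIX_TAG


def length_prefix_scheme(words, i):
    """Prefix style: up to two words must be inspected to learn the length."""
    n = 1
    while n < 3 and is_prefix(words[i + n - 1]):
        n += 1
    return n  # in 32-bit words; two prefixes give the 96-bit case


def length_first_word_scheme(words, i):
    """Jumbo style: the total length is a function of the first word only."""
    kind = (words[i] >> 28) & 0x3
    if kind not in JX_WORDS:
        raise ValueError("reserved length encoding")
    return JX_WORDS[kind]
```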
   >   
   > As for Imm/Disp sizes:   
   > XG1: 9/33/57   
   > XG2 and XG3: 10/33/64   
   > RV+JX: 12/33/64   
   >   
   > For XG1, the 57-bit size was rarely used and only optionally supported,   
   > mostly because of the great "crap all of immediate values between 34 and   
   > 62 bits" gulf.   
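The Imm/Disp size table can be turned into a quick "which encoding does this constant need?" helper. This Python sketch assumes the fields are signed two's complement (the post does not say), and the ISA labels are just dictionary keys chosen for illustration.

```python
# Immediate/displacement field widths per encoding, from the table above.
# Assumption: all fields are signed two's complement.
IMM_SIZES = {
    "XG1": (9, 33, 57),
    "XG2/XG3": (10, 33, 64),
    "RV+Jx": (12, 33, 64),
}


def fits_signed(value: int, bits: int) -> bool:
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def pick_imm_size(value: int, isa: str) -> int:
    """Smallest field width available in `isa` that can hold `value`."""
    for bits in IMM_SIZES[isa]:
        if fits_signed(value, bits):
            return bits
    raise ValueError("%d does not fit any %s immediate" % (value, isa))


for v in (100, 2000, 5000, 1 << 40):
    print(v, {isa: pick_imm_size(v, isa) for isa in IMM_SIZES})
```

A value like 2^60 has no XG1 encoding under this table, which is the gap the optional 57-bit form leaves.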
   >   
   >   
   >>>> Or, the use of suffix-encodings (which is IMHO worse than prefix
   >>>> encodings; at least prefix encodings make intuitive sense if one views
   >>>> the instruction stream as linear, whereas suffixes add weirdness and are
   >>>> effectively retro-causal, and for any fetch to be safe at the end of a
   >>>> cache line one would need to prove the non-existence of a suffix; so
   >>>> better to not go there).
   >>>>   
   >>> I agree with this. Prefixes seem more natural, with large numbers
   >>> expanding to the left; suffixes seem like a big-endian approach. But
   >>> I use suffixes for large constants. I think with most VLI encodings
   >>> the constant data follows the instruction.
   >>   
   >> But not "self identified".   
   >>   
   >   
   > Yeah, if you can't know whether or not more instruction follows after   
   > the first word by looking at the first word, this is a drawback.   
   >   
   I do not find having to look at the second word much of a drawback.   
   There is not much difference between looking at the first or the second word.
   The words are all sitting available on the cache-line.   
   Large constants are treated as more of the exceptional case in Qupls4.   
   The immediate mode instructions can handle 28-bit constants. One suffix
   expands that out to 64 bits.
   By placing the constant info in the suffix, Qupls4 gains bits in the   
   instruction that can be used for other purposes rather than handling   
   large constants.   
   Using a suffix (or prefix) does lead to odd-sized constants, but as long
   as they are large enough, so what?
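A Python sketch of a 28-bit immediate widened to 64 bits by one suffix. The exact split is an assumption, not the documented Qupls4 format: here the instruction keeps the low 28 bits and the suffix supplies the remaining 36 upper bits, so small constants need no suffix at all.

```python
# Hypothetical split, not the documented Qupls4 format: the instruction
# carries a 28-bit immediate and one suffix supplies the upper 36 bits.
IMM_BITS = 28
SUFFIX_BITS = 64 - IMM_BITS


def sign_extend(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def encode_imm(value: int):
    """Return (imm28, suffix or None) for a signed 64-bit constant."""
    if -(1 << (IMM_BITS - 1)) <= value < (1 << (IMM_BITS - 1)):
        return value & ((1 << IMM_BITS) - 1), None   # fits: no suffix
    raw = value & ((1 << 64) - 1)
    return raw & ((1 << IMM_BITS) - 1), raw >> IMM_BITS


def decode_imm(imm28: int, suffix=None) -> int:
    """Sign-extend imm28 alone, or place the suffix bits on top of it."""
    if suffix is None:
        return sign_extend(imm28, IMM_BITS)
    raw = ((suffix & ((1 << SUFFIX_BITS) - 1)) << IMM_BITS) | (imm28 & ((1 << IMM_BITS) - 1))
    return sign_extend(raw, 64)
```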
      
   > Also, if you have to look at some special combination of register   
   > specifiers and/or a lot of other bits, this is also a problem.   
   >   
   I do not know. It depends on how it is handled. Qupls4 decodes r63 as the
   constant zero, so it is a special register specifier, like r0 in many
   machines. r62 gets decoded as the IP value.
   I think the constant decode is not likely to be on the timing-critical
   path, provided it is semi-sane.
   What is currently on the timing-critical path for Qupls4 is expanding
   instructions out into multiple micro-ops. I think it needs another
   pipeline stage.
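The special register specifiers above amount to a small check in front of the register-file read. In this Python sketch the register file is a plain list, and the assumption is that "the IP value" means the address of the current instruction.

```python
ZERO_REG = 63  # reads as the constant zero (like r0 on many machines)
IP_REG = 62    # reads as the instruction pointer (assumed: current instruction)


def read_operand(regfile, reg: int, ip: int) -> int:
    """Operand fetch with the two special specifiers decoded first."""
    if reg == ZERO_REG:
        return 0
    if reg == IP_REG:
        return ip
    return regfile[reg]
```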
      
   >   
   >>> I find constant data easier to work with that way, and they can be
   >>> processed in the same clock cycle as decode, so they do not add to the
   >>> dynamic instruction count. Just pass the current instruction slot plus
   >>> a following area of the cache-line to the decoder.
   >>>   
   >>> Handling suffixes at the end of a cache-line is not too bad if the cache   
   >>> already handles instructions spanning a cache line. Assume the maximum   
   >>> number of suffixes is present and ensure the cache-line is wide enough.   
   >>> Or limit the number of suffixes so they fit into the half cache-line   
   >>> used for spanning.   
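The "assume the maximum number of suffixes" rule can be checked mechanically. This Python sketch assumes instructions and suffixes occupy fixed-size slots aligned to the slot size; the 64-byte line and 8-byte slot are illustrative numbers only. It asks whether the extra region fetched from the next line (by default half a line, as for spanning instructions) covers the worst case from every start slot.

```python
def overrun(offset, line_bytes, slot_bytes, max_suffixes):
    """Bytes by which a worst-case instruction (base slot plus max_suffixes
    suffix slots) starting at `offset` runs past the end of its line."""
    end = offset + slot_bytes * (1 + max_suffixes)
    return max(0, end - line_bytes)


def fetch_window_ok(line_bytes, slot_bytes, max_suffixes, spill_bytes=None):
    """True if the region fetched from the next line covers every start slot."""
    if spill_bytes is None:
        spill_bytes = line_bytes // 2  # half-line spill used for spanning
    return all(
        overrun(off, line_bytes, slot_bytes, max_suffixes) <= spill_bytes
        for off in range(0, line_bytes, slot_bytes)
    )


print(fetch_window_ok(64, 8, 2))  # True: worst case spills 16 bytes
print(fetch_window_ok(64, 8, 5))  # False: worst case spills 40 bytes
```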
   >>>   
   >>> It is easier to handle interrupts with suffixes. The suffix can just be   
   >>> treated as a NOP. Adjusting the position of the hardware interrupt to   
   >>> the start of an instruction then does not have to worry about accounting   
   >>> for a prefix / suffix.   
   >>   
   >> I would have thought that the previous instruction (last one retired)
   >> would provide the starting point of the subsequent instruction. This
   >> way you don't have to worry about counting prefixes or suffixes.
   >>   
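A Python sketch of the "restart from the last retired instruction" suggestion. The record type and the `fallback_pc` parameter are hypothetical; the point is that if each retired record carries its total length (prefix/suffix words included), the resume address never requires counting prefixes or suffixes.

```python
from dataclasses import dataclass


@dataclass
class Retired:
    pc: int      # address of the instruction's first word
    length: int  # total bytes, including any prefix or suffix words


def restart_pc(retired, fallback_pc):
    """Resume just past the last retired instruction."""
    if not retired:
        return fallback_pc
    last = retired[-1]
    return last.pc + last.length
```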
   >   
   > Yeah.   
   >   
   > My thinking is, typical advance:   
   > IF figures out how much to advance;   
   > Next instruction gets PC+Step.   
   >   
   Qupls4 does not bother figuring out how much to advance; it would be too
   slow. It just assumes an increment. Why figure it out? If there are
   instructions in a bundle, just advance to the next bundle. I found the IP
   selection was on the timing-critical path; the BTB had to be adjusted.
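The two next-fetch-PC policies discussed above, as a Python sketch. The 64-byte bundle size and a BTB keyed by bundle address are assumptions for illustration. The per-instruction version depends on the decoded length; the bundle version only needs an alignment mask and an optional BTB override.

```python
BUNDLE_BYTES = 64  # hypothetical bundle size; must be a power of two


def next_pc_stepped(pc: int, insn_len: int) -> int:
    """Per-instruction advance: IF must know the decoded length first."""
    return pc + insn_len


def next_pc_bundled(pc: int, btb: dict) -> int:
    """Bundle advance: no length decode; a BTB hit on the bundle address
    (hypothetical keying) overrides the sequential guess."""
    bundle = pc & ~(BUNDLE_BYTES - 1)
    return btb.get(bundle, bundle + BUNDLE_BYTES)
```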
      
   > Then interrupt:   
   > Figure out which position in the pipeline interrupt starts from;   
   > Start there, flushing the rest of the pipeline;   
   > For a faulting instruction, this is typically the EX1 or EX2 stage.   
   > EX1 if it is a TRAP or SYSCALL;   
   > EX2 if it is a TLB miss or similar;   
   > Unless EX2 is not a valid spot (flush or bubble),   
   > then look for a spot that is not a flush or bubble.   
   > This case usually happens for branch-related TLB misses.
   >   
   > Usually EX3 or WB is too old, as it would mean re-running previous   
   > instructions.   
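The restart-slot selection described above can be sketched in Python. The stage names before EX1 and the cause strings are assumptions, as is the direction of the fallback search (toward younger stages, since EX3 and WB are too old).

```python
# Young -> old. Names before EX1 are assumptions; EX3 and WB are left out
# because restarting there would re-run already-completed instructions.
STAGES = ["IF", "ID1", "ID2", "EX1", "EX2"]


def restart_stage(cause: str, slot_valid: dict) -> str:
    """Pick the pipeline slot whose PC the interrupt resumes from.

    slot_valid maps a stage name to False when that slot holds a flush or
    bubble. TRAP/SYSCALL start at EX1; faults such as TLB misses at EX2.
    If that slot is invalid, walk toward younger stages.
    """
    start = "EX1" if cause in ("TRAP", "SYSCALL") else "EX2"
    for stage in reversed(STAGES[: STAGES.index(start) + 1]):
        if slot_valid.get(stage, False):
            return stage
    raise RuntimeError("no valid restart slot")
```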
   >   
   > Getting the exact stage-timing correct for interrupts is a little   
   > fiddly, but worrying about prefix/suffix/etc issues with interrupts   
   > isn't usually an issue, except that if somehow PC ended up pointing   
   > inside another instruction, I would consider this a fault.   
   >   
   > For branch calculations in XG3 and RV, the displacement is usually
   > relative to the BasePC before the prefix in the case of prefixed
   > encodings. This differs from XG1 and XG2, which defined branches
   > relative to the PC of the following instruction.
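The difference between the two branch conventions is easiest to see with numbers. This Python sketch assumes the displacement is already a byte offset (real encodings may scale it); the 8-byte prefixed branch at 0x1000 is an illustrative example.

```python
def target_from_base_pc(base_pc: int, disp: int) -> int:
    """XG3 / RV style: relative to BasePC, the address of the first prefix word."""
    return base_pc + disp


def target_from_next_pc(base_pc: int, insn_len: int, disp: int) -> int:
    """XG1 / XG2 style: relative to the PC of the following instruction."""
    return base_pc + insn_len + disp


# A prefixed (64-bit) branch at 0x1000 with a byte displacement of 0x20.
print(hex(target_from_base_pc(0x1000, 0x20)))     # 0x1020
print(hex(target_from_next_pc(0x1000, 8, 0x20)))  # 0x1028
```

With the BasePC convention the target does not depend on how many prefix words the branch carries.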
   >   
   > Though, this difference was partly due to a combination of   
   > implementation reasons and for consistency with RISC-V (when using a   
   > shared encoding space, makes sense if all the branches define PC   
      
   [continued in next message]   
      