Newsgroup: comp.arch
Message: 130,693 of 131,241
From: BGB
To: Anton Ertl
Subject: Re: Variable-length instructions (2/2)
Date: 30 Dec 25 14:50:04

[continued from previous message]

Potentially, this could be reduced to 3 bits, say:
  000: 16 bit
  001: 32 bit
  010: 48 bit
  011: 64 bit
  100: 80 bit
  101: 96 bit
  110: 128 bit
  111: 192 bit

Though, the actual output from the I$ is expressed as a length in a
linear multiple of 16 bits.


But, the existing logic was based on flags rather than explicitly
storing the size, and possibly the distinction between 2 or 3 wide ops,
and jumbo-prefixed forms, does not matter past the fetch length
determination logic.

Further down the path, the decoder sees that it was handed 2 or 3
instructions, and deals with this. Originally, it did its own redundant
length determination. I have now switched to driving this logic entirely
off the fetch-length given from the I$ (and captured pipeline state for
which ISA mode is in effect and similar).

Well, and with the further simplification that now the logic paths for
jumbo 96 handling and 3-wide fetch have "actually" been merged.


There is still the wonk that the decoder needs to re-route signals to
lanes differently depending on fetch width.

If I were doing a new core (or a possible later rework) it would make
more sense to have the IF stage right-align the fetch.

Or, say:
  Op16              => -     -     Op16A (Repack?)
  Op32              => -     -     Op32A
  Op48              => -     Op48B Op48A (Repack)
  Op32B Op32A       => -     Op32B Op32A
  Op32C Op32B Op32A => Op32C Op32B Op32A


Or, maybe one could make a case that I have done this stuff backwards
and Lane 1 should come before 2 and 3 rather than after, but alas. For
dealing with prefix decoding though, etc, it makes sense that Lane 1
should always be the last instruction in the fetch, rather than the
first.

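Going back to the 3-bit length code at the top of this message: a rough
sketch of how it would map to and from the "linear multiple of 16 bits"
form. This is only an illustration in Python, not the actual I$ logic;
the names LENGTH_BITS, code_to_halfwords and halfwords_to_code are made
up here.

```python
# Hypothetical 3-bit fetch-length code, following the table given earlier
# in this post. Values are fetch lengths in bits.
LENGTH_BITS = {
    0b000: 16,
    0b001: 32,
    0b010: 48,
    0b011: 64,
    0b100: 80,
    0b101: 96,
    0b110: 128,
    0b111: 192,
}


def code_to_halfwords(code: int) -> int:
    """Expand a 3-bit length code into a count of 16-bit units."""
    return LENGTH_BITS[code & 0b111] // 16


def halfwords_to_code(halfwords: int) -> int:
    """Compress a count of 16-bit units into the 3-bit code.

    Raises ValueError for lengths the 3-bit table cannot express
    (for example 7 halfwords = 112 bits).
    """
    for code, bits in LENGTH_BITS.items():
        if bits == halfwords * 16:
            return code
    raise ValueError(f"no 3-bit code for {halfwords * 16} bits")


if __name__ == "__main__":
    for code in range(8):
        print(f"{code:03b}: {code_to_halfwords(code)} x 16 bit")
```

Codes 000..101 are just (code + 1) halfwords, so only 110 and 111 would
need special-casing; lengths such as 112, 144, 160 and 176 bits have no
code at all in this scheme.
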
One could argue that maybe prefixes are themselves wonky, but otherwise
one needs:
  Instructions that can directly encode the presence of large immediate
  values, etc;
  Or, the use of suffix-encodings (which is IMHO worse than prefix
  encodings; at least prefix encodings make intuitive sense if one views
  the instruction stream as linear, whereas suffixes add weirdness and
  are effectively retro-causal, and for any fetch to be safe at the end
  of a cache line one would need to prove the non-existence of a suffix;
  so better to not go there).


For the most part, superscalar works the same either way, with similar
efficiency. There would be a slight efficiency boost if it were possible
to dynamically reshuffle ops during fetch. But, this is not currently a
thing in my case.

This latter case would apply if, say, a MEM op is followed by
non-dependent ALU ops, which under current superscalar handling will not
co-execute, but it could be possible in theory to swap the ops and allow
them to co-execute.


...


> - anton

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)