comp.arch
BGB to Anton Ertl
Re: Variable-length instructions (2/2)
30 Dec 25 14:50:04
   [continued from previous message]   
      
   Potentially, this could be reduced to 3 bits, say:   
      000: 16 bit   
      001: 32 bit   
      010: 48 bit   
      011: 64 bit   
      100: 80 bit   
      101: 96 bit   
      110: 128 bit   
      111: 192 bit   
      
   Though, the actual output from the I$ is expressed as a length in a   
   linear multiple of 16 bits.   
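   A minimal Python sketch of decoding the 3-bit length code above into the linear count of 16-bit units the I$ would report. The code-to-width mapping is exactly the table as listed; the names FETCH_BITS and fetch_units are hypothetical, chosen only for illustration. Note that the code set is not linear: 112, 144, 160 and 176 bits have no code.

```python
# Hypothetical 3-bit fetch-length codes, copied from the table above.
# The names FETCH_BITS and fetch_units are illustrative only.
FETCH_BITS = {
    0b000: 16,
    0b001: 32,
    0b010: 48,
    0b011: 64,
    0b100: 80,
    0b101: 96,
    0b110: 128,
    0b111: 192,
}


def fetch_units(code: int) -> int:
    """Return the fetch length as a count of 16-bit units."""
    if code not in FETCH_BITS:
        raise ValueError(f"not a 3-bit code: {code!r}")
    return FETCH_BITS[code] // 16
```

   For example, fetch_units(0b101) gives 6 (96 bits) and fetch_units(0b111) gives 12 (192 bits).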
      
      
   But, the existing logic was based on flags rather than explicitly
   storing the size. Past the fetch-length determination logic, the
   distinction between 2- or 3-wide ops and jumbo-prefixed forms possibly
   does not matter.
      
   Further down the path, the decoder sees that it was handed 2 or 3   
   instructions, and deals with this. Originally, it did its own redundant   
   length determination. I have now switched to driving this logic entirely   
   off the fetch-length given from the I$ (and captured pipeline state for   
   which ISA mode is in effect and similar).   
      
   There is also the further simplification that the logic paths for
   jumbo-96 handling and 3-wide fetch have now "actually" been merged.
      
      
   There is still the wonk that the decoder needs to re-route signals to   
   lanes differently depending on fetch width.   
      
   If I were doing a new core (or a possible later rework), it would make
   more sense to have the IF stage right-align the fetch.

   Say:
      Op16               =>   -     -     Op16A (Repack?)   
      Op32               =>   -     -     Op32A   
      Op48               =>   -     Op48B Op48A (Repack)   
      Op32B Op32A        =>   -     Op32B Op32A   
      Op32C Op32B Op32A  =>   Op32C Op32B Op32A   
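   A minimal Python sketch of the right-alignment in the table above, assuming three lanes written left to right as Lane 3, Lane 2, Lane 1, and assuming any repacking (e.g. splitting Op48 into Op48B/Op48A) has already produced lane-sized pieces. The function name right_align is hypothetical; it only shows that the last piece of every fetch lands in the same lane (Lane 1), whatever the fetch width.

```python
from typing import List, Optional

NUM_LANES = 3  # assumed: Lane 3, Lane 2, Lane 1, left to right


def right_align(pieces: List[str]) -> List[Optional[str]]:
    """Right-align lane-sized pieces so the last one lands in Lane 1.

    `pieces` uses the left-to-right order of the table above, e.g.
    ["Op32B", "Op32A"], or ["Op48B", "Op48A"] for a repacked 48-bit op.
    Returns [lane3, lane2, lane1], with None for an unused lane.
    """
    if len(pieces) > NUM_LANES:
        raise ValueError("fetch is wider than the lane count")
    pad: List[Optional[str]] = [None] * (NUM_LANES - len(pieces))
    return pad + list(pieces)
```

   With this layout, the decoder would no longer need to re-route signals per fetch width: Lane 1 always holds the last op, and wider fetches simply fill lanes further to the left.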
      
      
   Or, maybe one could make a case that I have done this stuff backwards
   and Lane 1 should come before Lanes 2 and 3 rather than after, but alas.
   For dealing with prefix decoding, though, it makes sense that Lane 1
   should always be the last instruction in the fetch, rather than the first.
      
   One could argue that prefixes are themselves wonky, but otherwise one
   needs either:
      Instructions that can directly encode the presence of large
      immediate values, etc.; or,
      Suffix encodings, which IMHO are worse than prefix encodings. Prefix
      encodings at least make intuitive sense if one views the instruction
      stream as linear, whereas suffixes add weirdness and are effectively
      retro-causal: for any fetch to be safe at the end of a cache line,
      one would need to prove that no suffix follows. So, better not to go
      there.
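   A minimal Python sketch of why suffixes are awkward at the end of a cache line. It assumes a hypothetical encoding of 32-bit words where bit 31 marks an extension word (a prefix in one scheme, a suffix in the other); EXT, len_prefix and len_suffix are made-up names. With prefixes, every word read while finding the length belongs to the instruction itself; with suffixes, the length is only known after reading one word past the instruction.

```python
from typing import List, Optional

# Hypothetical marker: a word with bit 31 set is an extension word.
EXT = 1 << 31


def len_prefix(words: List[int], i: int) -> Optional[int]:
    """Length (in words) of the instruction at i, prefix scheme.

    Scans forward over prefix words; the first non-prefix word ends the
    instruction. Returns None only if the instruction itself is cut off.
    """
    n = 0
    while i + n < len(words):
        if not words[i + n] & EXT:
            return n + 1
        n += 1
    return None


def len_suffix(words: List[int], i: int) -> Optional[int]:
    """Length (in words) of the instruction at i, suffix scheme.

    The instruction only ends once the *following* word is known not to
    be a suffix, so one word past the end must be visible.
    """
    n = 1
    while True:
        if i + n >= len(words):
            return None  # cannot prove that no suffix follows
        if not words[i + n] & EXT:
            return n
        n += 1
```

   Treating each list as "everything up to the end of the cache line", a lone base op at the end decodes fine with prefixes (len_prefix([0x1234], 0) gives 1) but is undecidable with suffixes (len_suffix([0x1234], 0) gives None).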
      
      
   For the most part, superscalar works the same either way, with similar
   efficiency. There would be a slight efficiency gain if ops could be
   dynamically reshuffled during fetch, but this is not currently a thing
   in my case.
      
   This would apply if, say, a MEM op is followed by non-dependent ALU
   ops: under the current superscalar handling they will not co-execute,
   but in theory the ops could be swapped to allow them to co-execute.
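   A minimal Python sketch of the swap idea for a pair of adjacent ops. The pairing rule here is an assumption for illustration, not the actual core's logic: two ops may co-issue only if they are register-independent and a MEM op occupies the last slot (Lane 1) of the bundle. The names Op, independent, can_pair and pair are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    kind: str              # "ALU" or "MEM"
    dst: str               # destination register ("" if none, e.g. a store)
    srcs: Tuple[str, ...]  # source registers


def independent(a: Op, b: Op) -> bool:
    """True if neither op reads the other's destination and they do not
    write the same register."""
    return (a.dst not in b.srcs and b.dst not in a.srcs
            and (a.dst == "" or a.dst != b.dst))


def can_pair(first: Op, second: Op) -> bool:
    """Assumed in-order rule: ops must be independent, and a MEM op may
    only sit in the last slot (Lane 1) of the bundle."""
    return independent(first, second) and first.kind != "MEM"


def pair(first: Op, second: Op, allow_swap: bool) -> List[Tuple[Op, ...]]:
    """Return issue groups for two adjacent ops, optionally swapping them."""
    if can_pair(first, second):
        return [(first, second)]
    if allow_swap and can_pair(second, first):
        return [(second, first)]  # reorder, then co-issue
    return [(first,), (second,)]
```

   Under this rule, a load into r4 followed by an unrelated ALU op issues as two groups without swapping, but as one group with the ALU op moved ahead; an ALU op that reads r4 still cannot be paired either way.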
      
      
   ...   
      
      
   > - anton   
      