
   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 130,747 of 131,241   
   MitchAlsup to All   
   Re: Variable-length instructions   
   03 Jan 26 23:09:37   
   
   From: user5857@newsgrouper.org.invalid   
      
   Robert Finch  posted:   
      
   > On 2026-01-02 9:05 p.m., MitchAlsup wrote:   
   ----------merciful snip-----------   
      
   > >> Looks over a sliding window of 10 or 12 instructions:   
   > >>     4 preceding instructions (-4 to -1);   
   > >       4 new instructions on previous predicted path (0 to 3);   
   > >       4 alternate instructions on current predicted path   
   > > // so one can decode and issue non-sequential instructions   
   >   
   > They could have put which GPR(s) is the link register in a CSR, if it   
   > was desired to keep the paradigm of generality. I started working on   
   > Qupls5 which is going to use a 32-bit ISA. The extra bits used to   
   > specify a GPR as a link register are better used as branch displacement   
   > bits IMO. I would be tempted to use two bits though to specify the LR,   
   > as sometimes a second LR is handy.   
      
   In my opinion, you are correct: more displacement is a lot better than
   being able to specify a GPR. RISC-V specified the link register in the
   instruction so that the prologue could call a register-save routine and
   the epilogue could call a register-reload routine. Instead, what they
   should have done is build a small register-shuffling state machine
   between the register file and the cache.
      
   > A choice is whether to use GPRs as link registers. Not using a GPR gives   
   > an extra register or two for GPR use. Using dedicated link register(s)   
   > works well with a dedicated RET instruction. RET should be able to   
   > deallocate the stack. IMO using a dedicated link register is a bit like   
   > using an independent PC register. Or using a GPR for the link register   
   > is a bit like using a GPR for the PC.   
      
   I went a tad further: EXIT restores the preserved registers and
   transfers control back to the instruction following the call.
   The LDD IP is performed first, then the registers are reloaded, then
   the stack frame is deallocated. {This puts one in a position to
   discard popped cache lines, saving memory BW.}
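
   The ordering above can be sketched as a toy model. This is only an
   illustration, not My 66000's actual EXIT semantics: the frame layout
   (return IP at the stack pointer, saved registers directly above it),
   the function name `exit_frame`, and the register names are all
   assumptions made for the example.

```python
def exit_frame(stack, sp, saved_regs, regs):
    """Toy EXIT over a word-addressed stack (hypothetical frame layout).

    Assumed layout: stack[sp] holds the return IP, followed by one word
    per preserved register, in the order given by `saved_regs`.
    """
    # 1. Load the return IP first, so instruction fetch at the return
    #    address could begin while the registers are still being reloaded.
    return_ip = stack[sp]

    # 2. Reload the preserved registers from the frame.
    for offset, name in enumerate(saved_regs, start=1):
        regs[name] = stack[sp + offset]

    # 3. Deallocate the frame; the words below the new SP are now dead.
    new_sp = sp + 1 + len(saved_regs)
    return return_ip, new_sp
```

   Loading the return IP before anything else is the point of the sketch:
   everything after step 1 can overlap with fetching at the return address.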
      
   When an EXIT is in progress, one can fetch the instructions at the RET
   address, and if a CALL follows closely, that subroutine's ENTER
   instruction can be short-circuited because the saved registers are
   in the same place on the stack from the return to the new entry point!
   This not only saves memory bandwidth, it saves cycles, too.
      
   > Qupls5 is going to use instruction fusing for compare-and-branch   
   > instructions. A compare followed by an unconditional branch will be   
   > treated as one instruction. That gives a 23-bit branch displacement.   
   > Otherwise with a 32-bit instruction, a 12-bit branch displacement is not   
   > really quite enough for modern software. Sure, it works 90+% of the time   
   > but, it adds a headache to assembling and linking programs for when it   
   > does not work.   
      
   My 66000 has yet to run into a subroutine that needs more than the 16-bit
   word-displacement in its typical BC and BB instructions. {It will happen;
   it just does not happen early in SW development.}
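
   For a sense of scale, here is a minimal sketch of the byte reach of the
   displacement widths mentioned in this thread (12, 16 and 23 bits). It
   assumes a signed two's-complement displacement counted in 4-byte units.
   That scaling is an assumption, since the posts do not state it for
   Qupls5, and `branch_reach_bytes` is just an illustrative helper name.

```python
def branch_reach_bytes(disp_bits, unit_bytes=4):
    """Return (lowest, highest) byte offset for a signed displacement.

    Assumes a two's-complement field of `disp_bits` bits, scaled by
    `unit_bytes` (4 here, i.e. one 32-bit instruction word; an assumption).
    """
    lowest = -(1 << (disp_bits - 1)) * unit_bytes
    highest = ((1 << (disp_bits - 1)) - 1) * unit_bytes
    return lowest, highest


for bits in (12, 16, 23):
    lowest, highest = branch_reach_bytes(bits)
    print(f"{bits:2d}-bit displacement: {lowest:+d} .. {highest:+d} bytes")
```

   Under those assumptions the three widths reach roughly +/-8 KiB,
   +/-128 KiB and +/-16 MiB respectively.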
      
   My 66000 WILL fuse CMP-BC instructions, too.   
      
   > Qupls5 will use constant postfixes which extend the constant by 22-bits   
   > for each postfix used. To get a 64-bit constant three postfixes will be   
   > required. Not quite as clean as universal constants, but simple to   
   > implement in hardware.   
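
   The postfix arithmetic can be sketched as follows. Only the 22 bits per
   postfix comes from the post; the width of the constant field in the base
   instruction (12 bits below), the low-order-first ordering of postfixes,
   and both function names are assumptions for illustration. Sign extension
   is ignored.

```python
POSTFIX_BITS = 22  # from the post: each postfix adds 22 bits of constant


def postfixes_needed(value_bits, base_bits):
    """Postfixes required to hold a `value_bits`-wide constant when the
    base instruction already carries `base_bits` of it (assumed width)."""
    extra = max(0, value_bits - base_bits)
    return -(-extra // POSTFIX_BITS)  # ceiling division


def assemble_constant(base, base_bits, postfixes):
    """Concatenate the base field and the postfix fields into a 64-bit
    value. Assumes the base holds the low-order bits and each postfix
    supplies the next 22 bits above the previous one."""
    value = base & ((1 << base_bits) - 1)
    shift = base_bits
    for field in postfixes:
        value |= (field & ((1 << POSTFIX_BITS) - 1)) << shift
        shift += POSTFIX_BITS
    return value & ((1 << 64) - 1)


# With an assumed 12-bit base field, 12 + 2*22 = 56 bits is not enough for
# a 64-bit constant, so a third postfix is needed, matching the post.
print(postfixes_needed(64, 12))
```

   Any base field narrower than 20 bits gives the same count of three,
   since two postfixes then cover fewer than 64 bits.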
      
   Since REV 2.0 of My 66000, I have been looking at the logic needed to
   derive instruction length. In REV 1.0 it took 32 gates and 4 gates of
   delay. In REV 2.0 (as it stands now) it takes 25 gates (more regularity),
   but if STs with large constant data were discarded, the instruction-length
   decoder drops to 6 total gates and 2 gates of delay, with no fan-out
   or fan-in greater than 3.
      
   > Stuck on synthesis for Qupls4 which keeps omitting modules from the   
   > design. I must have checked the module inputs and outputs dozens of   
   > times, and do not know why they are excluded.   
      
   Good luck.   
      


