   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,506 of 131,241   
   MitchAlsup to All   
   Re: Why I've Dropped In   
   26 Aug 25 21:46:24   
   
   From: user5857@newsgrouper.org.invalid   
      
   BGB  posted:   
      
   > On 7/28/2025 6:18 PM, John Savard wrote:   
   > > On Sat, 14 Jun 2025 17:00:08 +0000, MitchAlsup1 wrote:   
   > >   
   > >> VAX tried too hard in my opinion to close the semantic gap.   
   > >> Any operand could be accessed with any address mode. Now while this   
   > >> makes the puny 16-register file seem larger,   
   > >> what VAX designers forgot, is that each address mode was an instruction   
   > >> in its own right.   
   > >>   
   > >> So, VAX shot at minimum instruction count, and purposely miscounted   
   > >> address modes not equal to %k as free.   
   > >   
   > > Fancy addressing modes certainly aren't _free_. However, they are,   
   > > in my opinion, often cheaper than achieving the same thing with an   
   > > extra instruction.   
   > >   
   > > So it makes sense to add an addressing mode _if_ what that addressing   
   > > mode does is pretty common.   
   > >   
   >   
   > The use of addressing modes drops off pretty sharply though.   
   >   
   > Like, if one could stat it out, one might see a static-use pattern   
   > something like:   
   >    80%: [Rb+disp]   
   >    15%: [Rb+Ri*Sc]   
   >     3%: (Rb)+ / -(Rb)   
   >     1%: [Rb+Ri*Sc+Disp]   
   >    <1%: Everything else   
      
   Since RISC-V only has [Rb+disp12], the other 20% take at least 2 instructions.   
   Simple math indicates this requires 1.2+ instructions/mem-ref instead of 1.0   
   instructions/mem-ref. disp12 does not help either.   
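      
   The 1.2 figure can be checked with a short sketch. It assumes the estimated
   mix quoted above, reads the "<1%" bucket as exactly 1% so the shares sum to
   100, and charges every mode other than [Rb+disp] the minimum of 2
   instructions. The names MODE_MIX and instrs_per_memref are illustrative only.

```python
# Static addressing-mode mix (percent) from the quoted estimate.
# Assumption: "<1%: Everything else" is counted as 1% so the total is 100.
MODE_MIX = {
    "[Rb+disp]": 80,
    "[Rb+Ri*Sc]": 15,
    "(Rb)+ / -(Rb)": 3,
    "[Rb+Ri*Sc+Disp]": 1,
    "everything else": 1,
}


def instrs_per_memref(mix, native_modes, fallback_cost=2):
    """Average instructions per memory reference.

    A mode in native_modes costs 1 instruction; any other mode is charged
    fallback_cost (2 is the lower bound used in the post).
    """
    total = sum(mix.values())
    weighted = sum(
        pct * (1 if mode in native_modes else fallback_cost)
        for mode, pct in mix.items()
    )
    return weighted / total


if __name__ == "__main__":
    # Only [Rb+disp] is native, as in the RISC-V case discussed above.
    print(instrs_per_memref(MODE_MIX, {"[Rb+disp]"}))  # prints 1.2
```

   Since 2 is only a lower bound for the synthesized modes, the real average
   would be 1.2 or higher, which is where the "1.2+" comes from.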
      
   My 66000 does not have (Rb)+ or -(Rb), and most RISC-machines don't either.   
   On the other hand, I see more [Rb+Ri<   
      
   > Though, I am counting [PC+Disp] and [GP+Disp] as part of [Rb+Disp] here.   
   >   
   > Granted, the dominance of [Rb+Disp] does drop off slightly when   
   > considering dynamic instruction use. Part of it is due to the   
   > prolog/epilog sequences.   
      
   I have a lot of [IP,DISP] due to the way the compiler places data.   
      
   > If one had instead used (SP)+ and -(SP) addressing for prologs and   
   > epilogs, then one might see around 20% or so going to these instead.   
   > Or, if one had PUSH/POP, to PUSH/POP.   
      
   ENTER and EXIT compress prologues and epilogues to a single instruction   
   each. They also have the option of placing the preserved registers in   
   a place where the called subroutine cannot damage them.   
      
   > The discrepancy then between static and dynamic instruction counts then   
   > being mostly due to things like loops and similar.   
   >   
   > Estimating the effect of loops in a compiler is hard, but had noted that   
   > assuming a scale factor of around 1.5^D for loop nesting levels (D)   
   > seemed to be in the area. Many loops end up unreached in many   
   > iterations, or only running a few times, so possibly counter-intuitively   
   > it is often faster to assume that a loop body will likely only cycle 2   
   > or 3 times rather than 100s or 1000s, and trying to aggressively   
   > optimize loops by assuming large N tends to be detrimental to performance.   
      
   VAX compilers set the loop-count = 10 and did OK for their era. A   
   low count (like 10) ameliorates the small loops (letters in a name)   
   against the larger loops like Matrix300.   
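      
   The two heuristics can be compared with a rough sketch. It assumes the
   fixed count of 10 is applied once per nesting level, which the text above
   does not actually state; block_weight is a hypothetical helper, not
   anything taken from a real compiler.

```python
def block_weight(depth, per_level_factor=1.5):
    """Estimated execution weight of a block at loop-nesting depth `depth`.

    per_level_factor=1.5 is the 1.5^D heuristic quoted above.
    per_level_factor=10 models a fixed assumed trip count of 10 per level
    (an assumption about how such a count would be applied).
    """
    return per_level_factor ** depth


if __name__ == "__main__":
    for depth in range(4):
        print(depth, block_weight(depth), block_weight(depth, 10))
```

   At depth 3 the weights are 3.375 versus 1000, so the choice of factor
   strongly affects how much effort the optimizer spends on inner loops.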
      
   > Well, and at least thus far, profiler-driven optimization isn't really a   
   > thing in my case.   
   >   
   >   
   -----------------------   
   >   
   > One could maybe argue for some LoadOp instructions, but even this is   
   > debatable. If the compiler is designed mostly for Load/Store, and the   
   > ISA has a lot of registers, the relative benefit of LoadOp is reduced.   
   >   
   > LoadOp being mostly a benefit if the value is loaded exactly once, and   
   > there is some other ALU operation or similar that can be fused with it.   
   >   
   > Practically, it limits the usefulness of LoadOp mostly to saving an   
   > instruction for things like:   
   >    z=arr[i]+x;   
   >   
   >   
   > But, the relative incidence of things like this is low enough as to not   
   > save that much.   
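      
   The "loaded exactly once and fused with an ALU op" condition can be
   sketched as a count over a toy instruction list. Everything here is
   hypothetical: the (dest, op, sources) tuple format, the ALU_OPS set, and
   the assumption that each register is written only once.

```python
from collections import Counter

# Hypothetical set of ALU operations that could absorb a memory operand.
ALU_OPS = {"add", "sub", "and", "or", "xor"}


def fusable_loads(instrs):
    """Return destinations of loads that a LoadOp form could absorb.

    instrs is a list of (dest, op, sources) tuples. A load qualifies when
    its result is read exactly once and that single reader is an ALU op.
    Assumes each register is written only once (SSA-like toy code).
    """
    uses = Counter(src for _, _, srcs in instrs for src in srcs)
    consumer = {}
    for _, op, srcs in instrs:
        for src in srcs:
            consumer.setdefault(src, op)  # remember the first reader
    loads = [dest for dest, op, _ in instrs if op == "load"]
    return [r for r in loads if uses[r] == 1 and consumer.get(r) in ALU_OPS]


# z = arr[i] + x;   the loaded value is used once, by an add
single_use = [("t0", "load", ["arr", "i"]), ("z", "add", ["t0", "x"])]

# Same load, but the value is reused, so it wants to stay in a register
reused = single_use + [("w", "sub", ["t0", "y"])]

if __name__ == "__main__":
    print(fusable_loads(single_use))  # prints ['t0']
    print(fusable_loads(reused))      # prints []
```

   Each entry in the result is one instruction a LoadOp could save; a value
   that is reused saves nothing, which matches the low incidence noted above.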
   >   
   > The other thing is that one has to implement it in a way that does not   
   > increase pipeline length,   
      
   This is the key point about LD-OPs: if you build a pipeline to support   
   them, then you will suffer when the instruction stream is independent   
   RISC-like instructions; conversely, if you build the pipeline for   
   RISC-like instructions, LD-OPs take a penalty unless you buy off on   
   at least medium OoO.   
      
   >                           since if one makes the pipeline longer for the   
   > sake of LoadOp or OpStore, then this is likely to be a net negative for   
   > performance vs prioritizing Load/Store, unless the pipeline had already   
   > needed to be lengthened for other reasons.   
      
   And thus, this is why RISC-machines largely avoid LD-OPs.   
      
   > One can be like, "But what if the local variables are not in registers?"   
   > but on a machine with 32 or 64 registers, most likely your local   
   > variable is already going to be in a register.   
   >   
   > So, the main potential merit of LoadOp being "doesn't hurt as bad on a   
   > register-starved machine".   
      
   So does poking your eye with a hot knife.   
      
   > > That being said, though, designing a new machine today like the VAX   
   > > would be a huge mistake.   
   > >   
   > > But the VAX, in its day, was very successful. And I don't think that   
   > > this was just a result of riding on the coattails of the huge popularity   
   > > of the PDP-11. It was a good match to the technology *of its time*,   
   > > that being machines that were implemented using microcode.   
   > >   
   >   
   > Yeah.   
   >   
   > There are some living descendants of that family, but pretty much   
   > everything now is Reg/Mem or Load/Store with a greatly reduced set of   
   > addressing modes.   
   >   
   >   
   > > John Savard   
   >   
      



(c) 1994,  bbs@darkrealms.ca