From: anton@mips.complang.tuwien.ac.at   
      
   scott@slp53.sl.home (Scott Lurndal) writes:   
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:   
   >>scott@slp53.sl.home (Scott Lurndal) writes:   
   >>>Thomas Koenig writes:   
   >>>>Anton Ertl schrieb:   
   >>>>> Thomas Koenig writes:   
   >>>>>>I recently heard that CS graduates from ETH Zürich had heard about   
   >>>>>>pipelines, but thought it was fetch-decode-execute.   
   >>>>>   
   >>>>> Why would a CS graduate need to know about pipelines?   
   >>>   
    >>>So they can properly simulate a pipelined processor?
   >>   
   >>Sure, if a CS graduate works in an application area, they need to   
   >>learn about that application area, whatever it is.   
   >   
   >It's useful for code optimization, as well.   
      
   In what way?   
      
   >In general,   
   >any programmer should have a solid understanding of the   
   >underlying hardware - generically, and specifically   
   >for the hardware being programmed.   
      
    Certainly. But do they need to know the difference between a Wallace
    multiplier and a Dadda multiplier? If not, what is it about pipelined
    processors that would require CS graduates to know about them?
      
   >>Processor pipelines are not the basics of what a CS graduate is doing.   
   >>They are an implementation detail in computer engineering.   
   >   
   >Which affect the performance of the software created by the   
   >software engineer (CS graduate).   
      
    By a constant factor; and the software creator does not need to know
    that the reason one CPU executes instructions at 2 CPI (the 486)
    rather than 10 CPI (the VAX-11/780) is that it is pipelined; and
    these days both the 486 and the VAX are irrelevant to software
    creators.
      
   >>A few more examples where compilers are not as good as even I expected:   
   >>   
   >>Just today, I compiled   
   >>   
   >>u4 = u1/10;   
   >>u3 = u1%10;   
   >>   
   >>(plus some surrounding code) with gcc-14 in three contexts. Here's   
   >>the code for two of them (the third one is similar to the second one):   
   >>   
    >>movabs $0xcccccccccccccccd,%rax   movabs $0xcccccccccccccccd,%rsi
    >>sub    $0x8,%r13                  mov    %r8,%rax
    >>mul    %r8                        mov    %r8,%rcx
    >>mov    %rdx,%rax                  mul    %rsi
    >>shr    $0x3,%rax                  shr    $0x3,%rdx
    >>lea    (%rax,%rax,4),%rdx         lea    (%rdx,%rdx,4),%rax
    >>add    %rdx,%rdx                  add    %rax,%rax
    >>sub    %rdx,%r8                   sub    %rax,%r8
    >>mov    %r8,0x8(%r13)              mov    %rcx,%rax
    >>mov    %rax,%r8                   mul    %rsi
    >>                                  shr    $0x3,%rdx
    >>                                  mov    %rdx,%r9
   >>   
   >>The major difference is that in the left context, u3 is stored into   
   >>memory (at 0x8(%r13)), while in the right context, it stays in a   
   >>register. In the left context, gcc managed to base its computation of   
   >>u1%10 on the result of u1/10; in the right context, gcc first computes   
   >>u1%10 (computing u1/10 as part of that), and then computes u1/10   
   >>again.   
   >   
   >Sort of emphasizes that programmers need to understand the   
   >underlying hardware.   
      
    I am the programmer of the code shown above. In what way would better
    knowledge of the hardware have made me aware that gcc would produce
    suboptimal code in some cases?
      
   >What were u1, u3 and u4 declared as?   
      
   unsigned long (on that platform).   
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
    Mitch Alsup,    
      