
Forums before death by AOL, social media and spammers... "We can't have nice things"

   alt.os.development      Operating system development chatter      4,255 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 3,585 of 4,255   
   antispam@math.uni.wroc.pl to muta...@gmail.com   
   Re: ecosystem   
   13 Dec 22 23:25:16   
   
   muta...@gmail.com  wrote:   
   > On Thursday, December 1, 2022 at 12:46:25 PM UTC+8, anti...@math.uni.wroc.pl wrote:
   >   
   > > In compilers the hard part is optimization. When I compare gcc-4.8
   > > to gcc-12.0 it seems that code produced by gcc-12.0 is probably
   > > about 10% more efficient than code from gcc-4.8. But C compiler   
   > > in gcc-12.0 is twice as large as C compiler in gcc-4.8. And   
   > > looking back, gcc-4.8 is much bigger than gcc-1.42 (IIRC C   
   > > compiler in gcc-1.42 was of order one megabyte in size).   
   > > gcc-12.0 produces more efficient code than gcc-1.42, but   
   > > probably no more than 2 times more efficient. Certainly,   
   > > code from gcc-12.0 is not 26 times more efficient than code
   > > from gcc-1.42 (which would be the case if speed of object   
   > > code were simply proportional to compiler size). And in   
   > > turn gcc-1.42 generates more efficient code than simpler   
   > > compilers.   
   >   
   > Someone said you can get 80% of the performance of a   
   > modern compiler with a handful of "easy" optimizations,   
   > and gave these as references "the dragon book" and   
   > Frances Allen's "Seven Optimising Transformations".   
   >   
   > Any comment?   
      
   Well, it depends on your code.  Compilers normally perform
   accesses to non-local variables exactly as written in the
   program.  More precisely, it is easy to write a program so
   that it contains no redundant non-local memory accesses, but
   it is hard for a compiler to detect and eliminate redundant
   ones.  If the memory access pattern is bad enough, runtime
   will be dominated by memory access and speeding up the other
   parts has little effect.  But there are also small benchmarks,
   and well-behaved programs which correlate with benchmarks.
   On a small benchmark, Tiny C generated code which was about
   6 times slower than code from gcc.  But Tiny C compiled by
   gcc ran about two times faster than self-compiled Tiny C.
   So was object code from Tiny C 6 times slower than object
   code from gcc, or was it 2 times slower?
      
   If on well-behaved programs you can get about half the speed
   of optimal code, then on badly behaved ones you will probably
   get 80%.
      
   A lot also depends on programming style.  Compare   
      
      for(i = 0; i < N; i++) {   
          a[i] += b[i];   
      }   
      
   with   
      
      int * ap = a;   
      int * bp = b;   
      int * ep = a + N;   
      while(ap < ep) {   
          *ap++ += *bp++;   
      }   
      
   In both cases a and b are arrays of integers (a similar effect
   would appear for other types).  A good optimizing compiler will
   generate similar or maybe identical code from both versions.
   But a naive compiler can be expected to produce faster code
   from the second version.
      
   The above may look like a small thing, but the topic is much
   bigger.  Namely, the modern tendency is to write code that at
   first glance may look quite inefficient, but which the compiler
   can transform into much faster, frequently close to optimal,
   code.  You may ask why one should write "slow" code and depend
   on the compiler to "fix" it.  The reason is that this "slow"
   code is easier to write and to understand.  For example, you
   use small functions or maybe macros.  Code using small helper
   functions may be shorter and easier to get correct.  But with
   a naive compiler, function calls cost.  Expanding inline (say
   using macros) alone is of limited help: a general function
   must do more work than is needed in any special case.  And
   using macros risks bigger object code (which may also make the
   program slower).  An optimizing compiler, after expanding a
   function inline, effectively produces a special-case version
   for the given call.  In particular, if one of the arguments is
   constant, the compiler may find substantial simplifications.
      
   There is also a different trend: autovectorization.  Modern
   PCs have "vector" instructions which take their arguments from
   vector registers, which may be 16, 32, or 64 bytes long, and
   treat them as arrays of numbers, say 4, 8, or 16 integers.
   A vector operation performs the same operation (say addition
   or multiplication) on the corresponding integers in both
   vectors.  In effect, a program may do computations many times
   faster than using normal operations.  Currently the most
   extreme case would be parallel operation on bytes, which
   can give a 64 times speedup.  Compilers like gcc now have
   extensions which allow the programmer to say that some
   operations should be done using vector instructions.  But
   it would be nicer if the compiler could automatically use
   vector operations whenever that gives faster code.  This is
   called autovectorization.  In some cases it works nicely and
   gives the expected speedup.  In other cases it does not work.
   Still, it is better to have support for such a thing even if
   it does not always give a speedup.
      
   --   
                                 Waldek Hebisch   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca