home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.forth      Forth programmers eat a lot of Bratwurst      117,927 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 116,453 of 117,927   
   Anton Ertl to Anton Ertl   
   Re: Best Euler #13 =?UTF-8?B?c29sdXRpb24   
   03 May 24 12:22:51   
   
   From: anton@mips.complang.tuwien.ac.at   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:   
   >Gerry Jackson's answer inspired my solution: sum up each column, the   
   >way you would do it by hand.  This not only avoids the need for   
   >bignums and the uncertainty about the correctness that some of the   
   >other solutions have, it is also the most efficient: In this digitwise   
   >solution, every digit is only added, while in the conversion   
   >solutions, each digit costs at least an add, a subtract, and a   
   >multiply (and more in the case of FP); my solution does the subtracts   
   >for the whole column once instead of 100 times:   
   >   
   >s" euler13.input" slurp-file 2constant input   
   >   
   >: sumcolumn ( ucarry1 c-addr -- ucarry2 )   
   >    5100 bounds ?do   
   >        i c@ +   
   >    51 +loop   
   >    '0' 100 * - 0 # drop ;   
   >   
   >: sum ( -- c-addr2 u2 )   
   >    0 <<#   
   >    input drop 50 1 mem-do   
   >        i sumcolumn   
   >    loop   
   >    0 #s #> ;   
   >   
   >sum drop 10 type cr   
   >   
   >This outputs "5537376230".   
      
   One additional observation inspires an optimization: You can add up to   
   25 digits without overflowing a byte.  So you can do the sum of up to   
   25 sequences of digits in SIMD style with the ordinary +; with 8-byte   
   cells, you can add 8 digits with one +.  With AVX-512, you can add all   
   the digits of one line at once, but unfortunately, Gforth does not   
   support this properly (and my vector package is designed for long   
   vectors and has too much overhead for short vectors).  So I   
   implemented this for working with 8-byte cells;   
      
   here 5114 allot 14 + constant input   
      
   s" euler13.input" r/o open-file throw   
   input 5100 rot read-file throw 5100 <> throw   
      
   create sum25buf 256 allot   
      
   : sum25 ( -- )   
       sum25buf 8 + input 6 - 4 0 do { d s }   
           s 0 25 0 do ( addr w )   
               over @ $ffff000000000000 and + swap 51 + swap loop   
           $3030000000000000 25 * - d ! drop   
           d cell+ s 56 + s cell+ do   
               i 0 25 0 do ( daddr saddr w )   
                   over @ + swap 51 + swap loop   
               nip $3030303030303030 25 * - over !   
               cell+   
           cell +loop   
           drop   
           d 64 + s 25 51 * +   
       loop   
       2drop ;   
      
   : sumcolumn ( ucarry1 c-addr -- ucarry2 )   
       256 bounds ?do   
           i c@ +   
       64 +loop   
       0 # drop ;   
      
   : sum ( -- c-addr2 u2 )   
       sum25   
       0 <<#   
       sum25buf 14 + 50 1 mem-do   
           i sumcolumn   
       loop   
       0 #s #> ;   
      
   sum drop 10 type cr   
      
   This is quite messy, especially dealing with the two extra bytes per   
   line.  Does it at least pay off in speed?  Calling it with   
      
   perf stat ~/gforth/gforth-fast .4th -e "#>> : foo 1000000 0 do sum   
   2drop #>> loop ; foo bye"   
      
   i.e., 1M calls to SUM costs (on a 4GHz Skylake):   
      
   euler13           euler13-faster   
   30_647_182_330     9_653_276_959      cycles   
   75_201_467_616    24_413_536_115      instructions   
      
      7.859516000       2.411344000 seconds user   
      
   So more than a factor of 3.  9653 cycles for a call to SUM (in   
   euler13-faster) is not bad.  A quick measurement indicates that about   
   1400 cycles of that are spent in # and #S.   
      
   - anton   
   --   
   M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
        New standard: https://forth-standard.org/   
      EuroForth 2023: https://euro.theforth.net/2023   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca