comp.lang.forth
Anton Ertl to minforth
LOOP (was: OOS approach revisited)
28 Jun 25 10:23:51
   From: anton@mips.complang.tuwien.ac.at   
      
   minforth  writes:   
   >Most CPUs have operators for register-based count-down loops   
   >that are blazingly fast.   
      
   Which "operators" do you have in mind, and what do you mean by   
   "blazingly fast"?   
      
   Anyway, we have discussed this repeatedly, e.g., in   
   <2022Feb13.231208@mips.complang.tuwien.ac.at>, which I wrote in reply   
   to your posting, and in which I cited earlier discussions of the topic.   
      
   |"minf...@arcor.de"  writes:   
   [...]   
   |>F.ex. match NEXT efficiently to the x86 LOOP instruction (counter in the   
   |>_CX register), and you'll happily count down from 5 to 1.   
   |   
   |Yes, but why would one do this?  As we have established in an earlier   
   |discussion (see below), the LOOP instruction is typically not faster   
   |than a sequence of simpler instructions:   
   |   
   |<2018Jun6.184616@mips.complang.tuwien.ac.at>:   
   ||minforth@arcor.de writes:   
   ||>FOR..NEXT matches easily with the x86 LOOP instruction and ECX as counter.   
   ||>Should do speedy enough.  ;-)   
   ||   
   ||Have you measured it?  I have, in   
   ||<2017Mar14.183125@mips.complang.tuwien.ac.at> and   
   ||<2017Mar15.141411@mips.complang.tuwien.ac.at>, where I compared the   
   ||following loops:   
   ||   
   ||.L5:                              .L5:
   ||        subq    $1, %rax                  loop    .L5
   ||        jne     .L5
   ||   
   ||I found that for these loops Sandy Bridge, Haswell, and Skylake take   
   ||~4 cycles per iteration using LOOP, and 1-2 cycles per iteration when   
   ||using jne.   
   |   
   |<2018Jun7.141731@mips.complang.tuwien.ac.at>:   
   ||cycles for 1000 iterations   
   ||  K10        Excavator       Zen   
   ||Phenom II  Athlon X4 845  Ryzen 1600X   
   ||  3021        1314            1051     loop   
   ||  2020        1484            1051     sub; jne   
   ||  2026        1489            1053     add; cmp; jne   
   |   
   |There is no performance advantage on modern AMD and Intel CPUs for the   
   |instruction LOOP over a good implementation of the Forth word LOOP (as   
   |in the third example).   
      
   >If they can be used within Forth-based loop constructs   
   >I would expect a greater speed increase than what you measured.   
      
   You obviously ignore repeated refutations of your claims of superior   
   performance for LOOP-instruction-based counted loops.  Maybe you   
   should implement and measure such a counted loop yourself and compare   
   it to the LOOP word on SwiftForth and VFX Forth.   
      
   - anton   
   --   
   M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
        New standard: https://forth-standard.org/   
   EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/   
   EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/   
      