From: anton@mips.complang.tuwien.ac.at   
      
   minforth writes:   
   >Most CPUs have operators for register-based count-down loops   
   >that are blazingly fast.   
      
   Which "operators" do you have in mind, and what do you mean by
   "blazingly fast"?
      
   Anyway, we have discussed this repeatedly. E.g., in
   <2022Feb13.231208@mips.complang.tuwien.ac.at> I replied to your
   posting and cited earlier discussions of the topic:
      
   |"minf...@arcor.de" writes:   
   [...]   
   |>F.ex. match NEXT efficiently to x_86 processor LOOP instruction (counter in
   |>_CX register)
   |>and you'll happily count down from 5 to 1.
   |   
   |Yes, but why would one do this? As we have established in an earlier   
   |discussion (see below), the LOOP instruction is typically not faster   
   |than a sequence of simpler instructions:   
   |   
   |<2018Jun6.184616@mips.complang.tuwien.ac.at>:   
   ||minforth@arcor.de writes:   
   ||>FOR..NEXT matches easily with the x86 LOOP instruction and ECX as counter.   
   ||>Should do speedy enough. ;-)   
   ||   
   ||Have you measured it? I have   
   ||<2017Mar14.183125@mips.complang.tuwien.ac.at>   
   ||<2017Mar15.141411@mips.complang.tuwien.ac.at> and compared the   
   ||following loops:   
   ||   
   ||.L5:                    .L5:
   ||  subq $1, %rax           loop .L5
   ||  jne  .L5
   ||   
   ||I found that for these loops Sandy Bridge, Haswell, and Skylake take   
   ||~4 cycles per iteration using LOOP, and 1-2 cycles per iteration when   
   ||using jne.   
   |   
   |<2018Jun7.141731@mips.complang.tuwien.ac.at>:   
   ||cycles for 1000 iterations
   ||   K10       Excavator        Zen
   ||Phenom II  Athlon X4 845  Ryzen 1600X
   ||  3021         1314          1051      loop
   ||  2020         1484          1051      sub; jne
   ||  2026         1489          1053      add; cmp; jne
   |   
   |There is no performance advantage on modern AMD and Intel CPUs for the   
   |instruction LOOP over a good implementation of the Forth word LOOP (as   
   |in the third example).   
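
To make the table above easier to read, here is a minimal Python sketch
that converts the quoted counts into cycles per iteration and into the
ratio of the add; cmp; jne sequence to the LOOP instruction. Only the
numbers are taken from the measurements quoted above; the dictionary
layout and the names CYCLES_PER_1000, cycles_per_iteration and
loop_speedup are made up for this illustration.

```python
# Cycle counts for 1000 iterations, copied from the quoted table.
# The structure and all names here are illustrative assumptions.
ITERATIONS = 1000

CYCLES_PER_1000 = {
    "loop":          {"Phenom II": 3021, "Athlon X4 845": 1314, "Ryzen 1600X": 1051},
    "sub; jne":      {"Phenom II": 2020, "Athlon X4 845": 1484, "Ryzen 1600X": 1051},
    "add; cmp; jne": {"Phenom II": 2026, "Athlon X4 845": 1489, "Ryzen 1600X": 1053},
}


def cycles_per_iteration(total_cycles, iterations=ITERATIONS):
    """Average cycles spent on one loop iteration."""
    return total_cycles / iterations


def loop_speedup(cpu, baseline="add; cmp; jne"):
    """Baseline cycles divided by LOOP cycles on the given CPU.

    A value above 1 means the LOOP instruction needed fewer cycles than
    the baseline sequence; a value below 1 means it needed more.
    """
    return CYCLES_PER_1000[baseline][cpu] / CYCLES_PER_1000["loop"][cpu]


if __name__ == "__main__":
    for cpu in CYCLES_PER_1000["loop"]:
        per_iter = cycles_per_iteration(CYCLES_PER_1000["loop"][cpu])
        print(f"{cpu:14s} loop: {per_iter:.3f} cycles/iteration, "
              f"ratio vs add; cmp; jne: {loop_speedup(cpu):.3f}")
```

A ratio below 1 means that the LOOP instruction took more cycles than
the three-instruction sequence on that CPU.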
      
   >If they can be used within Forth-based loop constructs   
   >I would expect a greater speed increase than what you measured.   
      
   You obviously ignore repeated refutations of your claims of superior   
   performance for LOOP-instruction-based counted loops. Maybe you   
   should implement and measure such a counted loop yourself and compare   
   it to the LOOP word on SwiftForth and VFX Forth.   
      
   - anton   
   --   
   M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
    New standard: https://forth-standard.org/   
   EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/   
   EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/   
      