From: cross@spitfire.i.gajendra.net   
      
   In article ,   
   John Reagan wrote:   
   >On 2/24/2025 4:43 PM, Arne Vajhøj wrote:   
   >> On 2/24/2025 4:22 PM, Michael S wrote:   
   >>> On Mon, 24 Feb 2025 15:08:57 -0500   
   >>> Arne Vajhøj wrote:   
   >>>> On 2/24/2025 12:42 PM, Michael S wrote:   
   >>   
   >>>> C++ VMS x86-64 is clang which in the (older) clang version used   
   >>>> should mean C++14 while C++ VMS Itanium is very very old (like   
   >>>> C++ 98 old).   
   >>>>   
   >>>>> According to the benchmarks that you posted here several months (a   
   >>>>> year?) ago, VMS x86-64 compilers are quite awful comparatively to   
   >>>>> x86-64 compilers available on Windows/Linux/BSD.   
   >>>>> Do you want to say that VMS Itanium compilers are worse?   
   >>>>   
   >>>> I believe the conclusion was that the VMS x86-64 compilers except C++
   >>>> were slower than C/C++ on other OS and C++ on VMS.
   >>>   
   >>> Somehow I got an impression that C++ compilers were also significantly   
   >>> slower than C++ compilers on other platforms.   
   >>> Do I misremember?   
   >>   
   >> I don't even remember that I posted non-VMS numbers here. Age! :-)   
   >>   
   >> But I just checked VMS C++ latest (CXX/OPT=LEVEL:5 and clang -O3) vs a   
   >> random Windows GCC 14.1 (g++ -O3):   
   >>   
   >> VMS is a little faster for integer   
   >> they are about the same for floating point   
   >> Windows is a lot faster for string   
   >>   
   >> And given that this is a micro-benchmark with in reality just an inner   
   >> loop evaluating a single expression, which means huge uncertainty, then   
   >> I don't see this as proof of a significant difference.   
   >>   
   >> Arne   
   >>   
   >We are aware of the string/char performance issues.   
   >   
   >On Alpha and Itanium, the low-level routines inside of LIBOTS for things
   >like OTS$MOVE, string compare, memmove, etc. are all written in   
   >hand-crafted assembly. For x86, we are still using a set of BLISS   
   >reference code that is simple. Plus the LIBOTS we all have on our   
   >systems was compiled with a non-optimizing BLISS cross-compiler.   
      
   Hmm. It strikes me that LLVM's `memmove` intrinsic (`llvm.memmove`)
   would also work for OTS$MOVE3. I would think that would be the most
   efficient approach, since for small moves it could lower directly
   to a couple of loads and/or stores.
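
   To make that concrete, here is a minimal C sketch, not the actual
   LIBOTS interface. `ots_move3` and `move_header` are hypothetical
   names, and the (length, source, destination) argument order is my
   assumption about the OTS$MOVE3 calling sequence. Because the body is
   a plain `memmove`, clang emits the `llvm.memmove` intrinsic for it,
   and once the wrapper is inlined at a call site with a small constant
   length, the optimizer may expand it into a few loads and stores
   rather than a library call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OTS$MOVE3; argument order is an assumption. */
static inline void ots_move3(size_t length, const void *source, void *dest)
{
    /* memmove is defined for overlapping buffers, which a general-purpose
       move routine has to tolerate. */
    memmove(dest, source, length);
}

/* Hypothetical fixed-size caller: with optimization enabled, this 16-byte
   move is a candidate for inline expansion into a handful of loads and
   stores instead of a call. */
void move_header(unsigned char *dest, const unsigned char *source)
{
    ots_move3(16, source, dest);
}
```

   Whether the expansion actually happens depends on the target and
   optimization level, so it is worth checking the generated code.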
      
   >We are currently playing with native compiled LIBOTS code and doing some   
   >benchmarks. Besides the brain-dead BLISS code, we have versions that   
   >loop with larger chunks of data which are even faster. The fastest   
   >we've seen so far is a native assembly version that uses the REP   
   >instruction prefix on the MOVSB. That version didn't check for   
   >overlapping source/dest however so any real version gets a little   
   >slower. I'm not sure when we can incorporate these, but I'm trying to   
   >push them as soon as possible.   
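
   For anyone following along, the difference between simple reference
   code and a loop that moves larger chunks looks roughly like the
   following. This is a minimal C sketch, not the LIBOTS or BLISS code;
   `copy_bytewise` and `copy_chunked` are hypothetical names, and like
   the fast assembly version described above, `copy_chunked` assumes
   the buffers do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One byte per iteration, in the spirit of simple reference code. */
void copy_bytewise(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Eight bytes per iteration, then the tail byte by byte.
   Assumes dst and src do not overlap. A fixed-size memcpy is the portable
   way to express an unaligned 64-bit load or store; optimizing compilers
   typically turn each one into a single move instruction. */
void copy_chunked(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

   At -O2 and above, clang and GCC may recognize the byte loop as a copy
   idiom and replace it with a library call, so comparing the two fairly
   takes some care.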
      
   Yeah, Intel made `REP MOVSB`/`REP STOSB` actually fast a few
   uarchs ago. Good stuff, though startup overhead still dominates
   for moves under 128 bytes or so, and having to muck with the
   direction flag (DF) for backward copies remains a bummer.
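
   A rough sketch of that policy, for GCC or clang on x86-64: use
   `REP MOVSB` only for forward-safe moves above a size threshold, and
   hand small or backward-overlapping moves to `memmove` instead of
   setting DF. `move_bytes` and the 128-byte `SMALL_MOVE_THRESHOLD` are
   hypothetical; the threshold only mirrors the rough figure above, and a
   real cutoff would need measuring on the target microarchitecture. The
   code relies on the System V x86-64 ABI guarantee that DF is clear on
   function entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff; a rough figure, not a measured one. */
#define SMALL_MOVE_THRESHOLD 128

/* Hypothetical memmove-compatible routine: REP MOVSB for large
   forward-safe moves, memmove for everything else. */
void *move_bytes(void *dest, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;

    /* A forward copy is only wrong when dest starts strictly inside
       (src, src + n); that case would need a backward copy (DF set). */
    int needs_backward = d > s && d - s < n;

    if (n >= SMALL_MOVE_THRESHOLD && !needs_backward) {
        void *dp = dest;
        const void *sp = src;
        size_t count = n;
        /* RDI = destination, RSI = source, RCX = byte count. DF is clear
           per the ABI, so the copy runs forward. */
        __asm__ volatile("rep movsb"
                         : "+D"(dp), "+S"(sp), "+c"(count)
                         :
                         : "memory");
        return dest;
    }
#endif
    /* Small moves, backward overlaps, and non-x86-64 builds. */
    return memmove(dest, src, n);
}
```

   Falling back to `memmove` for the backward case sidesteps DF
   entirely, at the cost of an overlap check on every call.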
      
   >A fun reference to read is   
   >   
   >https://cdrdv2-public.intel.com/814198/248966-Optimization-Reference-Manual-V1-050.pdf
      
   Agner Fog's optimization guides can also be a useful resource   
   for things like this: https://www.agner.org/optimize/   
      
    - Dan C.   
      