From: robert@nospicedham.prino.org   
      
   On 2018-09-15 18:09, Robert Redelmeier wrote:   
   > Robert Prins wrote in part:   
   >> When it's compiled with an {$undef xmm}, i.e. using the   
   >> MMX registers it runs flawlessly. However, when compiled   
   >> with an {$define xmm} it falls over a bit later with a
   >> zero-divide error, and that is, so my debugging code that   
   >> uses the XSAVE instruction to dump the CPU state shows,   
   >> caused by the intervening call to the "System._MemNew"   
   >> routine. However, and that's what I have been unable to   
   >> figure out, it only happens on two occasions. Every other   
   >> call to "System._MemNew" leaves XMM0 unchanged.   
   >>   
   >> And yes, I'm moving 16 bytes, but the final vmovdqu in the   
   >> code above is followed by another vmovdqu that fills in 8   
   >> of those "overwritten" bytes.   
   >>   
   >> So the questions are,   
   >>   
   >> 1) how do I figure out where XMM0 is clobbered up, and 2)   
   >> how can it be that it's only clobbered up in two (out of   
   >> several 100) cases   
   >   
   > Repeatably? That sounds like an algorithmic error.   
      
   No, given that the code   
      
   1) using MMX registers gives the same results as
   2) the "pure" Pascal code, and that in turn gives the same results as   
   3) the code written in PL/I running on z/OS.   
      
   I don't think there's anything wrong with the algorithm. ;)   
      
   > Random would be something like task switch clobber.   
      
   Currently in Vilnius, so I don't have access to my AMD system, which is in   
   Oostende, but I will track down the input that leads to the case(s) where the   
   XMM registers are clobbered.   
      
   >> System is W7 Pro-64, and it happens on two systems, an AMD   
   >> FX8150 and an Intel 4710MQ, which would exclude, almost   
   >> certainly, a hardware problem, and mentioning hardware,   
   >> I don't think there is a way to actually trap access to   
   >> registers?   
   >   
   > Yes, by setting breakpoints and the debugger running
   > essentially single stepped (slug-slow).   
      
   Been there, but the huge problem is that Virtual Pascal doesn't natively
   support anything more up-to-date than the Pentium, and single-stepping
   "db"-coded post-Pentium instructions is a hit-and-miss affair: stepping
   through many of them will cause the program to just run on. And yes, I can
   SST through the VP RTL, but once that goes off into Windows itself, I'm
   (probably?) pretty much up the creek without a paddle. There's also the
   "minor" inconvenience that the IDE doesn't have any way to monitor the XMM
   (or YMM) registers in real time; the raw view of the FPU window at least
   allows me to monitor the MMX registers.
      
   Using the trap flag would probably allow me to do this automatically, but
   I'm afraid that's way beyond my assembler skills, and I haven't got a clue
   whether that would work in OS code.
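
   Short of the trap flag, the XSAVE dumps themselves can do some of the
   work: take one immediately before and one immediately after a suspect
   call and diff the XMM slots. A minimal sketch in C (chosen purely for
   illustration, it is not the Virtual Pascal code discussed here), assuming
   both dumps are raw XSAVE images, whose legacy region places XMM0 at byte
   offset 160 with 16 bytes per register. The helper name xmm_changed_mask
   is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets in the legacy (FXSAVE-compatible) region of an XSAVE image. */
enum {
    XSAVE_XMM0_OFFSET = 160, /* XMM0 starts at byte 160 of the image  */
    XSAVE_XMM_SIZE    = 16,  /* each XMM register occupies 16 bytes   */
    XSAVE_XMM_COUNT   = 8    /* XMM0..XMM7 are reachable in 32-bit code */
};

/* Hypothetical helper: bit i of the result is set when XMMi differs
   between the "before" and "after" dumps. Both pointers must reference
   at least the 512-byte legacy region of an XSAVE image. */
unsigned xmm_changed_mask(const uint8_t *before, const uint8_t *after)
{
    unsigned mask = 0;

    assert(before != NULL && after != NULL);

    for (int i = 0; i < XSAVE_XMM_COUNT; i++) {
        size_t off = XSAVE_XMM0_OFFSET + (size_t)i * XSAVE_XMM_SIZE;
        if (memcmp(before + off, after + off, XSAVE_XMM_SIZE) != 0)
            mask |= 1u << i;
    }
    return mask;
}
```

   Wrapping every call to the suspect routine this way and logging only when
   the mask is non-zero would at least pin down which inputs trigger the
   clobber, without any single-stepping.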
      
   > I have some confidence that even MS w7-64 preserves XMM   
   > registers across syscalls & task swaps. However, your hot   
   > silicon also has YMM and may need a VEX prefix to correctly   
   > do XMM (overflow wrap).   
      
   W7(-64) supports AVX since SP1, so that's not the problem, and all my code is   
   either using legacy MMX registers or VEX encoded instructions using the XMM/YMM   
   registers.   
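
   For reference, "the OS supports AVX" can be verified at run time rather
   than inferred from the service pack level: CPUID leaf 1 must report
   OSXSAVE (ECX bit 27) and AVX (ECX bit 28), and XCR0, read with XGETBV,
   must have both the SSE (bit 1) and AVX (bit 2) state bits set. A sketch
   in C of just the decision logic, assuming the caller has already obtained
   the two raw values; the names avx_usable and avx_usable_selfcheck are
   hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV   */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX            */
#define XCR0_SSE           (1ull << 1) /* OS saves XMM state            */
#define XCR0_AVX           (1ull << 2) /* OS saves upper YMM halves     */

/* Hypothetical helper: decides from the raw CPUID.1:ECX and XCR0 values
   whether VEX-encoded XMM/YMM instructions are safe to use. */
bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_AVX;

    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;              /* XGETBV itself would fault */
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return false;              /* no AVX in hardware */
    return (xcr0 & need) == need;  /* OS must preserve XMM and YMM */
}

/* Small self-check documenting the expected results. */
void avx_usable_selfcheck(void)
{
    const uint32_t both = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;

    assert(avx_usable(both, 0x7));   /* x87 + SSE + AVX enabled */
    assert(!avx_usable(both, 0x3));  /* OS does not save YMM    */
    assert(!avx_usable(CPUID1_ECX_AVX, 0x7)); /* OSXSAVE clear  */
}
```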
      
   What is "overflow wrap", and how would I detect it, and would this not also   
   affect MMX instructions?   
      
   Robert   
   --   
   Robert AH Prins   
   robert(a)prino(d)org   
      