comp.arch
BGB to Scott Lurndal
Re: Saving and restoring FP state (1/2)
17 Sep 25 13:05:58
   From: cr88192@gmail.com   
      
   On 9/17/2025 8:57 AM, Scott Lurndal wrote:   
   > BGB  writes:   
   >> On 9/16/2025 12:50 PM, Scott Lurndal wrote:   
   >>> BGB  writes:   
   >>>> On 9/14/2025 9:06 AM, Michael S wrote:   
   >>>   
   >>>>   
   >>>> Also there was another related bug where FPU instructions in interrupt   
   >>>> handlers could affect the FPU flags visible in userland.   
   >>>   
   >>> Why on earth would you use floating point instructions   
   >>> in an interrupt handler?   
   >>>   
   >>   
   >>   
   >> I didn't go and track down which code was using FPU instructions, but   
   >> seemingly something was, in any case. I didn't see any particular reason   
   >> to forbid using the FPU inside of interrupt handlers (they are mostly   
   >> still plain C, differing mostly in that they are limited to   
   >> operating in terms of the physical memory map).   
   >   
   > The standard reasoning for prohibiting floating point in the   
   > kernel is to improve system call overhead by not saving floating   
   > point registers until and unless there is a context switch (and   
   > even then, x86 has features that allow the OS to forgo saving   
   > the floating point registers if they weren't used in the last   
   > scheduling quantum).   
   >   
      
   In my case, this wasn't x86, and on my ISA the FPU stuff is done in   
   GPRs, which typically need to be saved/restored either way. Well, except   
   when running RISC-V code, which effectively splits the register space in   
   half (32+32 rather than 64).   
      
      
   The issue was that the FPSR is (now) aliased to SP(63:48), but there was   
   only a single SP; and the CPU core handles interrupts by causing SP and   
   SSP to switch places in decode.   
      
   The likely more proper solution would have been to have another FPSR   
   aliased to SSP(63:48) which also re-routes; where as-is SSP is currently   
   only a 48 bit register internally.   
      
   But, for now, easier was to disable the updates if inside an ISR.   
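
As a rough software model of the aliasing described above (not the actual
core logic), the following C sketch packs a 16-bit FPSR into bits 63:48 of
a 64-bit SP and suppresses FPSR updates while inside an ISR. The helper
names (sp_pack, sp_addr, sp_fpsr, sp_update_fpsr) and the in_isr flag are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Combine a 48-bit stack address with a 16-bit FPSR value (hypothetical helper). */
static uint64_t sp_pack(uint64_t addr, uint16_t fpsr)
{
    assert((addr & ~ADDR_MASK) == 0); /* address must fit in 48 bits */
    return addr | ((uint64_t)fpsr << ADDR_BITS);
}

/* Low 48 bits: the stack address proper. */
static uint64_t sp_addr(uint64_t sp)
{
    return sp & ADDR_MASK;
}

/* High 16 bits: the aliased FPSR. */
static uint16_t sp_fpsr(uint64_t sp)
{
    return (uint16_t)(sp >> ADDR_BITS);
}

/* Models the workaround: FPSR writes are dropped while in an ISR, so the
   interrupted context's flags in SP(63:48) are left untouched. */
static uint64_t sp_update_fpsr(uint64_t sp, uint16_t fpsr, bool in_isr)
{
    if (in_isr)
        return sp;
    return sp_pack(sp_addr(sp), fpsr);
}
```

With this model, an update made with in_isr set returns SP unchanged, which
is the property the interrupt-return sanity check relies on.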
      
   This issue wouldn't have existed if still using GBR/GP for this, but GBR   
   has the disadvantage that it gets stomped whenever a reload occurs; so   
   it was either tweak the GBR reload mechanism to not stomp FPSR, or move   
   FPSR somewhere where it doesn't get stomped (the high bits of SP being   
   the most obvious choice).   
      
   Ran into a problem as some of my interrupt handling code does a sanity   
   check to verify that SP was intact between interrupt entry and return,   
   and some of this handling saw that SP changed unexpectedly and triggered   
   a break-point (otherwise, it might have gone unnoticed).   
      
   Though, this may be a payoff from being "needlessly pedantic" in this   
   case. There was a previous check (now disabled) that would also have   
   XOR'ed all the callee-save registers together and then checked the   
   result against the XOR of the known state (if a register had changed,   
   the XOR would likely have changed too). It was disabled because XOR'ing   
   all of them together has a high overhead.   
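
A minimal sketch of this kind of entry/return check, in C. The RegSnapshot
type, the register count, and the function names are assumptions made for
illustration; the real handlers do this in their prolog/epilog rather than
on a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CALLEE_SAVE 16 /* hypothetical count, depends on the ABI */

typedef struct {
    uint64_t sp;
    uint64_t callee_save[NUM_CALLEE_SAVE];
} RegSnapshot;

/* Fold a set of registers into one word by XOR'ing them together. */
static uint64_t xor_fold(const uint64_t *regs, size_t n)
{
    assert(regs != NULL);
    uint64_t x = 0;
    for (size_t i = 0; i < n; i++)
        x ^= regs[i];
    return x;
}

/* True if SP (all 64 bits, so including any state aliased into the high
   bits) and the XOR of the callee-save registers match between interrupt
   entry and return. */
static bool isr_state_intact(const RegSnapshot *at_entry,
                             const RegSnapshot *at_return)
{
    if (at_entry->sp != at_return->sp)
        return false;
    return xor_fold(at_entry->callee_save, NUM_CALLEE_SAVE) ==
           xor_fold(at_return->callee_save, NUM_CALLEE_SAVE);
}
```

The XOR check is only probabilistic: two registers changing by the same bit
pattern cancel out. The full-width SP compare is what would catch a change
confined to SP(63:48).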
      
      
   Note that the logic does account for things like context switches, and   
   the SYSCALL interrupt is using a different prolog/epilog sequence which   
   is more optimized for context switching (but is only valid once a task   
   state is configured).   
      
      
   >   
   >> But, in any case, using FP instructions in an interrupt handler   
   >> shouldn't leave state changes that are visible in userland.   
   >   
   > A well understood problem handled by all off the shelf operating   
   > systems.   
      
   Possible.   
      
   In this case, the leaked state was caught by some code being pedantic   
   and noticing the high-order bits of SP changing unexpectedly.   
      
      
      
   Otherwise, instruction predication in XG3 now seems to mostly work (it   
   was mostly issues in BGBCC). The last issue was some paths where it was   
   trying to use RISC-V encodings in cases where predication was being   
   used, and the RISC-V ops do not support predication.   
      
   It wouldn't have been as simple as converting the RISC-V ops to XG3, as   
   things are not always 1:1. One of the places where it came up is a   
   tangled mess, as both RISC-V and XG3 instruction generation are sorta   
   mixed together.   
      
   Had to detect predication was being used (for the current instructions),   
   and effectively disable the use of RISC-V encodings in this case.   
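
In spirit, the fix amounts to something like the following C sketch. This
is not BGBCC's actual code; the enum and function names are hypothetical,
and the assumption that the RISC-V form is otherwise preferred when one
exists is made only for the sake of the example.

```c
#include <stdbool.h>

typedef enum { ENC_RISCV, ENC_XG3 } Encoding;
typedef enum { PRED_NONE, PRED_IF_TRUE, PRED_IF_FALSE } PredMode;

/* Pick an encoding for the current instruction. RISC-V ops cannot carry a
   predicate, so any active predication forces the XG3 encoding. */
static Encoding choose_encoding(PredMode pred, bool riscv_form_available)
{
    if (pred != PRED_NONE)
        return ENC_XG3;
    return riscv_form_available ? ENC_RISCV : ENC_XG3;
}
```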
      
   Though something still isn't perfect, as the Doom demos desync in a   
   different way, where a change in demo desync is usually evidence of a   
   difference in program behavior (though can also be caused by memory   
   corruption, etc, *).   
      
   ...   
      
      
   *: Though, in some cases, it is sensitive to memory contents for   
   out-of-bounds memory accesses, which tended to differ between Doom   
   versions. Some of the Doom source ports try to deal with this by   
   fingerprinting the IWAD and then simulating the contents of the   
   out-of-bounds memory and similar (along with various changes in game   
   behavior) for each engine version. My port doesn't really bother (so, I   
   live with the demo desync, but can still notice changes in demo desync).   
      
      
   Though, there is some possible debate as to whether to bring some of   
   the 2RI ops back into XG3 (the 2RI-Imm10 space had been effectively   
   dropped from XG3, or not carried over from XG1/XG2). When using   
   predication, a few of these become relevant again. In this mode, I had   
   been using the option of simply using some of the 3R ops but directing   
   the output to R0/ZR as a special case to encode an intention to update   
   the T bit, but this has some drawbacks (such as the lack of a decent   
   sized immediate field).   
      
   Though, in this case, the use of predication (and thus conditional   
   compare) is low enough to leave it as debatable as to whether or not it   
   would be a good idea to bring back these encodings (or just continue to   
   live with some more limited Imm6s encodings, and the occasional use of   
   jumbo-prefixes when the Imm6 fails).   
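
The fallback decision can be sketched in C as below, assuming Imm6s means a
sign-extended 6-bit immediate (range -32..31). That reading, along with the
function and enum names, is an assumption for illustration and is not taken
from the XG3 spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is representable as a two's complement value of the given width. */
static bool fits_signed(int64_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t lo = -(INT64_C(1) << (bits - 1));
    int64_t hi = (INT64_C(1) << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

typedef enum { IMM_FORM_IMM6S, IMM_FORM_JUMBO } ImmForm;

/* Use the short immediate form when the value fits, otherwise fall back to
   a jumbo-prefixed encoding. */
static ImmForm choose_imm_form(int64_t imm)
{
    return fits_signed(imm, 6) ? IMM_FORM_IMM6S : IMM_FORM_JUMBO;
}
```

Under that assumption, a compare against 31 or -32 uses the short form,
while 32 or -33 needs the prefix.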
      
   Actually, predication is itself debatable, as it does effectively use   
   half the encoding space, and predicated ops are technically a minority   
   of the instructions. But, it does help with performance in some cases   
   (namely, a lot of the cases that XG3 was meant to address).   
      
      
   Could almost reuse the encoding space for a different set of 16-bit ops,   
   but don't necessarily want yet another 16-bit decoder (and, one can   
   argue, if code density matters enough to want to use 16-bit ops, jumping   
   over to RV64GC mode and using RV-C ops may make more sense).   
      
   Though, presently BGBCC doesn't support mixed RV-C and XG3 binaries, and   
   this would kinda be a mess (though, the original ARM+Thumb scheme exists   
   as an example of the basic idea here).   
      
   So, for now, they will likely remain as predicated ops and similar.   
      
   ...   
      