comp.lang.asm.x86
Terje Mathisen to Andrew Cooper
Re: Speculative data leaks in all supers
06 Jan 18 10:54:47
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Andrew Cooper wrote:   
   > On 05/01/2018 19:17, Terje Mathisen wrote:   
   >> Thank you Andrew!   
   >   
   > No problem at all.  Frankly, it was quite cathartic finally being   
   > able to talk about this in public.  (According to git, my earliest   
   > patch towards fixing this in Xen is Wed, 16 Aug 2017 17:06:59 +0000,   
   > which is now upstream.)   
      
   I feel for you!   
      
   Back in Nov 1994 I happened to write the first public message about   
   FDIV, then over the Dec/Jan period I wrote most of the sw workaround for   
   that bug.   
      
   I had to completely retire from comp.sys.intel early on in that process   
   because everything happened in public.   
      
   Terje   
      
      
   >   
   > The current media frenzy, and the fact that Variant 1 has unfixable   
   > cases, is a very different story. (combined with 0 research into   
   > what kinds of dependent memory reads can in-practice be coerced into   
   > being speculatively-leaky.)   
   >   
   >> I suggested over in comp.arch that the obvious way to isolate   
   >> processes properly would be to _never_ allow any externally-visible   
   >> state to be written back before the instruction either actually   
   >> retires or at least is known to run and not fault.   
   >>   
   >> This would then force all cache controllers, BTB, TLB etc to have a   
   >> few local buffers, one for each possible level of speculation, to   
   >> hold this type of data and then commit it when the instruction   
   >> retires.  Mitch Alsup told us that they actually implemented
   >> something similar to this way back in 1991. :-)   
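
The buffering idea quoted above can be illustrated with a toy model in C. This is only a sketch and not from the thread: the names `toy_cache`, `spec_load`, `retire`, `squash` and `probe_hit`, and both sizes, are hypothetical. It models a single level of speculation, so it says nothing about the nested-window case raised in the reply below.

```c
#include <assert.h>
#include <stdbool.h>

#define CACHE_LINES 8   /* hypothetical toy sizes */
#define PENDING_MAX 4

struct toy_cache {
    bool present[CACHE_LINES];      /* externally visible state */
    unsigned pending[PENDING_MAX];  /* fills made under speculation */
    unsigned npending;
};

/* A speculative load records its fill privately, not in the cache. */
bool spec_load(struct toy_cache *c, unsigned line)
{
    if (line >= CACHE_LINES || c->npending == PENDING_MAX)
        return false;               /* toy model: refuse when full */
    c->pending[c->npending++] = line;
    return true;
}

/* Retirement: speculation was correct, so publish the buffered fills. */
void retire(struct toy_cache *c)
{
    for (unsigned i = 0; i < c->npending; i++)
        c->present[c->pending[i]] = true;
    c->npending = 0;
}

/* Misprediction or fault: drop the fills; visible state is untouched. */
void squash(struct toy_cache *c)
{
    c->npending = 0;
}

/* Stand-in for what a cache timing probe would observe. */
bool probe_hit(const struct toy_cache *c, unsigned line)
{
    return line < CACHE_LINES && c->present[line];
}
```

In this model a load that is later squashed never becomes visible to `probe_hit`, while a load that retires does. Real hardware would need the same treatment for every structure mentioned above (caches, BTB, TLB), which is where the cost lies.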
   >   
   > Modern processors from both vendors already have this to a certain   
   > extent.  E.g. while the Return Stack Buffer/Return Address Stack is   
   > architecturally 32 entries, it is apparently micro-architecturally   
   > larger to deal with the fact that the pipeline can speculate ~200   
   > uops ahead of the retire buffer in well-optimised code.   
   >   
   > (If there is anything I've learnt in practice, it's that the phrase   
   > "It's complicated" doesn't begin to describe things.)   
   >   
   > I'm not entirely convinced it is safe as described, in cases where   
   > you have nested speculation windows ("It's complicated") where an   
   > inner window gets restarted while an outer one is still pending.   
   > OTOH, this is based on a pathological distrust of double-fetch   
   > scenarios, rather than a sensible period of time to consider the   
   > proposal.   
   >   
   > ~Andrew   
   >   
   >>   
   >> Terje   
   >>   
   >> Andrew Cooper wrote:   
   >>> On 03/01/2018 18:38, Rod Pemberton wrote:   
   >>>>   
   >>>> Apparently, Intel processors from the past decade or more
   >>>> allow speculative execution of code without any privilege
   >>>> checks.  The exact specifics of the flaw are apparently still   
   >>>> secret.   
   >>>   
   >>> The embargo broke 5h ago.  tl;dr everything is broken, although   
   >>> Intel processors do have a failure mode which is worse than the   
   >>> others.   
   >>>   
   >>> All the attack strategies rely on the fact that you can recover   
   >>> the results of calculations during speculative execution via   
   >>> cache timing attacks, combined with the fact that an attacker   
   >>> can deliberately poison branch prediction logic to cause   
   >>> speculation of chosen code.   
   >>>   
   >>> SP1, a.k.a. Bounds-check Bypass:   
   >>>   
   >>> In this case, you are limited to executing basic blocks that you   
   >>> can locate in the victim context.  As an attacker, you control   
   >>> the taken/not-taken prediction state, and can deliberately cause   
   >>> the processor to speculate into the wrong basic block when it   
   >>> encounters a conditional branch.  This can be (ab)used to   
   >>> deliberately cause a speculative read off the end of an array.   
   >>>   
   >>> In JIT-able cases (BPF filters in the kernel, JavaScript in a
   >>> webpage, many other examples), an attacker has some control over   
   >>> the eventual layout of basic blocks in the victim context.   
   >>>   
   >>> This case is the hardest to deal with, because short of
   >>> inhibiting speculation before every memory read that has any   
   >>> attacker-controlled component, it can't be fixed.   
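
To make the SP1 description concrete, here is a minimal C sketch (not from the thread) of a bounds-checked read, plus one way to keep a mispredicted check from reading off the end of the array: clamp the index with arithmetic instead of relying on the branch. The names `table`, `index_mask` and `read_clamped` are hypothetical, and the mask assumes `size` is at most `SIZE_MAX / 2`. The Linux kernel's `array_index_nospec()` is built on the same idea, with more care taken to stop the compiler from undoing it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16               /* hypothetical victim array */
uint8_t table[TABLE_SIZE];

/*
 * All-ones when index < size, zero otherwise, computed without a
 * conditional branch, so there is no taken/not-taken state to poison.
 * Assumes size <= SIZE_MAX / 2.
 */
size_t index_mask(size_t index, size_t size)
{
    /* The top bit is set if index is huge or size - 1 - index wrapped. */
    size_t out_of_bounds =
        (index | (size - 1 - index)) >> (sizeof(size_t) * CHAR_BIT - 1);
    return out_of_bounds - 1;       /* 0 -> all-ones, 1 -> zero */
}

uint8_t read_clamped(size_t index)
{
    if (index >= TABLE_SIZE)        /* the branch an attacker mistrains */
        return 0;
    /*
     * If the branch above is mispredicted, the AND still forces an
     * out-of-bounds index to 0 through a data dependency.
     * Caveat: a compiler can prove the mask is all-ones on the
     * architectural path and delete it. Production code prevents that
     * with inline assembly or a barrier; this sketch does not.
     */
    index &= index_mask(index, TABLE_SIZE);
    return table[index];
}
```

For example, `index_mask(15, 16)` is `SIZE_MAX` and `index_mask(16, 16)` is `0`. The catch described in the post remains: something like this has to be applied at every read with an attacker-controlled component.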
   >>>   
   >>> SP2, a.k.a. Branch Target Injection:   
   >>>   
   >>> Indirect jump and call instructions (call/jmp *%reg/mem)   
   >>> typically don't have a single destination during the lifetime of   
   >>> the program, and are predicted using the Branch Target Buffer,   
   >>> which is based on the branch history.  An attacker can poison the   
   >>> BTB and cause speculation to go to an arbitrary destination.   
   >>>   
   >>> Therefore, an attacker who poisons the BTB can cause the
   >>> victim indirect branch to speculate to an arbitrary location, and   
   >>> is not restricted to the victim basic blocks in their allotted   
   >>> order.  On hardware without the SMEP feature active, speculation   
   >>> can be redirected back into user code, so the attacker can   
   >>> provide a custom basic block to be speculated over - See SP3.   
   >>>   
   >>> ret instructions are also indirect branches, but are predicted   
   >>> (along with call instructions) via the Return Stack Buffer.  An   
   >>> RSB prediction is always followed if valid, so an attacker can   
   >>> poison the RSB and find a victim codepath which executes more ret   
   >>> instructions than call instructions, at which point the attacker   
   >>> takes control of speculation in the same way.  longjmp() and/or   
   >>> context switch into a deeper call tree than the one you are   
   >>> currently in is the most common way of executing more ret   
   >>> instructions than call instructions in otherwise well-formed   
   >>> code.   
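
The "more ret instructions than call instructions" condition can be shown with a toy model of a return stack predictor in C. This is a sketch under loud assumptions, not a description of any real core: the names `rsb`, `rsb_call` and `rsb_ret` and the depth of 4 are hypothetical, and the model has no valid bits, so stale entries are always used.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_DEPTH 4     /* hypothetical; must be a power of two here */

struct rsb {
    uint64_t entry[RSB_DEPTH];
    unsigned top;       /* count of pushes minus pops, wraps freely */
};

/* call: push the return address, overwriting the oldest entry if full. */
void rsb_call(struct rsb *r, uint64_t return_addr)
{
    r->entry[r->top % RSB_DEPTH] = return_addr;
    r->top++;
}

/* ret: pop and return the predicted target, stale or not. */
uint64_t rsb_ret(struct rsb *r)
{
    r->top--;
    return r->entry[r->top % RSB_DEPTH];
}
```

Suppose the attacker's context issues four calls, leaving chosen addresses in every slot, and the victim then runs one call followed by two rets (for instance after a longjmp() or a switch into a deeper call tree). The first ret is predicted correctly from the victim's own entry; the second is predicted from the attacker's leftover entry.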
   >>>   
   >>> Mitigating this is far harder.  To do it effectively and
   >>> efficiently, you need new compilers which can transform indirect   
   >>> branches into safer alternatives (e.g. the RETPOLINE thunk), and   
   >>> new microcode which implements additional facilities to the   
   >>> kernel.  Despite this, the performance hit is substantial.   
   >>>   
   >>> SP3, a.k.a. Rogue Data Load:   
   >>>   
   >>> This issue is specific to Intel processors (and some ARM   
   >>> processors, but that is OT), and occurs because permission checks   
   >>> for reads of pages which are already present in the TLB are   
   >>> deferred until the instruction is retired.   
   >>>   
   >>> This means that, entirely in userspace, with no   
   >>> modeswitches/traps/system calls/etc, speculative execution can   
   >>> read supervisor mappings and recover the content via cache   
   >>> timing attacks.   
   >>>   
   >>> All mitigations for this revolve around breaking the TLB-hit   
   >>> which is a necessary prerequisite.  For native operating systems,   
   >>> this means isolating the user and kernel execution, and Linux   
   >>> KPTI is the prominent example.  For hardware with virt   
   >>> extensions, moving the workload into a VM also mitigates the
   >>> issue, as the TLB's tagging prohibits a hit.
   >>>   
   >>>   
   >>> ~Andrew   
   >>>   
      
   [continued in next message]   
      