... darkrealms ...


comp.arch




Message 129,353 of 131,241

George Neuner to Anton Ertl

Re: VAX

08 Aug 25 19:48:59

   From: gneuner2@comcast.net   
      
   On Fri, 08 Aug 2025 06:16:51 GMT, anton@mips.complang.tuwien.ac.at   
   (Anton Ertl) wrote:   
      
   >George Neuner  writes:   
   >   
   >>The decoder converts x86 instructions into traces of equivalent wide
   >>microinstructions which are directly executable by the core.  The
   >>traces then are cached separately [there is an I$0 "microcache" below
   >>I$1] and can be re-executed (e.g., for loops) as long as they remain
   >>in the microcache.
   >   
   >No such cache in the P6 or any of its descendants until Sandy
   >Bridge (2011).  The Pentium 4 had a microop cache, but it was eventually
   >(with Core Duo, Core 2 Duo) replaced by P6 descendants that have
   >no microop cache.  Actually, the Core 2 Duo has a loop buffer, which
   >might be seen as a tiny microop cache.  Microop caches and loop
   >buffers still have to contain information about which microops belong
   >to the same CISC instruction, because otherwise the reorder buffer
   >could not commit/execute* CISC instructions.
   >   
   >* OoO microarchitecture terminology calls what the reorder buffer does   
   >  "retire" or "commit".  But this is where the speculative execution   
   >  becomes architecturally visible ("commit"), so from an architectural   
   >  view it is execution.   
   >   
   >Followups set to comp.arch   
   >   
   >- anton   
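A toy sketch of the point about commit, as a hypothetical model rather than how any Intel reorder buffer is actually built. Each uop carries the address of the CISC instruction it came from plus a `last_of_insn` flag marking the instruction boundary; that flag stands in for the boundary information a microop cache or loop buffer would have to preserve. `Uop`, `last_of_insn` and `retire` are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    # Hypothetical uop record for illustration only.
    insn_addr: int      # address of the CISC instruction it was decoded from
    last_of_insn: bool  # True on the final uop of that instruction (boundary mark)
    done: bool = False  # True once the uop has finished executing


def retire(rob):
    """Retire from the head of `rob` (oldest first), whole CISC instructions only.

    Returns the addresses of the instructions retired by this call.
    """
    retired = []
    while rob:
        # The boundary flag tells us where the oldest instruction ends.
        end = next((i for i, u in enumerate(rob) if u.last_of_insn), None)
        if end is None:
            break  # not all of the oldest instruction's uops are in the ROB yet
        group = rob[: end + 1]
        if not all(u.done for u in group):
            break  # in-order commit: wait for every uop of the oldest instruction
        retired.append(group[0].insn_addr)
        del rob[: end + 1]
    return retired


rob = [Uop(0x10, False, True), Uop(0x10, True, False), Uop(0x13, True, True)]
print(retire(rob))   # [] : 0x10's second uop is still executing
rob[1].done = True
print(retire(rob))   # [16, 19] : 0x10 and 0x13 retire together, in order
```

In this model the unfinished second uop of the instruction at 0x10 holds back both that instruction and the already-finished one at 0x13 behind it; without the boundary flag, `retire` would have no way to know where one CISC instruction stops and the next begins.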
      
   Thanks for the correction.  I did a fair amount of SIMD coding for
   Pentium II, III and IV, so I was more familiar with their architecture.
   After the IV, I moved on to other things and haven't kept up.
      
   Question:   
   It would seem that, lacking a microop cache, the decoder would need
   to be involved on, e.g., every iteration of a loop, and there would
   be more pressure on I$1.  Did either of these prove to be a bottleneck
   for the models lacking the cache, or was it something else?
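To make the question concrete, here is a crude model of decoder traffic for a loop, under loudly stated assumptions: a hypothetical uop cache indexed by instruction address with strict LRU replacement, one decode per instruction on a miss, and no loop buffer. Real front ends are organized differently, so this only shows the shape of the effect. `count_decodes` and `loop_trace` are invented for the example.

```python
from collections import OrderedDict


def count_decodes(trace, capacity):
    """Count how many fetched instructions must go through the decoders.

    `trace` lists instruction addresses in execution order. A hit in the
    hypothetical LRU uop cache skips decode; a miss decodes the instruction
    and installs its uops. capacity=0 models a core with no uop cache.
    """
    cache = OrderedDict()  # address -> decoded uops (contents not modelled)
    decodes = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
            continue
        decodes += 1
        if capacity == 0:
            continue  # no cache to install into
        cache[addr] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used entry
    return decodes


def loop_trace(body_len, iterations):
    # Addresses 0..body_len-1, repeated once per loop iteration.
    return [a for _ in range(iterations) for a in range(body_len)]


print(count_decodes(loop_trace(8, 100), 0))    # 800: decoded on every iteration
print(count_decodes(loop_trace(8, 100), 16))   # 8: decoded once, then replayed
print(count_decodes(loop_trace(20, 100), 16))  # 2000: LRU thrashes on an oversized loop
```

The last case is the interesting one for the question: under strict LRU, a loop body just larger than the cache misses on every access, so the decoders are back on the critical path exactly as if there were no cache at all.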
      
