comp.arch
Anton Ertl to John Levine
Re: IA64 and VLIW, Tonights Tradeoff
22 Feb 26 09:16:00
   From: anton@mips.complang.tuwien.ac.at   
      
   John Levine  writes:   
   >But as computer hardware got faster and denser, it became possible to   
   >do the scheduling on the fly in hardware, so you could get comparable   
   >performance with conventional instruction sets in a microprocessor.   
      
   Actually, OoO microprocessors appeared before IA-64 implementations
   were originally planned to be released, and were implemented in
   coarser processes (larger feature sizes), i.e., they made do with
   fewer hardware resources.
      
   The Pentium Pro was implemented with 5.5M transistors in a 0.35um   
   process (with 8+8KB L1 cache) and a die area of 306mm^2 (probably   
   including the separate L2 cache chip).  Later, Intel released the
   Klamath Pentium II, also in 0.35um but with 16+16KB L1, 7.5M
   transistors and a die size of 203mm^2 (the Klamath die should be
   larger than the CPU die of the Pentium Pro, which is why I think the
   Pentium Pro number includes the L2 cache die); die size numbers from
   https://pc.watch.impress.co.jp/docs/2008/1027/kaigai_5.pdf   
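The inference in the parenthesis can be checked with simple arithmetic on the quoted figures: in the same 0.35um process, Klamath fits more transistors on a smaller die, so 306mm^2 is implausible for the Pentium Pro CPU die alone. A rough sketch; the dictionary layout and the helper name `density` are illustrative, and only the numbers come from the text above.

```python
# Die figures quoted above; both chips are 0.35um parts.
# Values: (transistors in millions, die area in mm^2).
CHIPS = {
    "Pentium Pro": (5.5, 306.0),         # suspected to include the L2 die
    "Pentium II Klamath": (7.5, 203.0),  # CPU die with 16+16KB L1
}


def density(transistors_m: float, area_mm2: float) -> float:
    """Return transistors per mm^2."""
    return transistors_m * 1e6 / area_mm2


for name, (t, a) in CHIPS.items():
    print(f"{name}: {density(t, a):,.0f} transistors/mm^2")

ppro_t, ppro_a = CHIPS["Pentium Pro"]
klam_t, klam_a = CHIPS["Pentium II Klamath"]
density_ratio = density(klam_t, klam_a) / density(ppro_t, ppro_a)

# More transistors on a smaller die in the same process suggests the
# 306mm^2 Pentium Pro figure covers more than the CPU die alone.
ppro_area_suspicious = klam_t > ppro_t and klam_a < ppro_a
print(f"Klamath density / Pentium Pro density: {density_ratio:.2f}")
print("Pentium Pro area likely includes more than the CPU die:", ppro_area_suspicious)
```

Klamath comes out at roughly twice the transistor density of the quoted Pentium Pro figure, which is hard to explain for two CPU dies in the same process.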
      
   The PA-8000 is a 4-wide OoO CPU implemented with 3.8M transistors in a   
   0.5um process in 337.69mm^2.  It has all caches off-chip.   
      
   The Merced Itanium and McKinley Itanium 2 were 6-wide and implemented
   in 180nm, the same feature size as the Willamette Pentium 4 and   
   Thunderbird Athlon.  The Merced is reported as having 25.4M   
   transistors (with 16+16KB L1 and 96KB of L2 cache plus 295M   
   transistors for 4MB external L3 cache).  The McKinley is reported as   
   having a die size of 421mm^2 and a transistor count of 221M (with   
   16+16KB L1, 256KB L2 and 3MB L3).   
      
   Looking at <…>, I read:
      
   |When Merced was floorplanned for the first time in mid-1996, it turned   
   |out to be far too large [...]. The designers had to reduce the   
   |complexity (and thus performance) of subsystems, including the x86   
   |unit and cutting the L2 cache to 96 KB.[d] Eventually it was agreed   
   |that the size target could only be reached by using the 180 nm process   
   |instead of the intended 250 nm.   
      
   For comparison, in the same 0.18um process the Willamette included   
   8+8KB L1 and 256KB L2 cache in 217mm^2, and in the same-sized process   
   the AMD Thunderbird and Palomino included 64+64KB L1 and 256KB L2   
   cache.   
      
   Unfortunately, the caches dominate the transistor counts, so one
   cannot tell how many transistors were needed for implementing the data
   path and control logic.
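To put a rough number on "the caches dominate", the sketch below uses the Merced and McKinley figures quoted above. The McKinley part is an estimate under an assumption not in the original: every cache bit is a 6-transistor SRAM cell, counting data arrays only (no tags, ECC, or periphery), so it likely undercounts the real cache share.

```python
BITS_PER_BYTE = 8
SRAM_CELL_TRANSISTORS = 6  # assumption: classic 6T cell, data array only


def sram_data_transistors(kib: int) -> int:
    """Transistors in the data array of a `kib`-KiB cache (no tags, ECC, periphery)."""
    return kib * 1024 * BITS_PER_BYTE * SRAM_CELL_TRANSISTORS


# Merced: 25.4M transistors on the CPU die plus 295M for the external 4MB L3.
merced_total = 25.4e6 + 295e6
merced_l3_share = 295e6 / merced_total

# McKinley: 221M transistors total, with 16+16KB L1, 256KB L2 and 3MB L3 on one die.
mckinley_total = 221e6
mckinley_cache_kib = 16 + 16 + 256 + 3 * 1024
mckinley_cache_est = sram_data_transistors(mckinley_cache_kib)
mckinley_cache_share = mckinley_cache_est / mckinley_total

print(f"Merced: the L3 alone is {merced_l3_share:.0%} of all transistors")
print(f"McKinley: about {mckinley_cache_share:.0%} in cache data arrays (6T estimate)")
```

Even this conservative estimate leaves only about a quarter of McKinley's transistors for everything else, which is why the totals say little about the cost of the core itself.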
      
   We do have in-order CPUs such as the 4-wide 21164: 9.3M transistors   
   (with 8+8KB L1 and 96KB L2), 299mm^2 in a 0.5um process.   
      
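   So, since the PA-8000 die is even larger than the 21164 die in the
   same 0.5um process despite having no on-chip caches, the OoOness of
   the PA-8000 may have cost roughly as much area as the caches of the
   21164 (and the higher clock rate of the 21164 compared to the PA-8000
   and the Pentium Pro supported the theory that OoO is inherently
   slower).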
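The PA-8000 vs. 21164 comparison in numbers. This sketch only divides the quoted figures; like the paragraph, it assumes the two dies are otherwise comparable, which ignores differences in design style, register files, and I/O.

```python
# Figures quoted above; both chips are 4-wide and built in a 0.5um process.
# Values: (transistors in millions, die area in mm^2, on-die cache in KB).
PA8000 = (3.8, 337.69, 0)               # out-of-order, all caches off-chip
ALPHA_21164 = (9.3, 299.0, 8 + 8 + 96)  # in-order, L1 and L2 on die

pa_t, pa_area, pa_cache = PA8000
al_t, al_area, al_cache = ALPHA_21164

transistor_ratio = pa_t / al_t
area_ratio = pa_area / al_area

print(f"PA-8000 has {transistor_ratio:.0%} of the 21164's transistors")
print(f"but {area_ratio:.0%} of its die area, with {pa_cache}KB vs {al_cache}KB of on-die cache")
```

With well under half the transistors but a somewhat larger die and no caches, the PA-8000's non-cache logic occupies about as much area as the 21164 spends on core plus 112KB of cache.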
      
   Comparing the 21164 to the Merced: the L2 cache sizes are the same
   and the L1 size of the Merced is twice that of the 21164, yet the
   Merced has 2.7x the transistor count of the 21164, and probably a lot
   of the additional transistors are not spent on the additional L1
   cache.  It seems that the architectural features and/or the 6-wide
   implementation of the Merced cost a lot of transistors and thus die
   area.  By contrast, a sales pitch for EPIC was that, thanks to the
   explicit grouping of instructions, the supposedly quadratic cost of
   checking for register-to-register dependences would be eliminated,
   leaving more area for additional functional units.
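The "supposedly quadratic cost" can be made concrete. In a conventional w-wide in-order superscalar, each instruction in an issue group has its source registers compared against the destination of every older instruction in the same group. The sketch below is a simplification under stated assumptions: it counts only read-after-write checks within one group, assumes two source registers per instruction, and uses a hypothetical `(dest, (src_a, src_b))` tuple format. Explicit instruction groups are meant to let the compiler promise independence within a group, which is what the pitch claimed would remove these comparators.

```python
from typing import List, Tuple

# Hypothetical instruction format: (dest_reg, (src_reg_a, src_reg_b)).
Instr = Tuple[int, Tuple[int, int]]


def comparator_count(width: int, srcs_per_instr: int = 2) -> int:
    """Register comparisons needed to find RAW dependences inside one issue
    group: each instruction's sources vs. every older instruction's dest."""
    return srcs_per_instr * width * (width - 1) // 2


def raw_hazards(group: List[Instr]) -> List[Tuple[int, int]]:
    """Return (producer_index, consumer_index) pairs with a RAW dependence,
    performing the pairwise checks counted by comparator_count()."""
    hazards = []
    for j, (_, srcs) in enumerate(group):
        for i in range(j):
            dest_i = group[i][0]
            if dest_i in srcs:  # compares dest_i against both sources
                hazards.append((i, j))
    return hazards


for w in (2, 4, 6, 8):
    print(f"{w}-wide: {comparator_count(w)} comparators")

# r3 = r1 + r2 ; r4 = r3 + r1 ; r5 = r6 + r7
group = [(3, (1, 2)), (4, (3, 1)), (5, (6, 7))]
print(raw_hazards(group))
```

Going from 4-wide to 6-wide raises the count from 12 to 30 comparators, so the pitch was about a real quadratic term, although comparators are small compared to the transistor totals discussed above.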
      
   Bottom line: If EPIC is easier to fit on a microprocessor, there is no   
   evidence for that.   
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
     Mitch Alsup,    
      