

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,781 of 131,241   
   Terje Mathisen to All   
   Re: MC88110 development history   
   28 Sep 25 15:01:31   
   
   From: terje.mathisen@tmsw.no   
      
   Lawrence D’Oliveiro wrote:   
   > On Sat, 27 Sep 2025 21:29:58 GMT, M. Anton Ertl wrote:   
   >    
   >> Data General also switched from 88K to the Pentium Pro.   
   >    
   > The Pentium Pro was the one that gave great 32-bit performance, but   
   > sacrificed 16-bit performance. Because Intel assumed that 16-bit code   
   > would be on the way out by that point.   
   >    
   > The DOS/Windows world said otherwise ...   
   >    
   That is not quite true:   
      
   Yes, the P6/PPro had a couple of snags, but even the most serious one
   was not really important, except that it hit a number of very carefully
   optimized asm inner loops:
      
   Partial Register Stalls   
      
   When you update part of a register, like AL/AH/AX, and then read a
   larger part like AX/EAX, the CPU would stall until all previous
   instructions had retired before the new part could be merged with the
   older full register.
      
   In my own Word Count code, which broke whatever records existed on the
   Pentium, counting characters/words/lines at 40 MB/s on a 60 MHz Pentium,
   that fully unrolled inner loop would now suffer a partial register stall
   every 1 or 2 normal cycles.
      
   Sounds really bad, except (a) word count is not a performance-critical
   function for any known scenario/user experience, and (b) even with the
   stalls, the 200 MHz PPro still ran at over 20 MB/s.
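   To put the two throughput figures on the same scale, they can be
   converted to cycles per input byte. A minimal sketch, assuming MB means
   10^6 bytes (so the 10^6 in MHz and in MB cancel); the function name
   cycles_per_byte is made up for illustration:

```c
#include <assert.h>

/* Cycles spent per input byte, given the clock rate in MHz and the
   throughput in MB/s. Assumption: 1 MB = 10^6 bytes, so the factor of
   10^6 in both units cancels and a plain ratio is enough. */
static double cycles_per_byte(double clock_mhz, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return clock_mhz / mb_per_s;
}
```

   With the numbers quoted above, cycles_per_byte(60, 40) gives 1.5 for the
   Pentium and cycles_per_byte(200, 20) gives 10 for the PPro, so "over
   20 MB/s" means somewhat under 10 cycles per byte. The per-byte cost went
   up by a factor of roughly 6 to 7, while the wall-clock throughput only
   halved because of the higher clock.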
      
   Finally, the last OoO-compatible version of my wc code had 5 or 6
   different kernels and would start every run by benchmarking all of the
   algorithms on the first 4 KB of input, then use the fastest one for the
   remainder.
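   The benchmark-then-pick idea described above could look roughly like
   this in C. This is only a sketch under stated assumptions, not the
   original asm: the two kernels (wc_branchy, wc_branchless), the helper
   names pick_fastest and wc_autotuned, and the PROBE_REPS repeat count are
   all hypothetical. For simplicity it re-counts the whole buffer with the
   winner instead of resuming after the 4 KB probe.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Totals produced by one pass over a buffer. */
typedef struct { size_t chars, words, lines; } wc_counts;

/* All kernels share one signature, so they are interchangeable. */
typedef wc_counts (*wc_kernel)(const unsigned char *buf, size_t len);

/* Hypothetical kernel 1: straightforward, branchy, one byte at a time. */
static wc_counts wc_branchy(const unsigned char *buf, size_t len)
{
    wc_counts c = { len, 0, 0 };
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = buf[i];
        if (ch == '\n') c.lines++;
        if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            c.words++;
        }
    }
    return c;
}

/* Hypothetical kernel 2: same result, computed without data-dependent
   branches in the loop body. */
static wc_counts wc_branchless(const unsigned char *buf, size_t len)
{
    wc_counts c = { len, 0, 0 };
    unsigned prev_space = 1;            /* start of buffer acts like a space */
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = buf[i];
        unsigned is_space = (ch == ' ') | (ch == '\t') |
                            (ch == '\n') | (ch == '\r');
        c.lines += (ch == '\n');
        c.words += prev_space & (is_space ^ 1u);  /* space -> non-space edge */
        prev_space = is_space;
    }
    return c;
}

#define PROBE_BYTES 4096   /* "first 4 KB of input" from the post */
#define PROBE_REPS  64     /* assumption: repeat to get above clock() granularity */

/* Time every kernel on the first PROBE_BYTES of input and return the
   index of the fastest one. Ties go to the earlier kernel. */
static size_t pick_fastest(const wc_kernel *kernels, size_t nkernels,
                           const unsigned char *buf, size_t len)
{
    assert(nkernels > 0);
    size_t probe = len < PROBE_BYTES ? len : PROBE_BYTES;
    size_t best = 0;
    clock_t best_time = 0;
    volatile size_t sink = 0;           /* keeps the timed calls from being elided */
    for (size_t k = 0; k < nkernels; k++) {
        clock_t t0 = clock();
        for (int r = 0; r < PROBE_REPS; r++)
            sink += kernels[k](buf, probe).words;
        clock_t elapsed = clock() - t0;
        if (k == 0 || elapsed < best_time) {
            best = k;
            best_time = elapsed;
        }
    }
    (void)sink;
    return best;
}

/* Probe, then run the winning kernel over the full buffer. */
static wc_counts wc_autotuned(const unsigned char *buf, size_t len)
{
    static const wc_kernel kernels[] = { wc_branchy, wc_branchless };
    size_t k = pick_fastest(kernels, sizeof kernels / sizeof kernels[0],
                            buf, len);
    return kernels[k](buf, len);
}
```

   Whichever kernel wins, the counts are the same; only the speed differs.
   The point of probing at run time is that the best kernel depends on the
   CPU the code happens to land on, which cannot be known at build time.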
      
   Terje   
      
   --    
   -    
   "almost all programming can be viewed as an exercise in caching"   
      



(c) 1994,  bbs@darkrealms.ca