Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
|    Message 129,781 of 131,241    |
|    Terje Mathisen to All    |
|    Re: MC88110 development history    |
|    28 Sep 25 15:01:31    |
From: terje.mathisen@tmsw.no

Lawrence D’Oliveiro wrote:
> On Sat, 27 Sep 2025 21:29:58 GMT, M. Anton Ertl wrote:
>
>> Data General also switched from 88K to the Pentium Pro.
>
> The Pentium Pro was the one that gave great 32-bit performance, but
> sacrificed 16-bit performance. Because Intel assumed that 16-bit code
> would be on the way out by that point.
>
> The DOS/Windows world said otherwise ...
>
That is not quite true:

Yes, the P6/PPro had a couple of snags, but even the most serious one was
not really important, except that it hit a number of very carefully
optimized asm inner loops:

Partial Register Stalls

When you update a part of a register, like AL/AH/AX, and then use a
larger part like AX/EAX, the CPU would stall until all previous
instructions had retired before the new part could be merged with the
older full register.

In my own Word Count code, which broke whatever records existed on the
Pentium, counting characters/words/lines at 40 MB/s on a 60 MHz Pentium,
that fully unrolled inner loop would now suffer a PRS stall every 1 or 2
normal cycles.

Sounds really bad, except (a) word count is not a performance-critical
function for any known scenario/user experience, and (b) even with the
PRS stalls, the 200 MHz PPro still ran at over 20 MB/s.

Finally, the last OoO-compatible version of my wc code had 5 or 6
different kernels and would start every run by benchmarking all of the
algorithms on the first 4KB of input, then use the fastest one for the
remainder.

Terje
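For readers who have not seen the "benchmark the first 4KB, then pick the fastest kernel" trick from the last paragraph, here is a rough sketch of the idea. It is not Terje's code (his kernels were hand-tuned asm), and every name in it (`wc_split`, `wc_scan`, `pick_kernel`, `count`) is made up for illustration. It uses only two kernels and, as a simplification, reruns the winner over the whole input rather than only the remainder, which avoids carrying the in-word state across the 4KB boundary.

```python
import time

# Same separator set that bytes.split() uses when called without arguments.
WHITESPACE = b" \t\n\r\x0b\x0c"


def wc_split(data):
    """Hypothetical kernel A: rely on built-in bytes methods."""
    return len(data), len(data.split()), data.count(b"\n")


def wc_scan(data):
    """Hypothetical kernel B: explicit byte-at-a-time state machine."""
    words = lines = 0
    in_word = False
    for b in data:
        if b == 0x0A:
            lines += 1
        if b in WHITESPACE:
            in_word = False
        elif not in_word:
            in_word = True
            words += 1
    return len(data), words, lines


KERNELS = [wc_split, wc_scan]


def pick_kernel(sample, kernels=KERNELS, repeats=3):
    """Time every kernel on the sample and return the fastest one."""
    best, best_time = kernels[0], float("inf")
    for kernel in kernels:
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(sample)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best, best_time = kernel, elapsed
    return best


def count(data, probe_size=4096):
    """Probe on the first probe_size bytes, then run the winner.

    Simplification: the winner processes the full input again instead of
    just the remainder after the probe.
    """
    kernel = pick_kernel(data[:probe_size])
    return kernel(data)
```

Calling `count(open("some_file", "rb").read())` returns a `(chars, words, lines)` tuple. The point of the design is that the choice is made on the machine actually running the code, so a kernel that stalls on one microarchitecture simply loses the probe there.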
(c) 1994, bbs@darkrealms.ca