Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 4,361 of 4,675    |
|    Terje Mathisen to Branimir Maksimovic    |
|    Re: What's purpose of "gather" instructi    |
|    27 May 21 16:23:59    |
      From: terje.mathisen@nospicedham.tmsw.no

      Branimir Maksimovic wrote:
      > I tried them recently and they are slow, slow,
      > slower than loading manually ;)
      > I mean, like the "loop" instruction, useless ;)
      >

      Gather is supposed to run at a minimum of one word per cycle, but preferably
      all loads that come from the same cache line should happen in a single
      cycle, so that looking up stuff in a compact structure should be
      reasonably fast, and much faster than scalar loads.

      The first Larrabee CPU had gather implemented in an external chip, so it
      was effectively a coprocessor. The idea was that you would set up a bunch
      of these as part of a big processing loop, then stream the results through.

      I.e. typical GPU design: optimizing for bandwidth, not latency.

      Terje

      --
      - |
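To make the cache-line point above concrete, here is a minimal sketch in portable C. It is not how any particular CPU implements gather: gather_u32 is just the scalar reference for what a gather computes, and lines_touched is a toy cost model that counts the distinct cache lines a set of indices lands on. The function names, the 64-byte line size, the 32-bit element width, and the assumption that the table starts on a line boundary are all illustrative choices, not something stated in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, i.e. 16 uint32_t elements per line. */
#define LINE_BYTES     64u
#define ELEMS_PER_LINE (LINE_BYTES / sizeof(uint32_t))

/* Scalar reference for a gather: dst[i] = table[idx[i]] for each lane. */
static void gather_u32(uint32_t *dst, const uint32_t *table,
                       const uint32_t *idx, size_t n)
{
    assert(dst != NULL && table != NULL && idx != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = table[idx[i]];
}

/*
 * Toy cost model: count the distinct cache lines the indices touch,
 * assuming the table starts on a cache-line boundary. If all loads
 * from one line could be served together, this count (rather than n)
 * would bound the number of line accesses.
 */
static size_t lines_touched(const uint32_t *idx, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t line = idx[i] / (uint32_t)ELEMS_PER_LINE;
        int seen = 0;
        /* Linear scan is fine here: n is a vector width (8 or 16). */
        for (size_t j = 0; j < i; j++) {
            if (idx[j] / (uint32_t)ELEMS_PER_LINE == line) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

Under these assumptions, eight indices that all fall within the first 16 elements touch a single line, while indices strided 16 elements apart touch one line each. That is the difference between the "compact structure" case and a scattered lookup; what real hardware does with it varies by microarchitecture.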
(c) 1994, bbs@darkrealms.ca