Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 4,572 of 4,675    |
|    Robert Prins to Robert Prins    |
|    Fwd: Fast lookup of 234 possible 1/2/3 c    |
|    20 Mar 23 20:58:39    |
      From: robert@nospicedham.prino.org

      On 2023-03-09 15:28, Robert Prins wrote:
      > On 2023-03-09 10:19, Terje Mathisen wrote:
      >> This is much better!
      >
      > Thank you!
      >
      >> I still think you should pay more attention to the cost of branches, i.e. a
      >> sequential scan of a shortish segment of the country array is very cheap, on
      >> the order of 2-3 clock cycles/country, while a single mispredicted branch costs
      >> 8-20 cycles depending upon the CPU you are using.
      >
      > An AMD FX8150 and an Intel i7-4710MQ.
      >
      >> Assuming this really is so important, why not "waste" a bit more lookup table
      >> RAM for a full 2-char index?
      >
      > It's Utterly Unimportant (with two huge "U"s): the program as-is runs in 0.32
      > seconds clock-time, but it's just interesting to see how far I can go in pure
      > Pascal, to translate that subsequently into in-line assembler, where I'm in
      > many cases harking back to Turbo Pascal 3 times by coding post-Pentium
      > instructions as "db" sequences. The original code dates back to 1994, and,
      > pretty amazingly, many of my original record structures are very AVX friendly!
      >
      > Anyway, given that my ['A'..'Z'] array had one position open, I've used that
      > to replace the general last-matched country with a per-initial-character
      > last-matched country, and that has cut the chop-count even more:
      >
      > Original full range                  : 40,362
      > Low/high per initial character (pic): 14,907
      > As previous + first most used pic   :  2,871
      > As previous + cache pic             :  1,747
      >
      > And now I'm going to think slightly outside the box, although I will have to
      > knock up a bit of REXX to actually test this. The idea, which might be
      > counter-intuitive, is upping the low and/or high value of some ranges into the
      > previous/next (or, unlikely, but never say never, pre-previous or over-next)
      > range, so that an initial first lookup immediately finds the second most-used
      > country, without generating an excessive number of additional lookups for even
      > less used countries.
      >
      > E.g. for countries starting with "L" I currently only have three, in order of
      > use: "LT", "LV" (74 lookups), and "L" (45 lookups), the latter two requiring
      > three chops each for a total of 357 chops. If I were to extend the "L" range
      > into the "M" range so that a first chop returns "LV", I save 148 chops, which
      > means that even if I need up to three more chops for "L", I still save time.
      > For this specific case, extending the "Lx" range to "MEX" gives me single-chop
      > hits on "LV" and still the same three-chop hits on "L", reducing the number of
      > chops to 209. I could go even further, and take the lo-range back into the "K"
      > countries to get to "L" in just two chops.
      >
      > Doing the same for the three additional (after "GB") "G*" countries, which are
      > also easy to try, saves me another 96 chops if I put "GR", the second most
      > used, in the initial middle of the range (by extending the high value to
      > "IR"), without affecting the "GE" and "GH" chop counts.
      >
      > I'm going to see how this works out before doing anything else, as this change
      > would be easy to implement in Pascal and in my PL/I version of the same on
      > z/OS. I keep the two in sync, and they should give the same output; if they
      > don't, I know that I've screwed up somewhere!

      And the optimal for the current (and probably near-future) set of data is now

      1,173

      I did write some REXX to scan both the input data and the data that was passed
      into the "srch_cnty" routine. I then extended the chop intervals by
      length-of-the-interval + 20 on both the left and right (obviously not going
      below 0 or above 234) and let that run, marking the lowest number of chops and
      the shortest intervals (not less than zero), and that gave me the above number,
      just over 2.9% of the original.

      I'm going to leave it at that for now. The new intervals are, as I said,
      unlikely to change a lot in the near future; I may encounter a few more
      nationalities of drivers and visit a few more countries, but in small numbers
      they won't materially affect the current chopping intervals.
      This is what I got out of the per-interval scans:

                 +++    -1-   min   max   1/2
      // 'A' 15  A      A     AFG   AZ      8 (ARK)
      // 'B' 18  B      B     AZ    D      32 (BY ) -2-
      // 'C' 15  CZ <>  CH    C     DJI    42 (CO )
      // 'D'  6  D      D     CYM   EAK    51 (DK ) -2-
      // 'E' 12  E      E     DZ    GB     64 (EST) -2-
      // 'F'  6  F      F     FIN   FSM    70 (FL ) -2-
      // 'G' 14  GB     GB    G     IR     84 (GR ) -2-
      // 'H'  5  H  <>  HR    H     IR     91 (HR )
      // 'I'  7  I  <>  IL    IL    J      96 (IRL) -2-
      // 'J'  2  J      J     JA    JA    100 (JA )
      // 'K' 10  KZ     KZ    IRQ   KZ    103 (KGZ)
      // 'L'  7  LT <>  L     KS    MNE   117 (LV ) -2-
      // 'M' 15  MA     MAL   M     NA    126 (MK )
      // 'N' 11  NL     NL    LS    PNG   133 (N  ) -2-
      // 'O'  1  OM     OM                144 (OM )
      // 'P' 11  PL <>  PK    NZ    PY    149 (PK )
      // 'Q'  1  Q      Q                 156 (Q  )
      // 'R' 24  RO <>  RL    RA    SGP   171 (RMM)
      // 'S' 19  S      S     RMM   T     185 (SF ) -2-
      // 'T' 12  TR     TR    T     USA   204 (TJ )
      // 'U'  5  UA <>  USA   UA    V     214 (USA) -2-
      // 'V'  3  VN     VN    V     VU    218 (VN )
      // 'W'  9  WAN    WAN   UZ    WV    222 (WAN)
      // 'X'
      // 'Y'  3  YU     YU    YAR   YV    230 (YU )
      // 'Z'  3  ZA     ZA    Z     ZW    233 (ZA )

      In other words, if it's not the last scanned value (-1-), I next try the most
      frequent (+++) value, and after that I scan the, sometimes longer, interval
      that was created to (mostly) find the second most occurring value on the first
      chop, and in some cases even the third most occurring on the second chop. I
      have manually added the ones I know of as "-2-" above, and obviously the last
      found value is saved in the "-1-" position.

      And yes, now that I have the code, it wouldn't be too hard to adapt it to
      include only the combination of 95 nationalities of drivers and countries I
      have visited, and reduce the chop-count even more. Obviously it is highly
      unlikely to ever beat a look-up table, but the advantage of this approach is
      that it's easily portable between x86 (little endian) and z/OS (big endian).
      Robert
      --
      Robert AH Prins
      robert(a)prino(d)org
      The hitchhiking grandfather - https://prino.neocities.org/
      Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

      --- SoupGate-Win32 v1.05
       * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca