
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 4,572 of 4,675   
   Robert Prins to Robert Prins   
   Fwd: Fast lookup of 234 possible 1/2/3 c   
   20 Mar 23 20:58:39   
   
   From: robert@nospicedham.prino.org   
      
   On 2023-03-09 15:28, Robert Prins wrote:   
   > On 2023-03-09 10:19, Terje Mathisen wrote:   
   >> This is much better!   
   >   
   > Thank you!   
   >   
   >> I still think you should pay more attention to the cost of branches, i.e. a   
   >> sequential scan of a shortish segment of the country array is very cheap, on   
   >> the order of 2-3 clock cycles/country, while a single mispredicted branch   
   >> costs 8-20 cycles depending upon the CPU you are using.   
   >   
   > An AMD FX8150 and an Intel i7-4710MQ.   
   >   
   >> Assuming this really is so important, why not "waste" a bit more lookup   
   >> table RAM for a full 2-char index?   
   >   
   > It's Utterly Unimportant (with two huge "U"s): the program as-is runs in 0.32   
   > seconds clock-time, but it's just interesting to see how far I can go in pure   
   > Pascal, to translate that subsequently into in-line assembler, where I'm in   
   > many cases harking back to Turbo Pascal 3 times, by coding post-Pentium   
   > instructions as "db" sequences. The original code dates back to 1994, and   
   > pretty amazingly, many of my original record structures are very AVX friendly!   
   >   
   > Anyway, given that my ['A'..'Z'] array had one position open, I've used that   
   > to replace the general last-matched country with a per-initial-character   
   > last-matched country, and that has cut the chop-count even more:   
   >   
   > Original full range                 : 40,362   
   > Low/high per initial character (pic): 14,907   
   > As previous + first most used pic   :  2,871   
   > As previous + cache pic             :  1,747   
   >   
   > And now I'm going to think slightly outside the box, but will have to knock   
   > up a bit of REXX to actually test this: what might be counter-intuitive is   
   > upping the low and/or high value of some ranges into the previous/next (or,   
   > unlikely, but never say never, pre-previous or over-next) range, so that an   
   > initial first lookup immediately finds the second most-used country, without   
   > generating an excessive number of additional lookups for even less used   
   > countries.   
   >   
   > E.g. for countries starting with "L" I currently only have three, in order   
   > of use: "LT", "LV" (74 lookups), and "L" (45 lookups), the latter two both   
   > requiring three chops, for a total of 357 chops. If I were to extend the "L"   
   > range into the "M" range so that a first chop returns "LV", I save 148   
   > chops, which means that even if I need up to three more chops for "L", I   
   > still save time. For this specific case, extending the "Lx" range to "MEX"   
   > gives me single-chop hits on "LV" and still the same three-chop hits on   
   > "L", reducing the number of chops to 209. Here I could go even further, and   
   > take the lo-range back into the "K" countries and get to "L" in just two   
   > chops.   
   >   
   > Doing the same for the three additional (after "GB") "G*" countries, which   
   > are also easy to try, saves me another 96 chops if I put "GR", the second   
   > most used, in the initial middle of the range (by extending the high value   
   > to "IR"), without affecting the "GE" and "GH" chop counts.   
   >   
   > I'm going to see how this works out before doing anything else, as this   
   > change would be easy to implement in Pascal and in my PL/I version of the   
   > same on z/OS. I keep the two in sync and they should give the same output;   
   > if they don't, I know that I've screwed up somewhere!   
      
   And the optimal chop-count for the current (and probably near-future) set of data is now   
      
   1,173   
      
   I did write some REXX to scan both the input data and data that was passed into   
   the "srch_cnty" routine. I then extended the chop intervals by the   
   length-of-the-interval + 20 on both the left and right (obviously not going   
   below 0 or above 234) and let that run, marking the lowest number of chops and   
   shortest intervals (not less than zero) and that gave me the above number, just   
   over 2.9% of the original.   
      
   I'm going to leave it at that for now. The new intervals are, as I said,   
   unlikely to change a lot in the near future; I may encounter a few more   
   nationalities of drivers and visit a few more countries, but in small numbers   
   they won't materially affect the current chopping intervals.   
      
   This is what I got out for the per interval scans:   
      
                 +++    -1-   min   max  1/2   
   // 'A'  15   A      A     AFG   AZ     8 (ARK)   
   // 'B'  18   B      B     AZ    D     32 (BY ) -2-   
   // 'C'  15   CZ  <> CH    C     DJI   42 (CO )   
   // 'D'   6   D      D     CYM   EAK   51 (DK ) -2-   
   // 'E'  12   E      E     DZ    GB    64 (EST) -2-   
   // 'F'   6   F      F     FIN   FSM   70 (FL ) -2-   
   // 'G'  14   GB     GB    G     IR    84 (GR ) -2-   
   // 'H'   5   H   <> HR    H     IR    91 (HR )   
   // 'I'   7   I   <> IL    IL    J     96 (IRL) -2-   
   // 'J'   2   J      J     JA    JA   100 (JA )   
   // 'K'  10   KZ     KZ    IRQ   KZ   103 (KGZ)   
   // 'L'   7   LT  <> L     KS    MNE  117 (LV ) -2-   
   // 'M'  15   MA     MAL   M     NA   126 (MK )   
   // 'N'  11   NL     NL    LS    PNG  133 (N  ) -2-   
   // 'O'   1   OM     OM               144 (OM )   
   // 'P'  11   PL  <> PK    NZ    PY   149 (PK )   
   // 'Q'   1   Q      Q                156 (Q  )   
   // 'R'  24   RO  <> RL    RA    SGP  171 (RMM)   
   // 'S'  19   S      S     RMM   T    185 (SF ) -2-   
   // 'T'  12   TR     TR    T     USA  204 (TJ )   
   // 'U'   5   UA  <> USA   UA    V    214 (USA) -2-   
   // 'V'   3   VN     VN    V     VU   218 (VN )   
   // 'W'   9   WAN    WAN   UZ    WV   222 (WAN)   
   // 'X'   
   // 'Y'   3   YU     YU    YAR   YV   230 (YU )   
   // 'Z'   3   ZA     ZA    Z     ZW   233 (ZA )   
      
   In other words, if it's not the last scanned value (-1-), I next try the most   
   frequent (+++) value, and after that I chop over the sometimes longer interval   
   that is created to (mostly) find the second most occurring value on the first   
   chop, and in some cases even the third most on the second chop; I manually   
   added the ones I know as "-2-" above, and obviously the last found value is   
   saved in the "-1-" position.   
      
   And yes, now that I have the code, it wouldn't be too hard to adapt it to   
   actually only include the combination of 95 nationalities of drivers and   
   countries I have visited, and reduce the chop-count even more. Obviously it is   
   highly unlikely to ever beat a look-up table, but the advantage of this is that   
   it's easily portable between x86 (little endian) and z/OS (big endian).   
      
   Robert   
   --   
   Robert AH Prins   
   robert(a)prino(d)org   
   The hitchhiking grandfather - https://prino.neocities.org/   
   Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca