comp.lang.asm.x86
Message 3,525 of 4,675
Robert Prins to Terje Mathisen
Re: Speeding up code - am I missing some
17 Aug 18 19:20:54
   XPost: comp.lang.pl1, comp.lang.pascal.borland, comp.lang.pascal.misc   
   From: robert@nospicedham.prino.org   
      
   On 2018-08-17 06:49, Terje Mathisen wrote:   
   > Robert Prins wrote:   
   >> On 2018-08-16 16:02, Terje Mathisen wrote:   
   >>> (newsgroups limited to just clax86, my server does not allow   
   >>> indiscriminate cross-posting.)   
   >>>   
   >>> Robert, you do realize that anything you are going to print will take   
   >>> billions of cycles for every second the printer needs to actually print   
   >>> the pages?   
   >>   
   >> Really? ROFL...   
   >>   
   >> Of course I do realise that once I/O is added, all bets are off. But as   
   >> I've mentioned in another reply you may already have seen, sometimes   
   >> others approach a problem from a completely different angle, with stunning   
   >> results.   
   >   
   > OK, here's the most obvious (?) one:   
   >   
   > Create a function for each of these lists, put them all on an array and call   
   > them in order with the current item:   
   >   
   > Each time a list has been completed, the corresponding function pointer   
   > removes itself from the array of functions to be called.   
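The scheme quoted above could be sketched in C roughly as follows. This is only an illustration under assumptions: `Item`, `ListHandler`, `take_until_full` and `feed_item` are hypothetical names, not taken from the LIFT program, and here the driver loop drops a completed handler on its behalf rather than the handler editing the array itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { int key; } Item;

typedef struct ListHandler {
    /* Returns 1 once this handler's list is complete, 0 otherwise. */
    int (*accept)(struct ListHandler *self, const Item *item);
    int filled;    /* slots used so far */
    int capacity;  /* e.g. 10 for a top-10 list */
} ListHandler;

/* Example handler: takes every item until its list is full. */
static int take_until_full(ListHandler *self, const Item *item)
{
    (void)item;    /* a real handler would test and store the item */
    self->filled++;
    return self->filled >= self->capacity;
}

/* Offer one item to every active handler, in order. Handlers that report
   completion are compacted out of the array, so later items never reach
   them. Returns the new number of active handlers. */
static size_t feed_item(ListHandler **active, size_t n, const Item *item)
{
    size_t out = 0;
    assert(active != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!active[i]->accept(active[i], item))
            active[out++] = active[i];   /* still incomplete: keep it */
    }
    return out;
}
```

Once `feed_item` returns 0 every list is complete and the scan over the remaining items can stop early.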
      
   Great minds (OK, one very great mind and my rather smaller one) seem to think
   alike: I'm actually using a similar approach in the follow-up program that
   converts the output of "LIFT" to RTF format:
      
   @02:   
      mov   eax, _s   
      lea   eax, [eax * 4 + offset @03]   
      jmp   dword ptr [eax]   
      
      align 4   
      
   @03:   
      dd    offset @04  //  0 - Trip                    ->  0   s ->  1   
      dd    offset @05  //  1 - Trip fall through       ->  2   
      dd    offset @06  //  2 - Type                    ->  2   s ->  3   
      dd    offset @07  //  3 - Type fall through       ->  4   
      dd    offset @08  //  4 - Cnty                    ->  4   s ->  5   
      dd    offset @09  //  5 - Cnty fall through       ->  6   
      dd    offset @10  //  6 - Nat                     ->  6   s ->  7   
      dd    offset @11  //  7 - Nat  fall through       ->  8   
      dd    offset @14  //  8 - Naco / Cnty #I/#E       ->  8   s ->  9   
      dd    offset @18  //  9 - Cnty #I/#E fall through -> 10   
      dd    offset @20  // 10 - Year                    -> 10   s -> 11   
      dd    offset @21  // 11 - Year fall through       -> 12   
      dd    offset @22  // 12 - V! tables               -> 12   s -> 13   
      nop   
      nop   
      nop   
      nop   
      
   @04:   
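For readers less used to assembler, the jump table above corresponds roughly to an array of function pointers indexed by the state `_s`. The C sketch below is an assumption-laden simplification: going by the table comments, even states stay put until their section ends and odd "fall through" states always advance, so two shared handlers stand in for the thirteen separate labels, and the `section_done` flag is a hypothetical stand-in for whatever the real code tests.

```c
#include <assert.h>

enum { N_STATES = 13 };   /* table entries 0..12, as in the listing */

typedef int (*Handler)(int state, int section_done);

/* Even states: remain in the state until the section is finished. */
static int table_state(int state, int section_done)
{
    return section_done ? state + 1 : state;
}

/* Odd states: do their work once and always move on. */
static int fall_through(int state, int section_done)
{
    (void)section_done;
    return state + 1;
}

/* C counterpart of the dd table at @03. */
static const Handler dispatch[N_STATES] = {
    table_state, fall_through,   /*  0,  1: Trip          */
    table_state, fall_through,   /*  2,  3: Type          */
    table_state, fall_through,   /*  4,  5: Cnty          */
    table_state, fall_through,   /*  6,  7: Nat           */
    table_state, fall_through,   /*  8,  9: Cnty #I/#E    */
    table_state, fall_through,   /* 10, 11: Year          */
    table_state                  /* 12:     V! tables     */
};

/* Counterpart of the indexed jmp: run the handler for the current
   state and return the next state. */
static int step(int state, int section_done)
{
    assert(state >= 0 && state < N_STATES);
    return dispatch[state](state, section_done);
}
```

A compiler will typically turn `dispatch[state](...)` into an indexed indirect call, which is close to the hand-written indexed `jmp`.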
      
   Sadly, this approach is not (entirely) possible here, because the Country
   top-10 tables are "incomplete": I've hitched through Bulgaria, Slovakia, and
   Andorra, but I've never hitched in those countries.
      
   Obviously I could cater for this, but while we've got this discussion going,
   I've decided to try filling the entire top-N in the "caching" code, where I end
   up going to nearly the bottom, entry 4,300 of (currently) 4,310 rides, anyway.
   A quick test using the old PL/I version, where I can use a compiler option that
   tells me how many times each statement is executed, tells me that the executed
   statement count is reduced by more than 90% when I use this approach for the
   Top-10s of trips. ;)
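A minimal C sketch of that one-pass fill, assuming the rides are already in ranking order so that the first ten rides seen for a trip are its top-10. `TopN` and `top_n_add` are hypothetical names; the logic simply mirrors the "count always, store only while a slot is free" pattern.

```c
#include <assert.h>
#include <stddef.h>

enum { TOP_N = 10 };

typedef struct {
    int count;                 /* rides seen for this trip (may exceed TOP_N) */
    const void *slot[TOP_N];   /* the first TOP_N rides, in list order */
} TopN;

/* Record one ride for a trip: always count it, but store the pointer
   only while there is still a free slot. */
static void top_n_add(TopN *t, const void *ride)
{
    assert(t != NULL);
    if (t->count < TOP_N)
        t->slot[t->count] = ride;
    t->count++;
}
```

Calling `top_n_add` once per ride inside the existing scan fills every trip's table without a separate pass per trip.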
      
   Only unpleasant side issue is the fact that I'm using 3-D arrays,   
      
   dcl 1 t10_trip(trip_total.trip) ctl,   
          2 tr_ix(3)        fixed bin (31) init ((3)0),   
          2 tr_ktv(3, 10)   ptr;   
      
   with   
      
   #j = t10_trip(lift_list.trip).tr_ix(1) + 1;   
   if #j <= hbound(tr_ktv, 3) then   
      t10_trip(lift_list.trip).tr_ktv(1, #j) = lift_ptr;   
      
   t10_trip(lift_list.trip).tr_ix(1) = #j;   
      
   to fill the top-10 slots, and worse for the Type/Country/Nat tables where I
   have to do a lookup of the index. (Which leads to the question: is there a way,
   using all the new whiter-than-white x86 instructions, to fast-match a 4-byte
   value in an array of up to (potentially) 234 values? I'm now using
   "vpcmpistri" (see another thread) to find CRs in the buffer (and don't
   understand why I require a VPXOR or set them to 0x00), and similar code would
   probably be faster than a discretely coded version of CMPSD...)
      
   which won't result in pretty code using Virtual Pascal. Need to check what
   Enterprise PL/I generates, but even here, based on past experience, I'm
   somewhat pessimistic.
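On the 4-byte lookup question, one possible sketch in C with SSE2 intrinsics. Note that this swaps in PCMPEQD plus MOVMSKPS rather than PCMPISTRI, since the PCMPxSTRx family compares byte or word elements, not dwords. `find_u32` is a hypothetical name, and the code falls back to a plain loop when SSE2 is not available at compile time; whether it beats a scalar scan for at most 234 entries would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Return the index of the first element equal to key, or -1 if absent. */
static int find_u32(const uint32_t *a, size_t n, uint32_t key)
{
    size_t i = 0;
    assert(a != NULL || n == 0);
#if defined(__SSE2__)
    {
        /* Broadcast the key to all four dword lanes. */
        const __m128i needle = _mm_set1_epi32((int)key);
        for (; i + 4 <= n; i += 4) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(a + i));
            /* PCMPEQD, then MOVMSKPS: one mask bit per matching lane. */
            int mask = _mm_movemask_ps(
                _mm_castsi128_ps(_mm_cmpeq_epi32(chunk, needle)));
            if (mask != 0) {
                int lane = 0;
                while (!(mask & 1)) { mask >>= 1; lane++; }
                return (int)i + lane;
            }
        }
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is not available). */
    for (; i < n; i++)
        if (a[i] == key)
            return (int)i;
    return -1;
}
```

With 234 entries this is at most 58 vector compares plus a two-element scalar tail.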
      
   More later,   
      
   Robert   
      
   --   
   Robert AH Prins   
   robert(a)prino(d)org   
      