Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 3,525 of 4,675    |
|    Robert Prins to Terje Mathisen    |
|    Re: Speeding up code - am I missing some    |
|    17 Aug 18 19:20:54    |
XPost: comp.lang.pl1, comp.lang.pascal.borland, comp.lang.pascal.misc
From: robert@nospicedham.prino.org

On 2018-08-17 06:49, Terje Mathisen wrote:
> Robert Prins wrote:
>> On 2018-08-16 16:02, Terje Mathisen wrote:
>>> (newsgroups limited to just clax86, my server does not allow
>>> indiscriminate cross-posting.)
>>>
>>> Robert, you do realize that anything you are going to print will take
>>> billions of cycles for every second the printer needs to actually print
>>> the pages?
>>
>> Really? ROFL...
>>
>> Of course I do realise that once I/O is added, all bets are off. But as
>> I've mentioned in another reply you may already have seen, sometimes
>> others approach a problem from a completely different angle, with stunning
>> results.
>
> OK, here's the most obvious (?) one:
>
> Create a function for each of these lists, put them all on an array and call
> them in order with the current item:
>
> Each time a list has been completed, the corresponding function pointer
> removes itself from the array of functions to be called.
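
The scheme quoted above can be sketched in C roughly as follows. This is only an illustration of the idea, not code from either poster: the names `handler`, `take_any`, `take_even`, `run` and the tiny `TOP_N` are all made up for the example, and the two filters merely stand in for the real per-list tests.

```c
#include <stddef.h>

#define TOP_N 3  /* hypothetical; kept small so the behaviour is easy to trace */

/* One entry per list: a function pointer plus the slots it fills. */
typedef struct handler handler;
struct handler {
    int (*fn)(handler *self, int item);  /* returns nonzero once the list is full */
    int count;
    int slots[TOP_N];
};

/* Illustrative list: accepts every item. */
static int take_any(handler *self, int item)
{
    self->slots[self->count++] = item;
    return self->count == TOP_N;
}

/* Illustrative list: accepts only even items. */
static int take_even(handler *self, int item)
{
    if (item % 2 == 0)
        self->slots[self->count++] = item;
    return self->count == TOP_N;
}

/* Feed each item to every handler that is still active, in order.
   A handler that reports "full" is removed from the array, so it is
   never called again. Returns the number of handler calls made. */
static size_t run(handler **active, size_t n_active,
                  const int *items, size_t n_items)
{
    size_t calls = 0;

    for (size_t i = 0; i < n_items && n_active > 0; i++) {
        size_t k = 0;
        while (k < n_active) {
            calls++;
            if (active[k]->fn(active[k], items[i])) {
                /* Close the gap, preserving the call order of the rest. */
                for (size_t m = k + 1; m < n_active; m++)
                    active[m - 1] = active[m];
                n_active--;
            } else {
                k++;
            }
        }
    }
    return calls;
}
```

Calling `run` with an array holding pointers to a `take_any` and a `take_even` handler stops as soon as both lists are full; the remaining items are never looked at, which is where the saving comes from.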

Great minds (OK, one very great mind and my rather smaller one) seem to think
the same, I'm actually using a similar approach in the follow-up program that
converts the output of "LIFT" to RTF format:

@02:
  mov   eax, _s
  lea   eax, [eax * 4 + offset @03]
  jmp   dword ptr [eax]

  align 4

@03:
  dd offset @04 //  0 - Trip -> 0 s -> 1
  dd offset @05 //  1 - Trip fall through -> 2
  dd offset @06 //  2 - Type -> 2 s -> 3
  dd offset @07 //  3 - Type fall through -> 4
  dd offset @08 //  4 - Cnty -> 4 s -> 5
  dd offset @09 //  5 - Cnty fall through -> 6
  dd offset @10 //  6 - Nat -> 6 s -> 7
  dd offset @11 //  7 - Nat fall through -> 8
  dd offset @14 //  8 - Naco / Cnty #I/#E -> 8 s -> 9
  dd offset @18 //  9 - Cnty #I/#E fall through -> 10
  dd offset @20 // 10 - Year -> 10 s -> 11
  dd offset @21 // 11 - Year fall through -> 12
  dd offset @22 // 12 - V! tables -> 12 s -> 13
  nop
  nop
  nop
  nop

@04:

Sadly, this approach is not (entirely) possible here, because the Country
top-10 tables are "incomplete", I've hitched through Bulgaria, Slovakia, and
Andorra, but I've never hitched in those countries.

Obviously I could cater for this, but while we've got this discussion going,
I've decided to try filling the entire top-N in the "caching" code, where I end
up going to nearly the bottom, entry 4,300 of (currently) 4,310 rides, anyway.
A quick test using the old PL/I version, where I can use a compiler option that
tells me how many times each statement is executed, tells me that the executed
statement count is reduced by more than 90% when I use this approach for the
Top-10s of trips.
;)

Only unpleasant side issue is the fact that I'm using 3-D arrays,

dcl 1 t10_trip(trip_total.trip) ctl,
      2 tr_ix(3)      fixed bin (31) init ((3)0),
      2 tr_ktv(3, 10) ptr;

with

#j = t10_trip(lift_list.trip).tr_ix(1) + 1;
if #j <= hbound(tr_ktv, 3) then
  t10_trip(lift_list.trip).tr_ktv(1, #j) = lift_ptr;

t10_trip(lift_list.trip).tr_ix(1) = #j;

to fill the top-10 slots, and worse for the Type/Country/Nat tables where I
have to do a lookup of the index. (Which leads to the question, is there a way
using all the new whiter than white x86 instructions to fast match a 4-byte
value in an array of up to (potentially) 234 values? I'm now using the
"vpcmpistri" (see another thread) to find CR's in the buffer (and don't
understand why I require a VPXOR or set them to 0x00), and similar code would
probably be faster than a discretely coded version of CMPSD...)

which won't result in pretty code using Virtual Pascal. Need to check what
Enterprise PL/I generates, but even here, based on past experience, I'm
somewhat pessimistic.

More later,

Robert

--
Robert AH Prins
robert(a)prino(d)org

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
(c) 1994, bbs@darkrealms.ca