comp.lang.asm.x86
Terje Mathisen to Robert Prins
Re: Converting some way to clever PL/I c
04 Aug 17 12:37:11
   From: terje.mathisen@nospicedham.tmsw.no   
      
   The problem/idea here is that you initialize the BCD int to an illegal   
   value which is effectively zero, but which will maintain that flag info   
   until the first real operation on it, right?   
      
   For pure binary integer code there is of course no reserved value   
   (it should probably have been MININT, i.e. 0x8000/-32768 for a 16-bit   
   int), so you cannot add any extra info here.   
      
   As soon as you reserve a single value as the starting point for your   
   sum, then you cannot handle arbitrary inputs, and since an input of zero   
   is legal and should be added, you must use a separate flag:   
      
      
      sum = 0;   
      added_values = 0;   
      foreach (a in arr[]) {   
        if (a >= 0) {   
          sum += a;   
          added_values++;   
        }   
      }   
      
      ;; ESI->array, ECX has count   
      xor edx,edx   
      xor ebx,ebx   
   next:   
      lodsd   
      test eax,eax   
       jl skip   
      add edx,eax   
      inc ebx   
   skip:   
      loop next   
      
   What's expensive here is the test for >= 0 for each element, not the   
   separate flag value (in EBX): Updating this is totally free.   
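
For anyone who wants to try this outside of assembler, here is a minimal C sketch of the same loop. The names `sum_result` and `sum_nonnegative` are made up for illustration, and the 64-bit accumulator is an assumption (the asm above accumulates in 32-bit EDX); it is only there to keep signed overflow out of the picture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the sum plus a count of accepted elements. */
typedef struct {
    int64_t sum;    /* wide accumulator (assumption, see lead-in) */
    size_t  added;  /* number of elements with a >= 0; plays the role of EBX */
} sum_result;

/* Sum all elements >= 0 and count how many were added. */
sum_result sum_nonnegative(const int32_t *arr, size_t n)
{
    sum_result r = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (arr[i] >= 0) {      /* the per-element test is the real cost */
            r.sum += arr[i];
            r.added++;          /* the separate flag/counter */
        }
    }
    return r;
}
```

A caller would print `r.sum` only when `r.added` is nonzero, which also handles the case where the only legal input was a zero.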
      
   If the pattern of valid/invalid values in the input array is   
   unpredictable, then you could consider CMOV operations:   
      
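      ;; ESI->array, ECX has count, EDX (sum) and EBX (flag) zeroed   
   next:   
      lodsd   
      xor edi,edi       ; EDI = 0, the value added for a rejected element   
      test eax,eax   
      
      setge bl          ; BL = 1 if EAX >= 0   
      cmovge edi,eax    ; EDI = EAX if EAX >= 0   
      
      add edx,edi       ; add the element or 0   
      or bh,bl          ; BH = sticky "something was added" flag   
      loop next   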
      
   The snippet above will take ~5 cycles/iteration while the branchy   
   version is at least one cycle faster when correctly predicted.   
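
The same branch-free idea can be sketched in C with a mask instead of CMOV. This is only an illustration: `masked_result` and `sum_nonnegative_branchless` are invented names, the 64-bit accumulator is an assumption, and whether a given compiler turns this into CMOV/SETcc or back into a branch is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the sum plus a sticky "anything added" flag. */
typedef struct {
    int64_t sum;
    int     any;    /* nonzero once at least one element was >= 0 */
} masked_result;

/* Sum all elements >= 0 without a data-dependent branch in the loop body. */
masked_result sum_nonnegative_branchless(const int32_t *arr, size_t n)
{
    masked_result r = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        int32_t a    = arr[i];
        int     ok   = (a >= 0);        /* 1 or 0, like SETGE */
        int32_t mask = -(int32_t)ok;    /* all ones if ok, else zero */
        r.sum += a & mask;              /* adds a or 0, like the CMOV result */
        r.any |= ok;                    /* sticky flag, like OR BH,BL */
    }
    return r;
}
```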
      
   If only positive array elements were OK, then you could initialize the   
   sum to -1 and check it at the end: if it is still -1 then no legal   
   values were found, otherwise increment the sum and print it.   
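
A rough C sketch of that -1 trick, assuming that only strictly positive elements count as legal. The function name `sum_positive` and the 64-bit accumulator are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and stores the sum in *out if any element > 0 was found. */
bool sum_positive(const int32_t *arr, size_t n, int64_t *out)
{
    int64_t sum = -1;               /* sentinel: nothing added yet */
    for (size_t i = 0; i < n; i++) {
        if (arr[i] > 0) {
            sum += arr[i];          /* the first add moves sum to >= 0 */
        }
    }
    if (sum == -1) {
        return false;               /* no legal values were found */
    }
    *out = sum + 1;                 /* undo the -1 starting bias */
    return true;
}
```

This only works because every accepted element is at least 1, so the sum can never land back on -1. As soon as zero is a legal input, the sentinel is ambiguous again and the separate flag is needed.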
      
   Terje   
      
   Robert Prins wrote:   
   > I've recently come across some really clever/very nasty PL/I code,   
   > that would, theoretically, save CPU by eliminating a conditional   
   > jump. It relies on initializing a BCD-encoded ***integer*** variable   
   > ("sum") with -0.1, which causes, on IBM mainframes, the last nibble   
   > of the BCD-encoded value to contain 0xD (rather than the normal 0xC).   
   > The author uses this to avoid a costly (Phuleeze, pass me a bucket!)   
   > test, so rather than coding:   
   >   
   > sum = -1;   
   >   
   > do i = 1 to whatever;   
   >   if a(i) >= 0 then   
   >     if sum <> -1 then sum = sum + a(i);   
   >     else sum = a(i);   
   > end;   
   >   
   > if sum <> -1 then "print sum";   
   >   
   > the code can be simplified to   
   >   
   > sum = -0.1; /* fraction is discarded, but -sign (0xD) is kept! */   
   >   
   > do i = 1 to whatever; if a(i) >= 0 then sum = sum + a(i); end;   
   >   
   > if last_nibble(sum) <> 0xD then "print sum";   
   >   
   > where "last_nibble" is a simplification of using two actual PL/I   
   > builtin functions that actually allow access to the last nibble of a   
   > BCD encoded value, and the addition of any a(i) to "sum", even an   
   > a(i) = 0 will cause the CPU to normalize the last nibble of "sum" to   
   > 0xC.   
   >   
   > Testing this for big "whatever" (in an outer loop, and using a small   
   > array "a" in the inner loop) makes no flipping difference (on a   
   > Hercules-emulated z/OS system), which doesn't surprise yours truly   
   > one iota. :)   
   >   
   > However, I would be curious if there is a way to code something   
   > similar in x86 assembler, when using strictly integer values, which   
   > implies that sum/eax must be initialized to -1(?), and the addition   
   > is preceded by a "cmp eax, -1(?)" to set up a carry, but that doesn't   
   > seem to work.   
   >   
   > Or am I just on a wild goose chase?   
   >   
   > Obviously using a "cmp eax, -1" followed by a "sete dl / movzx edx,dl   
   > / add eax,edx / add eax, 'a(i)'" works, but might take a few   
   > nano-seconds more than a pretty much very well predicted conditional   
   > jump...   
   >   
   > Any thoughts?   
   >   
   > Robert   
      
      
   --   
   "almost all programming can be viewed as an exercise in caching"   
      