home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 4,189 of 4,675   
   Terje Mathisen to Rick C. Hodgin   
   Re: Assembly contest idea   
   15 Oct 20 08:40:41   
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Rick C. Hodgin wrote:   
   > On 10/14/20 10:00 PM, Melzzzzz wrote:   
   >> On 2020-10-13, Rick C. Hodgin    
   >> wrote:   
   >>> I have two rectangles r1 and r2.  I need to determine if r1 is 1) fully   
   >>> inside, 2) partially inside, or 3) not inside r2.   
   >>> The coordinates are all signed integers ranging from -32768 to +32767.   
   >>   
   >> Too old for this ;)   
   >>   
   >> I think that I solved this problem some 30 years ago :P   
   >   
   > I spent some time searching online for branchless compares to see if I   
   > could find some optimizations that would be useful.   
   >   
   > I remember coming across a page that showed all kinds of optimizations   
   > for adding certain values, subtracting certain values, etc.   
   >   
   > I'm thinking there is a way to take the values and perform only ADD,   
   > SUB, INC, DEC, OR, and XOR to get a result.  If it could be done for a   
   > type of between() function, the rest becomes academic.   
   >   
   Did not finish my previous post:   
      
   The idea is that we need to compare the left edge of rectangle one   
   against both the left and right edge of rectangle2, and the same for the   
   right edge, so this is 8 compares for the vertical lines. Add the same   
   for the horizontal and you need 16 compares which would fit in a much   
   more common AVX register.   
      
   R2 totally inside R1 becomes   
      
      (R2.l >= R1.l) AND (R2.r <= R1.r) AND   
      (R2.b >= R1.b) AND (R2.a <= R1.a)   
      
   i.e. relatively simple, while zero overlap is much harder, and related   
   to the clipping question from last week: It requires all of R2 to be   
   either to the left, right, above or below, so there are 4 different   
   masks that give the same result and you have to OR those together:   
      
     (R2.r < R1.l) OR (R2.a < R1.b) OR (R2.l >= R1.r) OR (R2.b >= R1.a)   
      
   All that remains is to setup all those values in the correct spot so   
   that these 8 compares can be done (some of them needs to be inverted   
   after the fact obviously!) with a single compare_greater_or_equal()   
   instruction. Seems doable by duplicating the R1 values so that each   
   occur twice, then use a permute on the R2 values to put them in the   
   right spots, do the compare, flip half the results and then the slowest   
   part: Horizontally combine them with and AND for the inside result, with   
   OR for outside, and then all partial overlaps fall out by being neither   
   of these.   
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca