Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 129,505 of 131,241    |
|    BGB to All    |
|    Random: Very Low Precision FP    |
|    26 Aug 25 13:08:29    |
      From: cr88192@gmail.com

Well, the idea here is that sometimes one wants to be able to do
floating-point math where accuracy is a very low priority.

Say, the sort of stuff people might use FP8 or BF16 or maybe Binary16
for (though what I am thinking of here is low precision even by
Binary16 standards).

But, I will use Binary16 and BF16 as the example formats.

So, one can note that some ops can be approximated with modified integer
ADD/SUB on the raw bit patterns (excluding sign-bit handling):
  a*b    : A+B-0x3C00  (0x3F80 for BF16)
  a/b    : A-B+0x3C00
  sqrt(a): (A>>1)+0x1E00

The harder ones, though, are ADD/SUB.

A partial ADD seems to be:
  a+b: A+((B-A)>>1)+0x0400

But this simple case seems not to hold up when either doing a subtract,
or when A and B are far apart.

So, it would appear that either there is a 4th term, or the bias is
variable (depending on the B-A term, and on whether the op is ADD or
SUB).

It seems like the high bits (exponent and operator) could be used to
drive a lookup table, but this is lame. The magic bias appears to have
non-linear properties, so it isn't as easily represented with basic
integer operations.

Then again, probably other people know about all of this and might know
what I am missing.

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
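[Archive note: the formulas above work because the bit pattern of a positive binary16 value is roughly a scaled-and-biased log2 of that value, so integer addition of patterns approximates addition of logarithms, i.e. multiplication. A quick Python sketch of the tricks as posted, restricted to positive normal inputs as the post assumes; the helper names are mine, not from the post:]

```python
import struct

def f16_bits(x):
    # Pack a Python float into IEEE 754 binary16, return the 16-bit pattern.
    return struct.unpack('<H', struct.pack('<e', x))[0]

def bits_f16(b):
    # Reinterpret a 16-bit pattern as an IEEE 754 binary16 value.
    return struct.unpack('<e', struct.pack('<H', b & 0xFFFF))[0]

BIAS = 0x3C00  # bit pattern of 1.0 in binary16 (0x3F80 for BF16)

def approx_mul(a, b):
    # a*b ~ A+B-0x3C00: adding patterns adds the approximate log2 values.
    A, B = f16_bits(a), f16_bits(b)
    return bits_f16(A + B - BIAS)

def approx_div(a, b):
    # a/b ~ A-B+0x3C00: subtracting patterns subtracts the logs.
    A, B = f16_bits(a), f16_bits(b)
    return bits_f16(A - B + BIAS)

def approx_sqrt(a):
    # sqrt(a) ~ (A>>1)+0x1E00: halving the log halves the exponent.
    # 0x1E00 is half the 0x3C00 bias.
    return bits_f16((f16_bits(a) >> 1) + 0x1E00)

def approx_add(a, b):
    # The "partial ADD" from the post: exact for A == B, degrades as the
    # operands move apart (the failure mode the post describes).
    A, B = f16_bits(a), f16_bits(b)
    return bits_f16(A + ((B - A) >> 1) + 0x0400)
```

For operands whose mantissa fields add without carrying, the multiply and divide tricks are exact (e.g. `approx_mul(2.0, 3.0)` gives `6.0`, `approx_sqrt(4.0)` gives `2.0`); in general they are off by up to a few percent, and `approx_add` drifts quickly once the two exponents differ.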
(c) 1994, bbs@darkrealms.ca