Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 129,505 of 131,241    |
|    BGB to All    |
|    Random: Very Low Precision FP    |
|    26 Aug 25 13:08:29    |
      From: cr88192@gmail.com

Well, the idea here is that sometimes one wants to be able to do
floating-point math where accuracy is a very low priority.

Say, the sort of stuff people might use FP8 or BF16 or maybe Binary16
for (though what I am thinking of here is low precision even by
Binary16 standards).

But, I will use Binary16 and BF16 as the example formats.

So, one can note that some ops can be approximated with modified integer
ADD/SUB on the raw bit patterns (excluding sign-bit handling):
  a*b    : A+B-0x3C00  (0x3F80 for BF16)
  a/b    : A-B+0x3C00
  sqrt(a): (A>>1)+0x1E00

The harder ones, though, are ADD/SUB.

A partial ADD seems to be:
  a+b: A+((B-A)>>1)+0x0400

But this simple case seems not to hold up when either doing a subtract,
or when A and B are far apart.

So, it would appear that either there is a 4th term, or the bias is
variable (depending on the B-A term, and on whether the op is ADD or
SUB).

It seems like the high bits (exponent and operator) could be used to
drive a lookup table, but this is lame. The magic bias appears to have
non-linear properties, so it isn't as easily represented with basic
integer operations.

Then again, probably other people know about all of this and might know
what I am missing.

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
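[Archive note: the formulas above work because the bit pattern of a positive binary16 value is roughly a scaled-and-biased log2 of that value, so integer addition of patterns approximates addition of logarithms, i.e. multiplication. A quick Python sketch of the tricks as posted, restricted to positive normal inputs as the post assumes; the helper names are mine, not from the post:]

```python
import struct

def f16_bits(x):
    # Pack a Python float into IEEE 754 binary16, return the 16-bit pattern.
    return struct.unpack('<H', struct.pack('<e', x))[0]

def bits_f16(b):
    # Reinterpret a 16-bit pattern as an IEEE 754 binary16 value.
    return struct.unpack('<e', struct.pack('<H', b & 0xFFFF))[0]

BIAS = 0x3C00  # bit pattern of 1.0 in binary16 (0x3F80 for BF16)

def approx_mul(a, b):
    # a*b ~ A+B-0x3C00: adding patterns adds the approximate log2 values.
    A, B = f16_bits(a), f16_bits(b)
    return bits_f16(A + B - BIAS)

def approx_div(a, b):
    # a/b ~ A-B+0x3C00: subtracting patterns subtracts the logs.
    A, B = f16_bits(a), f16_bits(b)
    return bits_f16(A - B + BIAS)

def approx_sqrt(a):
    # sqrt(a) ~ (A>>1)+0x1E00: halving the log halves the exponent.
    # 0x1E00 is half the 0x3C00 bias.
    return bits_f16((f16_bits(a) >> 1) + 0x1E00)

def approx_add(a, b):
    # The "partial ADD" from the post: exact for A == B, degrades as the
    # operands move apart (the failure mode the post describes).
    A, B = f16_bits(a), f16_bits(b)
    return bits_f16(A + ((B - A) >> 1) + 0x0400)
```

For operands whose mantissa fields add without carrying, the multiply and divide tricks are exact (e.g. `approx_mul(2.0, 3.0)` gives `6.0`, `approx_sqrt(4.0)` gives `2.0`); in general they are off by up to a few percent, and `approx_add` drifts quickly once the two exponents differ.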
(c) 1994, bbs@darkrealms.ca