
   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 130,340 of 131,241   
   BGB to Robert Finch   
   Re: Tonights Tradeoff (1/5)   
   22 Nov 25 14:29:23   
   
   From: cr88192@gmail.com   
      
   On 11/22/2025 11:45 AM, Robert Finch wrote:   
   > On 2025-11-22 5:54 a.m., BGB wrote:   
   >> On 11/21/2025 9:09 PM, Robert Finch wrote:   
   >>> On 2025-11-21 2:36 p.m., BGB wrote:   
   >>>> On 11/21/2025 7:31 AM, Michael S wrote:   
   >>>>> On Thu, 13 Nov 2025 19:04:18 GMT   
   >>>>> MitchAlsup  wrote:   
   >>>>>   
   >>>>>> Michael S  posted:   
   >>>>>>   
   >>>>>>> Not really.   
   >>>>>>> That is, conversions are not blazingly fast, but still much better   
   >>>>>>> than any attempt to divide in any form of decimal. And helps to   
   >>>>>>> preserve your sanity.   
   >>>>>>   
   >>>>>> Are you trying to pull our proverbial leg here ?!?   
   >>>>>>   
   >>>>>   
    >>>>> After reading paragraph 5.2 of the IEEE 754-2008 standard, I am
    >>>>> less sure of the correctness of my statement above.
    >>>>> For the case of exact division, preserving one's mental sanity
    >>>>> while fulfilling the requirements of that paragraph is far from
    >>>>> simple, regardless of the numeric base used in the process.
   >>>>>   
   >>>>   
   >>>> One effectively needs to do a special extra-wide divide rather than   
   >>>> just a normal integer divide, etc.   
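[ A minimal sketch of the "extra-wide divide" point: with compilers that provide `unsigned __int128` (GCC/Clang on 64-bit targets), a quotient-digit estimate whose numerator exceeds 64 bits can be written directly; the function name and framing here are illustrative, not from the post. ]

```c
#include <stdint.h>

/* Quotient-digit estimate needing an extra-wide divide: the
   128-bit numerator (hi:lo) over a 64-bit divisor does not fit
   a plain 64-bit integer divide. Assumes hi < d, so the
   quotient itself fits in 64 bits. */
static uint64_t qdigit_est(uint64_t hi, uint64_t lo, uint64_t d)
{
    unsigned __int128 num = ((unsigned __int128)hi << 64) | lo;
    return (uint64_t)(num / d);
}
```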
   >>>>   
   >>>>   
    >>>> But, yeah, the fastest I had gotten in my experiments was
    >>>> radix-10e9 long division, though still not exactly a fast
    >>>> option in absolute terms.
   >>>>   
   >>>> So, rough ranking, fast to slow:   
   >>>>    Radix-10e9 Long Divide (fastest)   
   >>>>    Newton-Raphson   
   >>>>    Radix-10 Long Divide   
   >>>>    Integer Shift-Subtract with converters (slowest).   
   >>>>      Fastest converter strategy ATM:   
   >>>>        Radix-10e9 double-dabble (Int->Dec).   
   >>>>        MUL-by-10e9 and ADD (Dec->Int)   
   >>>>          Fastest strategy: Unrolled Shifts and ADDs (*1).   
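[ A rough sketch of the MUL-by-10e9-and-ADD direction (Dec->Int), using a plain multiply where the post uses unrolled shifts and ADDs; the limb layout assumed here (most-significant first, radix-10^9 limbs in `uint32_t`) is a guess. ]

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Dec->Int: fold radix-10^9 limbs into a binary integer by
   repeatedly multiplying the accumulator by 10^9 and adding
   the next limb (most-significant limb first). */
static u128 dec_to_int(const uint32_t *limbs, int n)
{
    u128 acc = 0;
    for (int i = 0; i < n; i++)
        acc = acc * 1000000000u + limbs[i];
    return acc;
}
```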
   >>>>   
   >>>>   
    >>>> *1: While it is possible to perform a 128-bit multiply by
    >>>> decomposing it into 32-bit partial products and adding them
    >>>> together, it worked out slightly faster in this case to do a
    >>>> fixed multiply decomposed into a series of explicit shifts
    >>>> and ADDs.
   >>>>   
    >>>> Though, in this case, it is faster (and less ugly) to decompose
    >>>> this into a pattern of iteratively multiplying by smaller
    >>>> amounts. I ended up using 4x multiply-by-100 followed by a
    >>>> multiply-by-10, which, while not the fastest strategy, needs
    >>>> less code than 2x multiply-by-10000 plus a multiply-by-10.
    >>>> Most other patterns would need more shifts and adds.
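[ The shift-and-ADD decomposition described above can be sketched like this, using `unsigned __int128` for brevity where the real code presumably works on explicit 64-bit halves; note that 100^4 * 10 = 10^9. ]

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* x*10  = x*8 + x*2 */
static u128 mul10(u128 x)  { return (x << 3) + (x << 1); }
/* x*100 = x*64 + x*32 + x*4 */
static u128 mul100(u128 x) { return (x << 6) + (x << 5) + (x << 2); }

/* x * 10^9 as 4x multiply-by-100 followed by a multiply-by-10,
   the decomposition described in the post. */
static u128 mul1e9(u128 x)
{
    x = mul100(x);
    x = mul100(x);
    x = mul100(x);
    x = mul100(x);
    return mul10(x);
}
```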
   >>>>   
   >>>> In theory, x86-64 could do it better with multiply ops, but getting   
   >>>> something optimal out of the C compilers is a bigger issue here it   
   >>>> seems.   
   >>>>   
   >>>>   
   >>>> Unexplored options:   
   >>>>    Radix 10e2 (byte)   
   >>>>    Radix 10e3 (word)   
   >>>>    Radix 10e4 (word)   
   >>>>   
   >>>> Radix 10e3 could have the closest to direct mapping to DPD.   
   >>>>   
   >>>>   
    >>>> Looking at the decNumber code, it also appears to be radix-10e9
    >>>> based. They also make significant (ab)use of the C preprocessor.
    >>>>
    >>>> Apparently, "Why use functions when you can use macros?"...
   >>>>   
   >>>>   
   >>>> For the Radix-10e9 long-divide, part of the magic was in the   
   >>>> function to scale a value by a radix value and subtract it from   
   >>>> another array.   
   >>>>   
    >>>> Ended up trying a few options; the fastest was to temporarily
    >>>> turn the operation into non-normalized 64-bit pieces and then
    >>>> normalize the result (borrow propagation, etc.) as an output
    >>>> step.
   >>>>   
   >>>> Initial attempt kept it normalized within the operation, which was   
   >>>> slower.   
   >>>>   
   >>>> It was seemingly compiler-dependent whether it was faster to do a   
   >>>> combined operation, or separate scale and subtract, but the margins   
   >>>> were small. On MSVC the combined operation was slightly faster than   
   >>>> the separate operations.   
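[ A minimal sketch of the combined scale-and-subtract step with deferred borrow propagation, assuming little-endian radix-10^9 limbs and a non-negative result; the names and layout are guesses, not BGB's actual code. ]

```c
#include <stdint.h>

#define RADIX 1000000000  /* 10^9 */

/* a -= q*b over little-endian radix-10^9 limbs: first compute
   non-normalized 64-bit pieces, then normalize (borrow
   propagation) as a separate output pass. Assumes n <= 64 and
   that a >= q*b overall, so the result is non-negative. */
static void scale_sub_1e9(uint32_t *a, const uint32_t *b,
                          uint32_t q, int n)
{
    int64_t t[64], c = 0;
    int i;

    for (i = 0; i < n; i++)   /* non-normalized pieces */
        t[i] = (int64_t)a[i] - (int64_t)q * b[i];

    for (i = 0; i < n; i++) { /* normalization pass */
        int64_t v = t[i] + c;
        c = v / RADIX;                   /* truncating divide...  */
        v = v % RADIX;
        if (v < 0) { v += RADIX; c--; }  /* ...fixed up to floor  */
        a[i] = (uint32_t)v;
    }
}
```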
   >>>>   
   >>>> ...   
   >>>>   
   >>>>   
   >>>>   
   >>>> Otherwise, after this, just went and fiddled with BGBCC some more,   
   >>>> adding more options for its resource converter.   
   >>>>   
   >>>> Had before (for image formats):   
   >>>>    In: TGA, BMP (various), PNG, QOI, UPIC   
   >>>>    Out: BMP (various), QOI, UPIC   
   >>>>   
   >>>> Added (now):   
   >>>>    In: PPM, JPG, DDS   
   >>>>    Out: PNG, JPG, DDS (DXT1 and DXT5)   
   >>>>   
    >>>> Considered (not added yet):
    >>>>    PCX (evaluated; possible, but not a clear win)
   >>>>   
   >>>>   
    >>>> Fiddled with making the PNG encoder less slow; mostly this was
    >>>> tweaking some parameters for the LZ searches. Initial settings
    >>>> used deeper searches over initially smaller sliding windows (at
    >>>> lower compression levels); it works better in this case to do a
    >>>> shallower search over a max-sized sliding window.
   >>>>   
    >>>> ATM, PNG encoding speed is on par with the JPG encoder (still
    >>>> one of the slower options).
   >>>>   
   >>>> For simple use-cases, PNG still loses (in terms of both speed and   
   >>>> compression) to 16-color BMP + LZ compression (LZ4 or RP2).   
   >>>> Theoretically, indexed-color PNG exists, but is less widely supported.   
   >>>>   
    >>>> It is less space-efficient to represent 16 colors as Deflate-
    >>>> compressed color differences than it is to just represent the
    >>>> 4-bit RGBI values directly.
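[ For concreteness, representing the 4-bit values directly just means packing two pixels per byte, high nibble first, as 16-color BMP does; a hypothetical helper. ]

```c
#include <stdint.h>

/* Pack 4-bit color indices two per byte, high nibble first
   (the layout 4bpp indexed BMP uses). n is assumed even. */
static void pack_rgbi4(const uint8_t *px, uint8_t *out, int n)
{
    for (int i = 0; i < n; i += 2)
        out[i >> 1] = (uint8_t)((px[i] << 4) | (px[i + 1] & 0x0F));
}
```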
   >>>>   
    >>>> However, it can be noted that the RLE compression scheme (as
    >>>> used by PCX) is clearly inferior to any sort of LZ compression.
   >>>>   
   >>>>   
    >>>> Comparatively, PNG is also a more expensive format to decode
    >>>> (even vs JPEG).
   >>>>   
   >>>>   
    >>>> UPIC can partly address the use-cases of both PNG and JPEG
    >>>> while being cheaper to decode than either, but it is more
    >>>> niche, as pretty much nothing supports it. Much of its design
    >>>> and its properties are mostly JPEG-like.
   >>>>   
   >>>> QOI is interesting, but suffers some similar limitations to PCX (its   
   >>>> design is mostly about more compactly encoding color-differences in   
   >>>> true-color images and otherwise only offers RLE compression).   
   >>>>   
    >>>> QOI is not particularly effective on images with little color
    >>>> variation but lots of repeating patterns (I have a modified
    >>>> QOI that does a little better here, though still not
    >>>> particularly effective with 16-color graphics).
   >>>>   
   >>>>   
    >>>> Otherwise, also ended up adding a small text format for image
    >>>> drawing commands.
    >>>>
    >>>> It is a simplistic line-oriented format containing various
    >>>> commands to perform drawing operations or composite images:
   >>>>    creating a "canvas"   
   >>>>    setting the working color   
   >>>>    drawing lines   
   >>>>    bucket fill   
   >>>>    drawing text strings   
   >>>>    overlaying other images   
   >>>>    ...   
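[ A sketch of how such a line-oriented command format might be interpreted; the command names ("canvas", "color", "point") are invented for illustration, since the post does not give the real syntax. ]

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal interpreter state: a canvas of color indices plus the
   current working color. */
typedef struct {
    int w, h;
    uint8_t color;
    uint8_t *px;    /* w*h pixels, one color index each */
} Canvas;

static void run_line(Canvas *cv, const char *ln)
{
    char cmd[16];
    int a = 0, b = 0;

    if (sscanf(ln, "%15s %d %d", cmd, &a, &b) < 1)
        return;                          /* blank line: ignore */

    if (!strcmp(cmd, "canvas") && a > 0 && b > 0) {
        cv->w = a; cv->h = b;            /* create the canvas */
        cv->px = calloc((size_t)a * (size_t)b, 1);
    } else if (!strcmp(cmd, "color")) {
        cv->color = (uint8_t)a;          /* set working color */
    } else if (!strcmp(cmd, "point")) {  /* plot a single pixel */
        if (cv->px && a >= 0 && a < cv->w && b >= 0 && b < cv->h)
            cv->px[b * cv->w + a] = cv->color;
    }
    /* "line", "fill", "text", "blit", ... would slot in here */
}
```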
   >>>>   
   >>>>   
    >>>> This is maybe (debatably) outside the scope of a C compiler, but
   >>>> could have use-cases for preparing resource data (nevermind if scope   
   >>>> creep is partly also turning it into an asset-packer tool; where it   
   >>>> is useful to make graphics/sounds/etc in one set of formats and then   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


