comp.arch
BGB to Robert Finch
Re: Tonights Tradeoff (1/3)
22 Nov 25 04:54:00
   From: cr88192@gmail.com   
      
   On 11/21/2025 9:09 PM, Robert Finch wrote:   
   > On 2025-11-21 2:36 p.m., BGB wrote:   
   >> On 11/21/2025 7:31 AM, Michael S wrote:   
   >>> On Thu, 13 Nov 2025 19:04:18 GMT   
   >>> MitchAlsup  wrote:   
   >>>   
   >>>> Michael S  posted:   
   >>>>   
   >>>>> Not really.   
   >>>>> That is, conversions are not blazingly fast, but still much better   
   >>>>> than any attempt to divide in any form of decimal. And helps to   
   >>>>> preserve your sanity.   
   >>>>   
   >>>> Are you trying to pull our proverbial leg here ?!?   
   >>>>   
   >>>   
   >>> After reading paragraph 5.2 of the IEEE 754-2008 standard, I am less sure of
   >>> the correctness of my statement above.
   >>> For the case of exact division, preservation of mental sanity during   
   >>> fulfillment of requirements of this paragraph is far from simple,   
   >>> regardless of numeric base used in the process.   
   >>>   
   >>   
   >> One effectively needs to do a special extra-wide divide rather than   
   >> just a normal integer divide, etc.   
   >>   
   >>   
   >> But, yeah, fastest I had gotten in my experiments was radix-10e9 (that
   >> is, 10^9: nine decimal digits per 32-bit limb) long division, but
   >> still not the fastest option.
   >>   
   >> So, rough ranking, fast to slow:   
   >>    Radix-10e9 Long Divide (fastest)   
   >>    Newton-Raphson   
   >>    Radix-10 Long Divide   
   >>    Integer Shift-Subtract with converters (slowest).   
   >>      Fastest converter strategy ATM:   
   >>        Radix-10e9 double-dabble (Int->Dec).   
   >>        MUL-by-10e9 and ADD (Dec->Int)   
   >>          Fastest strategy: Unrolled Shifts and ADDs (*1).   
   >>   
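As a rough illustration of the two converter directions listed above, here is a minimal sketch in C. It is not the code from the post: the function names, the 64-bit input width, and the limb layout (least-significant limb first) are all assumptions made for the example. The Int->Dec direction is a double-dabble-style loop carried over to radix 10^9 (double every limb, carry at 10^9) rather than classic BCD double-dabble; the Dec->Int direction is plain multiply-by-10^9-and-add.

```c
#include <assert.h>
#include <stdint.h>

#define RADIX 1000000000u  /* 10^9: nine decimal digits per 32-bit limb */

/* Int->Dec (hypothetical helper): shift the bits of v in from the top,
 * doubling the decimal limbs (dec[0] is least significant) and carrying
 * whenever a limb reaches RADIX. */
static void bin_to_dec1e9(uint64_t v, uint32_t *dec, int n)
{
    int i, bit;
    assert(n >= 3);  /* 2^64-1 has 20 digits, so 3 limbs are needed */
    for (i = 0; i < n; i++)
        dec[i] = 0;
    for (bit = 63; bit >= 0; bit--) {
        uint32_t carry = (uint32_t)((v >> bit) & 1);
        for (i = 0; i < n; i++) {
            uint32_t t = dec[i] * 2 + carry;  /* at most 1999999999 */
            carry = (t >= RADIX);
            if (carry)
                t -= RADIX;
            dec[i] = t;
        }
    }
}

/* Dec->Int (hypothetical helper): multiply the accumulator by 10^9 and
 * add the next limb, most significant limb first.
 * Assumes the value fits in 64 bits. */
static uint64_t dec1e9_to_bin(const uint32_t *dec, int n)
{
    uint64_t acc = 0;
    int i;
    for (i = n - 1; i >= 0; i--) {
        assert(dec[i] < RADIX);
        acc = acc * RADIX + dec[i];
    }
    return acc;
}
```

A 128-bit version would follow the same pattern with a wider accumulator and more limbs.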
   >>   
   >> *1: While it is possible to perform a 128-bit multiply decomposing   
   >> into multiplying 32-bit parts and adding them together; it was working   
   >> out slightly faster in this case to do a fixed multiply by decomposing   
   >> it into a series of explicit shifts and ADDs.   
   >>   
   >> Though, in this case, it is faster (and less ugly) to decompose this
   >> into a pattern of iteratively multiplying by smaller amounts. I
   >> ended up using 4x multiply by 100 followed by multiply by 10; while
   >> not the fastest strategy, it needs less code than 2x multiply by 10000 +
   >> multiply by 10. Most other patterns would need more shifts and adds.
   >>   
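The "4x multiply by 100, then multiply by 10" pattern can be sketched roughly as follows. This is only an illustration under assumptions: the `u128` struct and all function names are made up for the example, and the particular shift decompositions (100 = 64 + 32 + 4, 10 = 8 + 2) are one obvious choice, not necessarily the one used in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low half */
    return r;
}

static u128 u128_shl(u128 a, unsigned n)
{
    u128 r;
    assert(n > 0 && n < 64);
    r.hi = (a.hi << n) | (a.lo >> (64 - n));
    r.lo = a.lo << n;
    return r;
}

/* x*10 = (x<<3) + (x<<1) */
static u128 u128_mul10(u128 a)
{
    return u128_add(u128_shl(a, 3), u128_shl(a, 1));
}

/* x*100 = (x<<6) + (x<<5) + (x<<2) */
static u128 u128_mul100(u128 a)
{
    return u128_add(u128_add(u128_shl(a, 6), u128_shl(a, 5)),
                    u128_shl(a, 2));
}

/* x*10^9 = x * 100 * 100 * 100 * 100 * 10 (wraps modulo 2^128) */
static u128 u128_mul_1e9(u128 a)
{
    int i;
    for (i = 0; i < 4; i++)
        a = u128_mul100(a);
    return u128_mul10(a);
}
```

The result wraps modulo 2^128, so the caller is assumed to know the input is small enough.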
   >> In theory, x86-64 could do it better with multiply ops, but getting
   >> something optimal out of the C compilers seems to be the bigger issue here.
   >>   
   >>   
   >> Unexplored options:   
   >>    Radix 10e2 (byte)   
   >>    Radix 10e3 (word)   
   >>    Radix 10e4 (word)   
   >>   
   >> Radix 10e3 could have the closest to direct mapping to DPD.   
   >>   
   >>   
   >> Looking at the decNumber code, it appears also to be Radix-10e9 based.   
   >> They also do significant (ab)use of the C preprocessor.   
   >>   
   >> Apparently, "Why use functions when you can use macros?"...   
   >>   
   >>   
   >> For the Radix-10e9 long-divide, part of the magic was in the function   
   >> to scale a value by a radix value and subtract it from another array.   
   >>   
   >> Ended up trying a few options, fastest was to temporarily turn the   
   >> operation into non-normalized 64-bit pieces and then normalize the   
   >> result (borrow propagation, etc) as an output step.   
   >>   
   >> Initial attempt kept it normalized within the operation, which was   
   >> slower.   
   >>   
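A minimal sketch of the "non-normalized 64-bit pieces, then normalize" idea for the scale-and-subtract step might look like the following. The function name, the caller-supplied scratch buffer, and the limb layout (least-significant limb first) are assumptions for illustration, not the actual code from the post.

```c
#include <assert.h>
#include <stdint.h>

#define RADIX 1000000000  /* 10^9 */

/* Hypothetical helper: a[0..n-1] -= q * b[0..n-1], radix-10^9 limbs,
 * least significant first. tmp is caller-supplied scratch of n entries.
 * Returns 0 if the result is non-negative, or a negative borrow-out
 * if q*b was larger than a. */
static int64_t scale_sub_1e9(uint32_t *a, const uint32_t *b, int n,
                             uint32_t q, int64_t *tmp)
{
    int64_t carry = 0;
    int i;
    assert(q < RADIX);
    /* Pass 1: non-normalized pieces. Each product is below 10^18, so it
     * fits in a signed 64-bit value; entries may be negative. */
    for (i = 0; i < n; i++)
        tmp[i] = (int64_t)a[i] - (int64_t)q * (int64_t)b[i];
    /* Pass 2: normalize, propagating borrows upward. */
    for (i = 0; i < n; i++) {
        int64_t v = tmp[i] + carry;
        int64_t r = v % RADIX;   /* C division truncates toward zero */
        carry = v / RADIX;
        if (r < 0) {             /* fix up a negative remainder */
            r += RADIX;
            carry--;
        }
        a[i] = (uint32_t)r;
    }
    return carry;
}
```

In a long-division loop, a negative return value would indicate that the quotient estimate `q` was too large and needs correcting.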
   >> It was seemingly compiler-dependent whether it was faster to do a   
   >> combined operation, or separate scale and subtract, but the margins   
   >> were small. On MSVC the combined operation was slightly faster than   
   >> the separate operations.   
   >>   
   >> ...   
   >>   
   >>   
   >>   
   >> Otherwise, after this, just went and fiddled with BGBCC some more,   
   >> adding more options for its resource converter.   
   >>   
   >> Had before (for image formats):   
   >>    In: TGA, BMP (various), PNG, QOI, UPIC   
   >>    Out: BMP (various), QOI, UPIC   
   >>   
   >> Added (now):   
   >>    In: PPM, JPG, DDS   
   >>    Out: PNG, JPG, DDS (DXT1 and DXT5)   
   >>   
   >> Considered (not added yet):   
   >>    PCX   
   >> Evaluated PCX, possible but not a clear win.   
   >>   
   >>   
   >> Fiddled with making the PNG encoder less slow; mostly this was
   >> tweaking some parameters for the LZ searches. The initial settings
   >> used deeper searches over smaller sliding windows (at lower
   >> compression levels); it is better in this case to do a shallower
   >> search over a max-sized sliding window.
   >>   
   >> ATM, speed of PNG is now on-par with the JPG encoder (still one of the   
   >> slower options).   
   >>   
   >> For simple use-cases, PNG still loses (in terms of both speed and   
   >> compression) to 16-color BMP + LZ compression (LZ4 or RP2).   
   >> Theoretically, indexed-color PNG exists, but is less widely supported.   
   >>   
   >> It is less space-efficient to represent 16-colors as Deflate-   
   >> compressed color differences than it is to just represent the 4-bit   
   >> RGBI values directly.   
   >>   
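For reference, storing 4-bit values directly just means packing two pixels per byte before handing the data to the LZ stage. A minimal sketch, assuming the first pixel of each pair goes in the high nibble (as in 4-bpp BMP); the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack n 4-bit pixel values (one per input byte)
 * into out, two pixels per byte, first pixel in the high nibble.
 * An odd trailing pixel is padded with a zero low nibble.
 * Returns the number of bytes written, (n + 1) / 2. */
static size_t pack_4bpp(const uint8_t *px, size_t n, uint8_t *out)
{
    size_t i, o = 0;
    assert(px != NULL && out != NULL);
    for (i = 0; i + 1 < n; i += 2)
        out[o++] = (uint8_t)(((px[i] & 15) << 4) | (px[i + 1] & 15));
    if (i < n)
        out[o++] = (uint8_t)((px[i] & 15) << 4);
    return o;
}
```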
   >> However, can note that the RLE compression scheme (used by PCX) is   
   >> clearly inferior to that of any sort of LZ compression.   
   >>   
   >>   
   >> Comparatively, PNG is also a more expensive format to decode
   >> (even vs JPEG).
   >>   
   >>   
   >> UPIC can partly address the use-cases of both PNG and JPEG while being
   >> cheaper to decode than either, but is more niche as pretty much nothing
   >> supports it. Some of its design and properties are mostly JPEG-like.
   >>   
   >> QOI is interesting, but suffers some similar limitations to PCX (its   
   >> design is mostly about more compactly encoding color-differences in   
   >> true-color images and otherwise only offers RLE compression).   
   >>   
   >> QOI is not particularly effective against images with little color
   >> variation but lots of repeating patterns (I have a modified
   >> QOI that does a little better here, still not particularly effective
   >> with 16-color graphics though).
   >>   
   >>   
   >> Otherwise, also ended up adding a small text format for image drawing
   >> commands.
   >>
   >> It is a simplistic line-oriented format containing various commands to
   >> perform drawing operations or composite images:
   >>    creating a "canvas"   
   >>    setting the working color   
   >>    drawing lines   
   >>    bucket fill   
   >>    drawing text strings   
   >>    overlaying other images   
   >>    ...   
   >>   
   >>   
   >> This is maybe (debatably) outside the scope of a C compiler, but could
   >> have use-cases for preparing resource data (never mind if scope creep
   >> is partly also turning it into an asset-packer tool; where it is
   >> useful to make graphics/sounds/etc in one set of formats and then
   >> process and convert them into another set of files, usually inside of
   >> some sort of VFS image or similar).
   >>   
   >> Design is much more simplistic than something like SVG and I am   
   >> currently assuming its use for mostly hand-edited files. Unlike SVG,   
   >> it also assumes drawing to a pixel grid rather than some more abstract   
   >> coordinate space (so, its abstract model is more like "MS Paint" or   
   >> similar); also SVG would suck as a human-edited format.   
   >>   
      
   [continued in next message]   
      