From: robfi680@gmail.com   
      
   On 2025-11-21 2:36 p.m., BGB wrote:   
   > On 11/21/2025 7:31 AM, Michael S wrote:   
   >> On Thu, 13 Nov 2025 19:04:18 GMT   
   >> MitchAlsup wrote:   
   >>   
   >>> Michael S posted:   
   >>>   
   >>>> Not really.   
   >>>> That is, conversions are not blazingly fast, but still much better   
   >>>> than any attempt to divide in any form of decimal. And helps to   
   >>>> preserve your sanity.   
   >>>   
   >>> Are you trying to pull our proverbial leg here ?!?   
   >>>   
   >>   
    >> After reading paragraph 5.2 of the IEEE 754-2008 standard, I am less
    >> sure of the correctness of my statement above.
    >> For the case of exact division, preserving one's sanity while
    >> fulfilling the requirements of that paragraph is far from simple,
    >> regardless of the numeric base used in the process.
   >>   
   >   
   > One effectively needs to do a special extra-wide divide rather than just   
   > a normal integer divide, etc.   
   >   
   >   
    > But, yeah, the fastest I had gotten in my experiments was radix-10e9
    > long division, though even that is still not particularly fast.
   >   
    > So, rough ranking, fast to slow:
    >   Radix-10e9 long divide (fastest)
    >   Newton-Raphson
    >   Radix-10 long divide
    >   Integer shift-subtract with converters (slowest).
    >
    > Fastest converter strategies ATM:
    >   Int->Dec: Radix-10e9 double-dabble.
    >   Dec->Int: MUL-by-10e9 and ADD.
    > Fastest multiply strategy: unrolled shifts and ADDs (*1).
   >   
   >   
    > *1: While it is possible to perform a 128-bit multiply by decomposing
    > it into 32-bit partial products and adding them together, it worked
    > out slightly faster in this case to express the fixed multiply as a
    > series of explicit shifts and ADDs.
   >   
    > Though, in this case, it is faster (and less ugly) to decompose this
    > further into a pattern of iteratively multiplying by smaller amounts.
    > I ended up using four multiplies by 100 followed by a multiply by 10;
    > while not the fastest strategy, it needs less code than two multiplies
    > by 10000 plus a multiply by 10. Most other patterns would need more
    > shifts and adds.
   >   
    > In theory, x86-64 could do better with multiply ops, but getting
    > something optimal out of the C compilers seems to be the bigger issue
    > here.
   >   
   >   
   > Unexplored options:   
   > Radix 10e2 (byte)   
   > Radix 10e3 (word)   
   > Radix 10e4 (word)   
   >   
    > Radix 10e3 would have the closest-to-direct mapping to DPD (which
    > packs three decimal digits into 10 bits).
   >   
   >   
    > Looking at the decNumber code, it appears to also be radix-10e9 based.
    > They also make significant (ab)use of the C preprocessor.
   >   
   > Apparently, "Why use functions when you can use macros?"...   
   >   
   >   
    > For the radix-10e9 long divide, part of the magic was in the function
    > that scales one limb array by a digit value and subtracts the result
    > from another array.
   >   
    > I ended up trying a few options; the fastest was to temporarily turn
    > the operation into non-normalized 64-bit pieces and then normalize the
    > result (borrow propagation, etc.) as an output step.
    >
    > The initial attempt kept everything normalized within the operation,
    > which was slower.
   >   
   > It was seemingly compiler-dependent whether it was faster to do a   
   > combined operation, or separate scale and subtract, but the margins were   
   > small. On MSVC the combined operation was slightly faster than the   
   > separate operations.   
   >   
   > ...   
   >   
   >   
   >   
   > Otherwise, after this, just went and fiddled with BGBCC some more,   
   > adding more options for its resource converter.   
   >   
   > Had before (for image formats):   
   > In: TGA, BMP (various), PNG, QOI, UPIC   
   > Out: BMP (various), QOI, UPIC   
   >   
   > Added (now):   
   > In: PPM, JPG, DDS   
   > Out: PNG, JPG, DDS (DXT1 and DXT5)   
   >   
    > Considered (not added yet):
    >   PCX (evaluated; possible, but not a clear win).
   >   
   >   
    > Fiddled with making the PNG encoder less slow; mostly this meant
    > tweaking some parameters for the LZ searches. The initial settings
    > used deeper searches over smaller sliding windows (at lower
    > compression levels); it turned out better here to do a shallower
    > search over a max-sized sliding window.
   >   
    > ATM, PNG encoding speed is on par with the JPG encoder (still one of
    > the slower options).
   >   
   > For simple use-cases, PNG still loses (in terms of both speed and   
   > compression) to 16-color BMP + LZ compression (LZ4 or RP2).   
   > Theoretically, indexed-color PNG exists, but is less widely supported.   
   >   
    > It is less space-efficient to represent 16 colors as
    > Deflate-compressed color differences than it is to represent the
    > 4-bit RGBI values directly.
   >   
    > However, I can note that the RLE compression scheme (used by PCX) is
    > clearly inferior to any sort of LZ compression.
   >   
   >   
    > Comparably, PNG is also a more expensive format to decode (even vs
    > JPEG).
   >   
   >   
    > UPIC can partly address the use-cases of both PNG and JPEG while being
    > cheaper to decode than either, but it is more niche, as pretty much
    > nothing supports it. Much of its design and properties are JPEG-like.
   >   
    > QOI is interesting, but suffers limitations similar to PCX's (its
    > design is mostly about more compactly encoding color differences in
    > true-color images, and otherwise it only offers RLE compression).
    >
    > QOI is not particularly effective on images with little color variety
    > but lots of repeating patterns (I have a modified QOI that does a
    > little better here, though it is still not particularly effective
    > with 16-color graphics).
   >   
   >   
    > Otherwise, I also ended up adding a small text format for image
    > drawing commands.
    >
    > It is a simplistic line-oriented format containing various commands to
    > perform drawing operations or composite images:
   > creating a "canvas"   
   > setting the working color   
   > drawing lines   
   > bucket fill   
   > drawing text strings   
   > overlaying other images   
   > ...   
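[Purely to make the idea concrete: a file in such a format might look something like the sketch below. Every command name and argument order here is invented by me for illustration; the post does not specify the actual syntax.]

```text
# hypothetical syntax, not the actual format
canvas 320 200
color 255 0 0
line 10 10 120 80
fill 60 60
text 8 180 "Hello"
overlay sprite.qoi 200 40
```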
   >   
   >   
    > This is maybe (debatably) outside the scope of a C compiler, but it
    > could have use-cases for preparing resource data (nevermind that scope
    > creep is partly also turning it into an asset-packer tool, where it is
    > useful to make graphics/sounds/etc. in one set of formats and then
    > process and convert them into another set of files, usually inside
    > some sort of VFS image or similar).
   >   
    > The design is much simpler than something like SVG, and I currently
    > assume it will be used mostly for hand-edited files. Unlike SVG, it
    > also assumes drawing to a pixel grid rather than some more abstract
    > coordinate space (so its abstract model is more like "MS Paint" or
    > similar); also, SVG would suck as a human-edited format.
   >   
    > Granted, one could argue that asset processing should be its own tool,
    > with its output then converted to a format that the compiler accepts
    > (WAD2 or WAD4 in this case) prior to compiling the main binary
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   