comp.arch
Message 130,045 of 131,241
BGB to Terje Mathisen
Re: Crisis? What Crisis? (was Re: On Cra
20 Oct 25 14:21:14
   From: cr88192@gmail.com   
      
   On 10/20/2025 4:06 AM, Terje Mathisen wrote:   
   > David Brown wrote:   
   >> On 19/10/2025 03:17, Lawrence D’Oliveiro wrote:
   >>> On Sat, 18 Oct 2025 10:21:32 +0200, Terje Mathisen wrote:   
   >>>   
   >>>> MitchAlsup wrote:   
   >>>>>   
   >>>>> On Fri, 17 Oct 2025 22:20:49 -0000 (UTC), Lawrence D’Oliveiro wrote:
   >>>>>>   
   >>>>>> Short-vector SIMD was introduced along an entirely separate   
   >>>>>> evolutionary path, namely that of bringing DSP-style operations   
   >>>>>> into general-purpose CPUs.   
   >>>>>   
   >>>>> MMX was designed to kill off the plug in Modems.   
   >>>>   
   >>>> MMX was quite obviously (also) intended for short vectors of   
   >>>> typically 8 and 16-bit elements, it was the enabler for sw DVD   
   >>>> decoding. ZoranDVD was the first to properly handle 30 frames/second   
   >>>> with zero skips, it needed a PentiumMMX-200 to do so.   
   >>>   
   >>> I think the initial “killer app” for short-vector SIMD was very much
   >>> video encoding/decoding, not audio encoding/decoding. Audio was   
   >>> already easy enough to manage with general-purpose CPUs of the 1990s.   
   >>   
   >> Agreed.  But having SIMD made audio processing more efficient, which   
   >> was a nice bonus - especially if you wanted more than CD quality audio.   
   >   
   > Having SIMD available was a key part of making the open source Ogg   
   > Vorbis decoder 3x faster.   
   >   
   > It worked on MMX/SSE/SSE2/Altivec.   
   >   
      
   Yeah. Audio is fun...   
      
      
   But MP3 and Vorbis have the odd property of either sounding really good   
   (at high bitrates) or terrible (at lower bitrates, particularly if used   
   for something with variable playback speed).   
      
   Seems to be a general issue with audio codecs built from a similar sort   
   of block-transform approach (such as MDCT or WHT).   
      
      
   In some of my own experiments in a similar area, I used WHT, but the
   results were not as good. One problem seems to be that frequencies near
   the block size produce nasty artifacts. The overlapping blocks and
   windowing of MDCT reduce this issue but, as noted, MDCT has a high
   computational cost (vs Haar or WHT).

   I have yet to come up with something in this category that gives
   satisfactory results (cheap, simple, effective, and passable quality).
      
      
   Can also note: ADPCM works OK.   
      
   Can get better results IMO at bitrates lower than where MP3 or Vorbis   
   are effective.   
      
   Near the lower end:   
      16kHz 2-bit ADPCM: OK,   32kbps
      11kHz 2-bit ADPCM: meh,  22kbps
       8kHz 4-bit ADPCM: weak, 32kbps
       8kHz 2-bit ADPCM: poor, 16kbps
      
      
   Getting OK results at 2 bits/sample requires a different approach from
   what works well at 4 bits: rather than encoding one sample at a time,
   one usually needs to encode a block of samples at a time and search the
   entire possibility space. Trying to encode samples one at a time gives
   poor results. This makes 2-bit encoding slower and more complicated than
   4-bit encoding (but the decoder can still be fast).
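
   A minimal sketch of the block-search idea, assuming a made-up 2-bit
   ADPCM variant (bit 1 = sign, bit 0 = magnitude, multiplicative step
   adaptation). The state layout, step rule, block size, and all names
   here are assumptions for illustration, not the encoder described
   above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit ADPCM state; not any standard codec's layout. */
typedef struct { int pred; int step; } Adpcm2State;

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Decode one 2-bit code: bit 1 = sign, bit 0 = magnitude (assumed). */
static int adpcm2_step(Adpcm2State *s, int code)
{
    int delta = (code & 1) ? (3 * s->step) / 2 : s->step / 2;
    if (code & 2)
        delta = -delta;
    s->pred = clampi(s->pred + delta, -32768, 32767);
    /* Small magnitudes shrink the step, large ones grow it. */
    s->step = (code & 1) ? (s->step * 3) / 2 : (s->step * 3) / 4;
    s->step = clampi(s->step, 4, 16384);
    return s->pred;
}

#define BLK 4   /* samples per block: 4^4 = 256 candidate sequences */

/* Try every code sequence for the block and keep the one with the
   least squared error. Returns the packed codes (sample 0 in the low
   bits) and advances *s along the winning sequence. */
unsigned adpcm2_encode_block(Adpcm2State *s, const int16_t *pcm)
{
    unsigned best = 0;
    int64_t best_err = INT64_MAX;
    Adpcm2State best_state = *s;

    assert(s && pcm);
    for (unsigned cand = 0; cand < (1u << (2 * BLK)); cand++) {
        Adpcm2State t = *s;
        int64_t err = 0;
        for (int i = 0; i < BLK; i++) {
            int d = adpcm2_step(&t, (cand >> (2 * i)) & 3) - pcm[i];
            err += (int64_t)d * d;
        }
        if (err < best_err) {
            best_err = err;
            best = cand;
            best_state = t;
        }
    }
    *s = best_state;
    return best;
}

/* The decoder just replays the codes; no search is needed. */
void adpcm2_decode_block(Adpcm2State *s, unsigned codes, int16_t *out)
{
    assert(s && out);
    for (int i = 0; i < BLK; i++)
        out[i] = (int16_t)adpcm2_step(s, (codes >> (2 * i)) & 3);
}
```

   The search cost grows as 4^BLK per block, while decode cost stays at
   one table-free step per sample, which is the asymmetry noted above.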
      
   As noted, ADPCM proper does not work below 2 bits/sample.   
      
   The added accuracy of 4-bit samples is not an advantage at these
   bitrates, since the reduction in sample rate needed to pay for it has a
   more obvious negative impact.
      
      
   After trying a few experiments, the current front-runner for going lower is:   
   Encode a group of 8 or 16 samples as an 8-bit index into a table of   
   patterns (such as groups of 2-bit ADPCM samples);   
   This can achieve 1.0 or 0.5 bits/sample.   
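
   A rough sketch of the decode side of such a pattern table, assuming
   8 samples per pattern with the 2-bit codes packed into a 16-bit table
   entry (sample 0 in the low bits). The packing and the names are
   assumptions; the expanded codes would then be fed to a 2-bit ADPCM
   decoder:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAT_SAMPLES 8   /* samples per pattern (8 -> 1.0 bits/sample) */

/* Expand one index byte per group into 2-bit ADPCM codes, written one
   code per output byte. pattab holds 256 patterns of 8 packed 2-bit
   codes. Returns the number of codes written. */
size_t pattern_expand(const uint16_t pattab[256], const uint8_t *idx,
                      size_t n_idx, uint8_t *codes)
{
    size_t n = 0;

    assert(pattab && idx && codes);
    for (size_t g = 0; g < n_idx; g++) {
        uint16_t pat = pattab[idx[g]];
        for (int i = 0; i < PAT_SAMPLES; i++)
            codes[n++] = (uint8_t)((pat >> (2 * i)) & 3);
    }
    return n;
}

/* Bits per sample when one 8-bit index covers `samples` samples. */
double pattern_bits_per_sample(int samples)
{
    assert(samples > 0);
    return 8.0 / samples;
}
```

   With 16 samples per pattern the table entries would need 32 bits
   each, and the rate drops to 0.5 bits/sample.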
      
   Have yet to get anything with particularly acceptable audio quality though.   
      
   Did end up resorting to using genetic algorithms for building the   
   pattern tables for these experiments. I did previously experiment with   
   an interpolation pattern table, but this gave worse results.   
      
      
   One other line of experimentation was trying to fudge the ADPCM encoding   
   algorithm to preferentially try to generate repeating patterns over   
   novel ones with the aim of making it more compressible with LZ77.   
      
   However, it was difficult to significantly improve LZ compressibility
   while still maintaining some semblance of audio quality. Neither
   byte-oriented LZ (eg, LZ4) nor Deflate was particularly effective.
      
      
   Did note, however, that both LZMA and an LZMA-style bitwise range encoder
   were much more effective (particularly with 12 or 16 bits of context).
      
   However, a range encoder is near the upper end of computational   
   feasibility (and using a range encoder to squeeze bits out of ADPCM   
   seems kinda absurd).   
      
      
   One intermediate option seems to be a permutation transform. This can
   make the data more amenable to STF+AdRice or Huffman.

   Say, a 2-bit permutation transform is possible (though, in this case one
   can represent every permutation as a 5-bit finite state machine, stored
   as bytes in RAM for convenience). This does have the nice property that
   one can use an 8 bit table lookup for each context which then produces 2
   bits of output at a time.
      
   Say:   
      hist: 8 bits of history   
      ival: input, 4x 2-bits   
      oval: output, 4x 2-bits, permuted   
      
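      /* first 2-bit symbol (bits 1:0 of ival) */
      px1=permstate[hist];             /* permutation state for this context */
      ix=((ival>>0)&0x03);             /* input symbol */
      px2=permupdtab[(px1&0xFC)|ix];   /* new state; low 2 bits = output symbol */
      permstate[hist]=px2;
      hist=(hist<<2)|ix;               /* shift symbol into the 8-bit history */
      oval=px2&3;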
      
      px1=permstate[hist];   
      ix=((ival>>2)&0x03);   
      px2=permupdtab[(px1&0xFC)|ix];   
      permstate[hist]=px2;   
      hist=(hist<<2)|ix;   
      oval=oval|((px2&3)<<2);   
      ...   
      
   Decoding process is similar   
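
   A self-contained sketch of the same idea with a matching decoder.
   Note the differences from the fragment above: this version stores
   each context's permutation as an explicit 4-entry array rather than
   as a packed FSM state byte with an update table, and the update rule
   (swap the coded symbol one place toward the front) is an assumption,
   as are all the names:

```c
#include <assert.h>
#include <stdint.h>

/* One permutation of the 4 symbols per 8-bit history context.
   order[h][r] is the symbol currently at rank r in context h. */
typedef struct {
    uint8_t order[256][4];
    uint8_t hist;
} PermCtx;

void perm_init(PermCtx *c)
{
    assert(c);
    for (int h = 0; h < 256; h++)
        for (int r = 0; r < 4; r++)
            c->order[h][r] = (uint8_t)r;
    c->hist = 0;
}

/* Swap the symbol at rank r one place toward the front (assumed rule). */
static void perm_update(uint8_t *ord, int r)
{
    if (r > 0) {
        uint8_t t = ord[r - 1];
        ord[r - 1] = ord[r];
        ord[r] = t;
    }
}

/* Encode: 2-bit symbol -> its rank in the current context. */
int perm_encode(PermCtx *c, int sym)
{
    uint8_t *ord = c->order[c->hist];
    int r = 0;

    assert(sym >= 0 && sym < 4);
    while (ord[r] != sym)
        r++;
    perm_update(ord, r);
    c->hist = (uint8_t)((c->hist << 2) | sym);
    return r;
}

/* Decode: rank -> symbol, with the identical state update. */
int perm_decode(PermCtx *c, int rank)
{
    uint8_t *ord = c->order[c->hist];
    int sym;

    assert(rank >= 0 && rank < 4);
    sym = ord[rank];
    perm_update(ord, rank);
    c->hist = (uint8_t)((c->hist << 2) | sym);
    return sym;
}
```

   Both sides update the history with the original symbol, not the
   rank, so the decoder tracks the encoder's contexts exactly. Symbols
   that recur in a given context drift toward rank 0, which is what
   skews the output toward small values for the entropy coder.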
      
   One downside of this is that it is still about as slow as using the
   bitwise range-coder would have been.
      
      
   Also, still doesn't really allow breaking into sub 10 kbps territory   
   without a loss of quality. The use of pattern tables allows breaking   
   into this territory with a similar loss of quality, and at a lower   
   computational cost.   
      
   Though, it seems possible that the permutation transform could be   
   directly integrated with the ADPCM decoder (in effect turning it into   
   more of a predictive transform); still wouldn't do much for speed, but   
   alas. Would also still need an entropy coder to make use of this.   
      
      
      
   One other route seems to be sinewave synthesis, say:   
      Pick the top 4 sine waves via some strategy;   
      Encode the frequency and amplitude (needs ~ 16 bits per wave IME);
      Do this ~ 100-128 times per second.
        An update rate of 100Hz seems to be a lower limit for intelligibility.
      
   This needs ~ 6.4 to 8.2 kbps, or 7.2 to 9.2 kbps if one also includes a   
   byte to encode a white noise intensity.   
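
   The arithmetic behind those figures, as a small sketch (the function
   and parameter names are just for illustration):

```c
#include <assert.h>

/* Bits per second when each frame carries n_waves sine parameters of
   bits_per_wave bits each, plus extra_bits of side data. */
int sine_bitrate(int n_waves, int bits_per_wave, int extra_bits,
                 int frames_per_sec)
{
    assert(n_waves > 0 && bits_per_wave > 0);
    assert(extra_bits >= 0 && frames_per_sec > 0);
    return (n_waves * bits_per_wave + extra_bits) * frames_per_sec;
}
```

   For 4 waves at 16 bits each, this gives 6400 bps at 100 frames/s and
   8192 bps at 128 frames/s; adding the 8-bit noise level gives 7200 and
   9216 bps.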
      
   I had best results by taking the space from 2 to 8 kHz, dividing it
   into ~ 1/3 octaves, picking the strongest wave from each group, and then
   picking the top 4 strongest waves. It worked better for me to ignore
   lower frequencies (low frequencies seem to contain a lot of louder
   wave-forms, which nonetheless contribute little to intelligibility). In
   this case, waves between 2 and 4 kHz tend to dominate.
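
   A sketch of that selection step, assuming the input is a magnitude
   spectrum with evenly spaced bins (e.g. from an FFT). The rounded band
   edges, the bin layout, and the names are assumptions, not the actual
   analysis code:

```c
#include <assert.h>

#define N_BANDS 6   /* 2 kHz to 8 kHz is two octaves = six 1/3 octaves */
#define N_PICK  4

/* Approximate 1/3-octave band edges in Hz (2000 * 2^(k/3), rounded). */
static const double band_edge[N_BANDS + 1] =
    {2000, 2520, 3175, 4000, 5040, 6350, 8000};

/* mag[i] is the magnitude of the bin centred at i*hz_per_bin.
   Writes up to N_PICK bin indices into pick[], strongest first, and
   returns how many were written. Bins below 2 kHz are ignored. */
int pick_sines(const double *mag, int n_bins, double hz_per_bin,
               int pick[N_PICK])
{
    int cand[N_BANDS], n_cand = 0, n_pick = 0;

    assert(mag && pick && n_bins > 0 && hz_per_bin > 0);

    /* Strongest bin within each band. */
    for (int b = 0; b < N_BANDS; b++) {
        int best = -1;
        for (int i = 0; i < n_bins; i++) {
            double f = i * hz_per_bin;
            if (f < band_edge[b] || f >= band_edge[b + 1])
                continue;
            if (best < 0 || mag[i] > mag[best])
                best = i;
        }
        if (best >= 0)
            cand[n_cand++] = best;
    }

    /* Repeatedly take the strongest remaining band winner. */
    while (n_pick < N_PICK && n_cand > 0) {
        int k = 0;
        for (int j = 1; j < n_cand; j++)
            if (mag[cand[j]] > mag[cand[k]])
                k = j;
        pick[n_pick++] = cand[k];
        cand[k] = cand[--n_cand];
    }
    return n_pick;
}
```

   Taking one winner per band first keeps the four picks from all
   landing on adjacent bins of a single loud peak.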
      
      
   [continued in next message]   
      