From: cr88192@gmail.com   
      
   On 9/6/2025 11:21 AM, MitchAlsup wrote:   
   >   
   > BGB posted:   
   >   
   >> Just randomly thinking again about some things I noticed with audio at   
   >> low sample rates.   
   >>   
   >> For baseline, can note, basic sample rates:   
   >> 44100: Standard, sounds good, but bulky   
   >   
   > No it does not sound "good" on a system that accurately reproduces   
   > 22KHz; like systems with electrostatic speakers covering the high   
   > end of the audio spectrum.   
   >   
   > Might sound "good" to someone who does not know what it is supposed   
   > to actually sound like, though.   
   >   
      
   Dunno. I mostly use headphones.   
      
      
   Seemingly, at least with the headphones I have, I can hear tones up to   
   around 17 kHz, but above this, pretty much nothing.   
      
   I noticed when trying to get new headphones, I got some cheap ones at   
   first that sounded like muffled crap (they were around $10 IIRC). I   
   tried generating tones and with these headphones audio dropped off to   
   nothing after around 11 kHz. Ended up needing to buy some slightly more   
   expensive headphones (around $30 IIRC, from Logitech), which sounded a   
   bit better.   
      
   Ended up giving the cheap ones to my dad, they apparently worked fine   
   for him.   
      
      
      
   Below 1 kHz, sine waves rapidly drop off in intensity, whereas square and
   sawtooth waves retain full loudness.

   On the headphones, I can still hear sine waves (well under 1 kHz) if the
   volume is fairly high.
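
   For reference, test tones like these can be generated with a few lines
   of code. This is only a minimal sketch: the function name `tone` and its
   parameters are made up for illustration, and the square and sawtooth
   are the naive (non-band-limited) versions, so they will alias somewhat
   at higher frequencies. Unlike the sine, the square and sawtooth also
   carry harmonics at multiples of the fundamental.

```python
import math


def tone(kind, freq_hz, rate_hz=44100, seconds=1.0, amp=0.5):
    """Return float samples in [-amp, amp] for a sine, square, or sawtooth.

    Hypothetical helper for hearing tests; not from the original post.
    """
    out = []
    for i in range(int(rate_hz * seconds)):
        phase = (i * freq_hz / rate_hz) % 1.0  # position within one cycle
        if kind == "sine":
            v = math.sin(2 * math.pi * phase)
        elif kind == "square":
            v = 1.0 if phase < 0.5 else -1.0
        elif kind == "sawtooth":
            v = 2.0 * phase - 1.0
        else:
            raise ValueError("unknown waveform: " + kind)
        out.append(amp * v)
    return out
```

   The samples can then be scaled to 16-bit integers and written out with
   any WAV writer for playback.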
      
      
   IRL, I have noted that I am mostly unable to hear tuning forks.   
      
   My mom also recently got a "steel tongue drum" (with an apparent 432Hz   
   tuning), which I had noted I can sorta hear, but the sound is very   
   quiet. I mostly hear the "thwap" sound when she uses the little   
   rubber-tipped mallet on it.   
      
   If I put my hand near it, I can feel vibrations, but I don't really hear   
   anything.   
      
      
      
   Personally, much over a 32 kHz sample rate, any difference rapidly drops   
   off, so 44100 and 48000 seem to sound basically the same.   
      
      
   I was mostly trying to explore the area around 8000 though, where   
   normally I hear crap-all. But, seemingly, with some questionable   
   filtering, intelligible speech can come through, I just don't entirely   
   understand how it works.   
      
   But, as noted, there are several variations of the trick:   
     Feed audio through ADPCM;
       Works better with either 2-bit/sample IMA,
       or with encoder tuned to overshoot.
     Model audio as line-fitting during downsampling.
       This is likely similar to what ADPCM ends up doing.
     Model audio as B-spline fitting.
       Seems to preserve more perceptual quality than the line fitting.
      
   But, what I am not entirely sure of is why this would make any real   
   difference.   
      
   But, can note that it does differ from the more conventional   
   downsampling strategies of "just average stuff", in that both approaches   
   tend to generate points outside the original curve.   
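
   A minimal sketch of that difference, under one possible reading of
   "line fitting": fit a least-squares line to each block and keep the
   line's value at the start of the block. The function names are
   hypothetical and the actual fitting used may well differ. Note that a
   least-squares line evaluated at the block centre equals the block mean,
   so the overshoot only shows up when the line is sampled elsewhere.

```python
def downsample_average(samples, factor):
    """Conventional box-filter downsample: one mean value per block."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def downsample_linefit(samples, factor):
    """Least-squares line per block (factor must be >= 2).

    Returns the fitted line's value at the first sample of each block,
    which can land outside the range of the block's own samples.
    """
    out = []
    xs = range(factor)
    x_mean = (factor - 1) / 2.0
    x_var = sum((x - x_mean) ** 2 for x in xs)
    for i in range(0, len(samples) - factor + 1, factor):
        block = samples[i:i + factor]
        y_mean = sum(block) / factor
        slope = sum((x - x_mean) * (y - y_mean)
                    for x, y in zip(xs, block)) / x_var
        # Evaluate the fitted line at the block's first sample position.
        out.append(y_mean - slope * x_mean)
    return out
```

   For the block [0, 0, 0, 100], averaging gives 25, while the line fit
   gives -20 at the block start, which is below every input sample.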
      
      
      
   >> 32000: Sounds good   
   >> 22050: Moderate   
   >> 16000: OK, Modest size, acceptable quality.   
   >> Seems like best tradeoff if not going for high quality.   
   >> 11025: Poor, muffled.   
   >> 8000: Very poor, speech almost unintelligible (normally).   
   >> But, it is seeming like a "weird hack" may exist here.   
      
      
   Seemingly, there is no general disagreement that 11025 and 8000 sound   
   kinda like crap?...   
      
   I guess 11025 worked OK for Doom and Quake.   
    Quake 2 had used 22050 (but, still 8-bit PCM).   
    Quake 3 had used 22050 (but 16-bit PCM now)   
      
   With Wolfenstein 3D, it wasn't until hearing some slightly better   
   quality versions of the sound effects from the iOS port that I realized   
   the enemies were saying stuff for their sound effects. Like, the   
   low-level enemies apparently saying "Achtung!" rather than "Aaah-Uuuh"   
   (but, with the audio from the DOS version, just sorta heard a whole lot   
   of the latter).   
      
      
   But, as noted, I mostly ended up preferring 16000 A-Law for sound   
   effects and similar as a good tradeoff for space and quality. Also   
   ADPCM, which uses less space.   
      
   Some people seem to try to use MP3 or OGG for sound effects, but:
     128 kbps: Bulky
     64 kbps: Poor
     32 kbps: Sounds like a can full of broken glass.
   On top of this, both formats are complicated, computationally expensive
   to decode, and typically need a third-party library to decode them.
      
      
   Also, in this case, 2-bit IMA ADPCM seems to somewhat beat MP3 at the   
   low bitrate game (at least to my hearing).   
      
      
   Not sure of a good way to go lower; the best way I have found in past
   fiddling was, eg:
     Downsample by 1/16 or so to generate a reference line;
       Eg, spline-fitting the samples;
     Also generate the side-intensity;
       Eg, standard deviation of the samples from the spline.
     Store this line in some form, such as via ADPCM;
     Approximate the intermediate samples with patterns from a table.
       The table of patterns itself is derived partly from the frequencies.
       Stores the relative intensity above/below the spline curve.
      
   Where, one way of storing the line is, say:   
    4x 3-bit, each control-point sample, as ADPCM   
    3 or 4-bit, side/intensity sample (eg, standard deviation channel).   
   The pattern (an index into the table) might be stored as 4 or 8 bits
   per block.
     Pattern is chosen by whichever best fits the intermediate samples.
      
   With, say, 8 bits per sample block and a 16x internal downsample, this
   could work out to 0.5 bits/sample (or, 16 kHz audio in 8 kbps). With
   8-bit patterns, it is 0.75 bits/sample.
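
   One way to arrive at these figures is sketched below. The grouping is
   an assumption, not something stated above: four 3-bit control points
   plus one 4-bit side sample per group, one pattern per control point,
   and 16 samples per control point. The function and parameter names are
   hypothetical.

```python
def bits_per_sample(pattern_bits, points_per_group=4, point_bits=3,
                    side_bits=4, samples_per_point=16):
    """Average bits per output sample for one group of control points.

    Assumed layout: points_per_group control points, one shared side
    (deviation) sample, and one pattern index per control point.
    """
    line_bits = points_per_group * point_bits + side_bits
    total_bits = line_bits + points_per_group * pattern_bits
    return total_bits / (points_per_group * samples_per_point)
```

   Under these assumptions, 4-bit patterns give 32 bits per 64 samples
   (0.5 bits/sample, or 8 kbps at 16 kHz), and 8-bit patterns give 48 bits
   per 64 samples (0.75 bits/sample).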
      
      
   Example patterns:   
    0: Flat line, follow spline   
    1: Positive (sin 8*PI)   
    2: Positive Hump (sin PI)   
    3: Negative Hump   
    4: Positive (sin 2*PI)   
    5: Negative (sin 2*PI)   
    6: Positive (sin 3*PI)   
    7: Negative (sin 3*PI)   
    8/9: 4*PI   
    A/B: 5*PI   
    C/D: 6*PI   
    E/F: 7*PI   
    ...   
   If using 6 or 8-bit patterns, it can include a second (or 3rd)   
   sub-frequency.   
     00..0F: Same as above
     10..1F: Same main pattern as 00..0F
       Sub-frequency mirrored in frequency and polarity (+8 mod 16).
       Roughly 5/8 amplitude of main frequency.
     2x: Same, but lower intensity sub-frequency (3/8).
     3x: Same, but lower intensity sub-frequency (1/8).
     4x..7x: Same, but use a different sub-frequency index (+/-5 mod 16).
       Encodes offset sign and intensity (5/8 or 3/8).
     8x..Fx: Add a 3rd frequency, lower intensity than the second (1/8).
       Similar strategy to above.
     ...
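
   As a rough sketch, the 16 base patterns (0..F) and the "pick whichever
   fits best" step could look like the following. Assumptions here: 16
   samples per block, patterns sampled at the centre of each sample slot,
   and a plain squared-error match. The names `make_patterns` and
   `best_pattern` are made up for illustration, and the 6/8-bit
   sub-frequency extension is not covered.

```python
import math

BLOCK = 16  # samples per block, matching a 16x downsample (assumption)


def make_patterns(n=BLOCK):
    """Build the 16 base patterns as lists of n values in [-1, 1]."""
    patterns = []
    for idx in range(16):
        if idx == 0:
            mult, sign = 0, 1.0          # flat line, follow the spline
        elif idx == 1:
            mult, sign = 8, 1.0          # positive sin(8*PI)
        else:
            # Even index: positive; odd index: negative; idx>>1 gives
            # the multiple of PI (2/3 -> 1, 4/5 -> 2, ..., E/F -> 7).
            mult, sign = idx >> 1, (-1.0 if idx & 1 else 1.0)
        patterns.append([sign * math.sin(mult * math.pi * (i + 0.5) / n)
                         for i in range(n)])
    return patterns


def best_pattern(residual, deviation, patterns):
    """Index of the pattern minimising squared error against residual.

    residual is (samples - spline) for one block; deviation is the
    block's side/intensity value used to scale the pattern.
    """
    def err(p):
        return sum((r - deviation * v) ** 2 for r, v in zip(residual, p))
    return min(range(len(patterns)), key=lambda k: err(patterns[k]))
```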
      
      
   Decoding algorithm would work in blocks, eg:   
    Unpack spline points;   
    Interpolate splines for each sample;   
     Multiply deviation channel with the values from the pattern table.
       This is then added onto the base spline.
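
   The steps above can be sketched roughly as follows. Assumptions: the
   control points and deviations have already been unpacked (the ADPCM
   stage is skipped), the spline is a uniform cubic B-spline with indices
   clamped at the ends, and there is one deviation value and one pattern
   index per 16-sample block. All names are hypothetical, and
   DEMO_PATTERNS holds only two entries to keep the example short.

```python
import math

BLOCK = 16  # samples per block (assumption)

# Two demo patterns only: flat, and the positive sin(PI) hump.
DEMO_PATTERNS = [
    [0.0] * BLOCK,
    [math.sin(math.pi * (i + 0.5) / BLOCK) for i in range(BLOCK)],
]


def bspline(p0, p1, p2, p3, u):
    """Uniform cubic B-spline segment evaluated at u in [0, 1)."""
    u2, u3 = u * u, u * u * u
    return ((1 - 3 * u + 3 * u2 - u3) * p0
            + (4 - 6 * u2 + 3 * u3) * p1
            + (1 + 3 * u + 3 * u2 - 3 * u3) * p2
            + u3 * p3) / 6.0


def decode(points, deviations, pattern_ids, patterns, n=BLOCK):
    """Rebuild samples: base spline plus deviation-scaled pattern."""
    out = []
    last = len(points) - 1
    for k in range(len(points)):
        # Clamp neighbour indices at the ends of the stream.
        p0 = points[max(k - 1, 0)]
        p1 = points[k]
        p2 = points[min(k + 1, last)]
        p3 = points[min(k + 2, last)]
        pattern = patterns[pattern_ids[k]]
        for i in range(n):
            base = bspline(p0, p1, p2, p3, i / n)
            out.append(base + deviations[k] * pattern[i])
    return out
```

   For example, decode([100, 100, 100], [0, 5, 0], [0, 1, 0],
   DEMO_PATTERNS) yields 48 samples: a flat line at 100 with a small hump
   added over the middle block.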
      
      
   However, this sort of approach is somewhat more complicated than just   
   using a low-bitrate ADPCM (and I haven't used it much).   
      
   Also, quality is inferior to 2-bit ADPCM.   
      
      
   [continued in next message]   
      