Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
|    Message 130,535 of 131,241    |
|    BGB to All    |
|    Re: trap and emulate, Lessons from the A    |
|    17 Dec 25 02:27:33    |
   
   From: cr88192@gmail.com   
      
   On 12/17/2025 1:11 AM, Lawrence D’Oliveiro wrote:   
   > On Wed, 17 Dec 2025 00:51:17 -0600, BGB wrote:   
   >   
   >> Misaligned access is common enough here that, if it were not supported   
   >> natively, this would likely tank performance...   
   >   
   > Still there are/were some architectures that refused to support it.   
      
   Yes.   
      
   Or, like the "SiFive U74" and similar, where the funny thing is that the
   RISC-V ISA uses unscaled displacements, but the CPU then handles
   misaligned access via internal traps (and is horribly slow)...
      
   Meanwhile, I prefer to have memcpy and LZ decompression where   
   "performance doesn't suck".   
      
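As a rough illustration of why cheap misaligned access matters for LZ decompression, here is a minimal sketch of a match copy that moves 8 bytes at a time. The name lz_copy_match and the 8-byte slack requirement are assumptions made for this example, not taken from any particular decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy an LZ match of `len` bytes that starts `dist`
 * bytes back in the output. Assumes the caller leaves at least 8 bytes of
 * slack after dst+len, because the fast path may store past the end. */
static uint8_t *lz_copy_match(uint8_t *dst, size_t dist, size_t len)
{
    const uint8_t *src = dst - dist;
    uint8_t *end = dst + len;

    assert(dist > 0);
    if (dist >= 8) {
        /* With dist >= 8, the 8 source bytes of each step were all
         * written before this step, so chunked copying is safe. */
        while (dst < end) {
            uint64_t t;
            memcpy(&t, src, 8);  /* typically one unaligned load */
            memcpy(dst, &t, 8);  /* typically one unaligned store */
            src += 8;
            dst += 8;
        }
    } else {
        /* Short distances are overlapping run fills; copy bytewise. */
        while (dst < end)
            *dst++ = *src++;
    }
    return end;
}
```

Neither src nor dst is aligned in general, so on a core that traps on misaligned access each of those 8-byte moves would hit the slow path.
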
   Also useful for things like Huffman and Rice decoding, etc. Say, for   
   Huffman decoding, if one needs to use branches to detect when to pull in   
   more bytes, this eats more clock-cycles than advancing the bit-stream   
   position implicitly via arithmetic tricks.   
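
A minimal sketch of the "advance via arithmetic" idea for an LSB-first bit stream follows. The bitreader type and the br_* names are made up for this example; it assumes a little-endian host and at least 4 readable bytes at the cursor (for example, padding after the data).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical LSB-first bit reader. */
typedef struct {
    const uint8_t *cs;  /* byte cursor */
    unsigned pos;       /* bit offset within *cs, always 0..7 */
} bitreader;

/* Peek at the next bits without consuming them (up to 25 are valid). */
static uint32_t br_peek(const bitreader *br)
{
    uint32_t win;
    memcpy(&win, br->cs, 4);  /* unaligned 32-bit load, little-endian assumed */
    return win >> br->pos;
}

/* Consume n bits: no "need more bytes?" branch, only arithmetic. */
static void br_skip(bitreader *br, unsigned n)
{
    br->pos += n;
    br->cs += br->pos >> 3;   /* whole bytes consumed */
    br->pos &= 7;             /* leftover bit offset */
}

/* Read an n-bit field, 1 <= n <= 24. */
static uint32_t br_read(bitreader *br, unsigned n)
{
    uint32_t v;
    assert(n >= 1 && n <= 24);
    v = br_peek(br) & ((1u << n) - 1);
    br_skip(br, n);
    return v;
}
```

The window is reloaded from memory on every read, at whatever byte the cursor happens to be on, which is why this pattern depends on misaligned loads being fast.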
      
   Well, this is also an example of why to use LSB-first bit ordering, and
   not to use FF escape encodings and similar:
   MSB-first ordering, FF escapes, the 16-bit length limit, etc., manage to
   make JPEG bit-stream handling a lot slower than it could have been.
      
   Whereas, say, LSB-first and imposing a 12-bit length limit allows some   
   speedup here.   
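
To make the 12-bit limit concrete: with LSB-first codes of at most 12 bits, every symbol can be decoded with a single 4096-entry table lookup and no fallback path. The sketch below is only illustrative; ht_tab, ht_add and ht_decode are hypothetical names, codes are given already in LSB-first form, and it assumes a little-endian host with 4 readable bytes at the cursor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HT_BITS 12

/* One entry per possible 12-bit window: (code length << 8) | symbol. */
static uint16_t ht_tab[1 << HT_BITS];

/* Register a code. `code` holds the code bits LSB-first, so every table
 * index whose low `len` bits equal `code` maps to this symbol. */
static void ht_add(unsigned code, unsigned len, unsigned sym)
{
    unsigned i;
    assert(len >= 1 && len <= HT_BITS && sym < 256);
    for (i = code; i < (1u << HT_BITS); i += 1u << len)
        ht_tab[i] = (uint16_t)((len << 8) | sym);
}

/* Decode one symbol and advance the bit position by arithmetic only. */
static unsigned ht_decode(const uint8_t **cs, unsigned *pos)
{
    uint32_t win;
    unsigned e;
    memcpy(&win, *cs, 4);     /* unaligned load, little-endian assumed */
    e = ht_tab[(win >> *pos) & ((1u << HT_BITS) - 1)];
    *pos += e >> 8;           /* consume the code length */
    *cs += *pos >> 3;
    *pos &= 7;
    return e & 255;
}
```

Since pos stays in 0..7, the 12 bits needed always fit in the 32-bit window. A 16-bit limit would need either a 64K-entry table or a second-level lookup.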
      
   Though, the Rice coder in UPIC effectively uses only an 8-bit lookup,
   because 3 more bits of the table index go to the Rk factor. So, sadly,
   it needs a fallback path to decode symbols that exceed 8 bits.
      
   So, pseudo-code (for AdRice Decoding):   
    win=*(u32 *)cs;       //unaligned 32-bit load at the byte position
    b=win>>pos;           //drop bits already consumed (pos is 0..7)
    ix=(rk<<8)|(b&255);
    v=ricefasttab[ix];    //constant lookup table for Rice-code state space
    l=(v>>8)&15;          //bits consumed by this symbol
    if(l<=8)
    {
      //faster path
      pos+=l;
      cs+=pos>>3;         //advance by whole bytes, no branch needed
      pos&=7;
      rk=(v>>12);         //updated Rk
      return(v&255);
    }
    // ... slower path ...
    q=riceqtab[b&255];    //count bits for Q prefix.
    if(q==8)
    {
      //escape case, Q==8 escapes a raw max-length symbol
      l=16;
      v=(b>>8)&255;
      rk+=2;
      if(rk>7)rk=7;
    }else
    {
      l=q+rk+1;
      v=((b>>(q+1))&((1<
(c) 1994, bbs@darkrealms.ca