|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 3,149 of 4,675    |
|    Alex to Terje Mathisen    |
|    Re: More UTF-8 woes - UTF-8 to "\uN" RTF    |
|    04 Dec 17 15:17:33    |
   
   From: alex@nospicedham.rivadpm.com   
      
   On 02-Dec-17 22:01, Terje Mathisen wrote:   
   >   
   > Why don't you decode utf8 chars based on the number of leading 1 bits?   
      
   I didn't invent this, and someone may already have posted something
   along these lines, but you can do the following. The example code
   counts the characters (code points) in a null-terminated UTF-8 string
   pointed at by ECX and leaves the result in EAX.
      
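    sub ecx 1 \ back up one byte: the loop advances ECX before each fetch
    or eax -1 \ EAX = -1; the first pass through @@1 makes it 0
   @@1: add eax 1 \ count the character just classified
   @@2: add ecx 1 \ advance to the next byte
    movzx edx byte { ecx } \ fetch the byte
    shl edx 25 \ bit 7 -> CF, bit 6 -> SF, ZF set if bits 0..6 are all clear
    js short @@1 \ x1xxxxxx: ASCII 0x40-0x7F or lead byte, count
    jc short @@2 \ 10xxxxxx: continuation byte, don't count
    jnz short @@1 \ 00xxxxxx and nonzero: ASCII, count
    \ fall through on the 0 terminator; EAX holds the count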
      
   That is, shift the two high-order bits of each byte (the XX in
   XXxxxxxx) into the carry and sign flags and branch on them; the zero
   flag then catches the terminating null. I haven't tested it, so I have
   no idea how well or badly this code performs.
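   The branch logic can be checked without an assembler. The C sketch
   below is not the poster's code; the names `classify`, `utf8_count`,
   `classify_matches_mask` and the `enum action` values are made up for
   illustration. It emulates the flags that `shl edx, 25` leaves behind
   (CF gets bit 7, SF gets bit 6, ZF is set only when bits 0 to 6 are all
   clear), branches in the same js / jc / jnz order, and compares the
   result against the usual `(b & 0xC0) != 0x80` test for all 256 byte
   values. Like the original, it assumes well-formed UTF-8 and does no
   validation.

```c
#include <stddef.h>
#include <stdint.h>

/* What the loop does with one byte (hypothetical names). */
enum action { ACT_COUNT, ACT_SKIP, ACT_STOP };

/* Emulate the flags left by `shl edx, 25` and branch like js/jc/jnz. */
static enum action classify(uint8_t b)
{
    uint32_t edx = b;              /* movzx edx, byte [ecx]               */
    int cf = (edx >> 7) & 1;       /* CF = last bit shifted out = bit 7   */
    uint32_t r = edx << 25;        /* shl edx, 25 (bit 7 falls off top)   */
    int sf = (int)(r >> 31);       /* SF = new bit 31 = original bit 6    */
    int zf = (r == 0);             /* ZF: original bits 0..6 all clear    */

    if (sf)  return ACT_COUNT;     /* js:  x1xxxxxx (0x40-0x7F, 0xC0-0xFF) */
    if (cf)  return ACT_SKIP;      /* jc:  10xxxxxx continuation byte      */
    if (!zf) return ACT_COUNT;     /* jnz: 00xxxxxx, nonzero ASCII         */
    return ACT_STOP;               /* fell through: the 0 terminator       */
}

/* Count characters one per non-continuation byte, starting at s[0]. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; ; ++p) {
        enum action a = classify(*p);
        if (a == ACT_STOP)
            return n;
        if (a == ACT_COUNT)
            ++n;
    }
}

/* Returns 1 if classify() agrees with the plain mask test on all bytes. */
int classify_matches_mask(void)
{
    for (int b = 0; b < 256; ++b) {
        enum action want = (b == 0)             ? ACT_STOP
                         : ((b & 0xC0) == 0x80) ? ACT_SKIP
                                                : ACT_COUNT;
        if (classify((uint8_t)b) != want)
            return 0;
    }
    return 1;
}
```

   For example, `utf8_count("caf\xC3\xA9")` counts c, a, f and the lead
   byte 0xC3, skips the continuation byte 0xA9, and returns 4. This only
   shows that the three branches sort the bytes correctly; it says nothing
   about speed, which is the open question above.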
      
   --   
   Alex   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
(c) 1994, bbs@darkrealms.ca