   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

   Message 3,149 of 4,675   
   Alex to Terje Mathisen   
   Re: More UTF-8 woes - UTF-8 to "\uN" RTF   
   04 Dec 17 15:17:33   
   
   From: alex@nospicedham.rivadpm.com   
      
   On 02-Dec-17 22:01, Terje Mathisen wrote:   
   >   
   > Why don't you decode utf8 chars based on the number of leading 1 bits?   
      
   I didn't invent this, and someone may already have posted something   
   along these lines, but you can do the following. The example code   
   counts the characters in a null-terminated UTF-8 string pointed at by   
   ECX and returns the result in EAX.   
      
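           or      eax -1               \ eax = -1; the first pass through @@1 makes it 0
           sub     ecx 1                \ pre-decrement: the loop advances before it fetches
   @@1:   add     eax 1                \ count a character
   @@2:   add     ecx 1                \ next byte
           movzx   edx byte { ecx }     \ fetch the byte
           shl     edx 25               \ bit 7 -> carry, bit 6 -> sign
           js      short @@1            \ x1xxxxxx, count
           jc      short @@2            \ 10xxxxxx, continuation byte, don't count
           jnz     short @@1            \ 00xxxxxx and <>0, count
                                        \ fall through on the terminating zero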
      
   That is, shift the two high-order bits (XX......) into the carry and   
   sign flags and branch on them. I haven't tested it, so I have no idea   
   how well this code performs.   
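      
   For comparison, here is the same idea as a rough C sketch. The name   
   utf8_count is made up for illustration, and it assumes the input is   
   well-formed UTF-8. It masks the top two bits instead of shifting them   
   into flags, but the rule is the same: every byte that is not a   
   10xxxxxx continuation byte starts a character.   

```c
#include <stddef.h>

/* Count the characters (code points) in a NUL-terminated UTF-8 string.
   Hypothetical helper; assumes well-formed UTF-8 and does no validation. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        /* 10xxxxxx is a continuation byte; any other byte
           (0xxxxxxx or 11xxxxxx) begins a new character. */
        if ((*p & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

   An empty string gives 0, and a multi-byte sequence counts once, so a   
   two-byte character such as "\xC3\xA9" gives 1.   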
      
   --   
   Alex   
      


