... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"

comp.lang.asm.x86

Ahh, the lost art of x86 assembly

4,675 messages

Message 3,149 of 4,675

Alex to Terje Mathisen

Re: More UTF-8 woes - UTF-8 to "\uN" RTF

04 Dec 17 15:17:33

   From: alex@nospicedham.rivadpm.com   

   On 02-Dec-17 22:01, Terje Mathisen wrote:   
   >   
   > Why don't you decode utf8 chars based on the number of leading 1 bits?   

   I didn't invent this, and someone may already have posted something
   along these lines, but you can do the following. The example code
   counts the characters in a null-terminated UTF-8 string pointed to by
   ECX and leaves the result in EAX.

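           or      eax -1               \ eax = -1, so the first add makes it 0
   @@1:   add     eax 1                \ count one character
   @@2:   movzx   edx byte { ecx }     \ fetch the byte (flags untouched)
           add     ecx 1                \ advance to the next byte
           shl     edx 25               \ bit 7 -> carry, bit 6 -> sign
           js      short @@1            \ x1xxxxxx (01 or 11), count
           jc      short @@2            \ 10xxxxxx (continuation), don't count
           jnz     short @@1            \ 00xxxxxx and <>0, count
                                        \ zero byte: end of string, fall through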

   That is, shift the two high-order bits (XX......) into the carry and
   sign flags and branch on them. I haven't tested it, so I have no idea
   how well this code performs.
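For anyone who wants to check the logic outside an assembler, here is a rough C sketch of the same test. It is not from the original post: the name `utf8_count` is made up for illustration, and it assumes the input is well-formed, null-terminated UTF-8. It counts every byte whose top two bits are not `10`, which is the decision the carry/sign branches above make.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): counts UTF-8
   characters by counting every byte that is not a continuation byte.
   Assumes well-formed, null-terminated input. */
size_t utf8_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    for (; *p != 0; p++) {
        /* Top two bits: 10 marks a continuation byte; 00, 01 and 11
           all start a new character. */
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For example, `utf8_count("h\xC3\xA9llo")` returns 5, because the byte `0xA9` is a continuation byte and is skipped.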

   --   
   Alex   

   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
