

   alt.comp.os.windows-11      Steaming pile of horseshit Windows 11      4,852 messages   


   Message 4,588 of 4,852   
   Maria Sophia to Paul   
   Re: PSA: HTML fragment mode interaction    
   10 Feb 26 11:27:16   
   
   XPost: alt.comp.os.windows-10, alt.comp.microsoft.windows   
   From: mariasophia@comprehension.com   
      
   Paul wrote:   
   > There are various "laundering recipes" for fixing issue like that.   
   > Presumably this Notepad++ behavior has already been noted (somewhere).   
   > It would be unusual for "bad manners" to go unacknowledged.   
      
   I wrote a Notepad++ macro that "launders" the affected text, since I
   need to copy/paste the text after the macro has converted it to pure ASCII.
      
   Apparently when Chromium apps copy text, they don't just put CF_TEXT and
   CF_UNICODETEXT on the clipboard. They also include CF_HTML, the
   Microsoft-defined "HTML Format" clipboard format, which carries a full
   HTML fragment with metadata:
   a. StartFragment / EndFragment markers
   b. Optional StartHTML / EndHTML offsets
   c. And a hidden boundary indicating an HTML document
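      
   To make the metadata concrete, here is a minimal Python sketch of what an
   "HTML Format" payload looks like and how the offsets point at the fragment.
   The helper names build_cf_html and parse_cf_html are my own (hypothetical),
   and the header layout follows Microsoft's published description of the
   format as I understand it. It does not touch the real clipboard; it only
   builds and reads the bytes.

```python
import re

# Header fields are fixed-width so the offsets can be computed up front.
HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{:010d}\r\n"
    "EndHTML:{:010d}\r\n"
    "StartFragment:{:010d}\r\n"
    "EndFragment:{:010d}\r\n"
)
PREFIX = "<html><body>\r\n<!--StartFragment-->"
SUFFIX = "<!--EndFragment-->\r\n</body></html>"


def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in an 'HTML Format' payload (UTF-8 bytes)."""
    header_len = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("utf-8"))
    prefix = PREFIX.encode("utf-8")
    body = fragment.encode("utf-8")
    suffix = SUFFIX.encode("utf-8")
    # All offsets are byte counts from the start of the payload.
    start_html = header_len
    start_fragment = start_html + len(prefix)
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(suffix)
    header = HEADER_TEMPLATE.format(
        start_html, end_html, start_fragment, end_fragment
    )
    return header.encode("utf-8") + prefix + body + suffix


def parse_cf_html(payload: bytes) -> dict:
    """Read the four offsets from the header and slice out the fragment."""
    result = {}
    for key in (b"StartHTML", b"EndHTML", b"StartFragment", b"EndFragment"):
        match = re.search(rb"^" + key + rb":(-?\d+)", payload, re.MULTILINE)
        if match is None:
            raise ValueError("missing header field: " + key.decode())
        result[key.decode()] = int(match.group(1))
    fragment = payload[result["StartFragment"]:result["EndFragment"]]
    result["fragment"] = fragment.decode("utf-8")
    return result


if __name__ == "__main__":
    payload = build_cf_html("<b>copied text</b>")
    print(payload.decode("utf-8"))
    print(parse_cf_html(payload)["fragment"])
```

   The point is that all of this lives in a separate clipboard format next to
   the plain text, so none of it has to end up in the pasted bytes.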
      
   Notepad++ doesn't render HTML, but it does detect the presence of CF_HTML   
   and switches into a special internal mode intended for HTML-aware pasting.   
      
   It's invisible because Notepad++ doesn't expose it in the UI, so my simple   
   brain said "if it's invisible, it's not there" and yet, it was there.   
      
   In that mode, Notepad++ treats the invisible HTML fragment boundary as if   
   it were a zero-width first line. So three things caused me issues.   
   1. Ctrl+A flashes but selects nothing   
   2. Ctrl+X does nothing   
   3. The caret behaves strangely at the start of the document   
   All of this happens because Notepad++ is obeying the HTML fragment metadata.
      
   CF_HTML is not inserted into the document, which is why it never showed up   
   in the hex editor. It exists only on the clipboard, not in the pasted text.   
      
   Manually inserting a blank line forces Notepad++ to reinterpret the buffer
   as plain text. The invisible HTML fragment boundary is still there, but it
   is no longer at the top of the buffer. Then, deleting the blank line forces
   a reparse of the entire document.
      
   What threw me off the trail was that Notepad++'s internal state changes   
   based on the clipboard format, not based on the pasted bytes. So the text   
   looked perfectly normal in Notepad++'s hex editor.   
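      
   The hex-editor check can also be done outside Notepad++. Below is a small
   Python sketch (the function name find_suspicious_bytes is hypothetical)
   that lists every byte that is not printable ASCII, tab, CR or LF. If the
   pasted text saved to disk comes back with an empty list, that at least
   agrees with the idea that nothing hidden was inserted into the text itself.

```python
def find_suspicious_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for bytes outside printable ASCII/tab/CR/LF."""
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E or value in allowed_controls)
    ]


if __name__ == "__main__":
    sample = "a\u200bb".encode("utf-8")  # zero-width space hidden between a and b
    for offset, value in find_suspicious_bytes(sample):
        print(f"offset {offset}: 0x{value:02X}")
```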
      
   I should retry Andy's suggestion of Edit > Paste Special > Paste as ANSI,
   or use a plugin like "Paste as Plain Text", or use a clipboard cleaner such
   as PureText.
      
   For me, the issue was incredibly confusing because:   
   a. The pasted text looks normal   
   b. It doesn't happen with Firefox   
   c. The broken behavior appears random   
   d. There is no UI indicator that Notepad++ is in HTML fragment mode   
   That is because it happens only when the clipboard contains CF_HTML, and
   it then persists until the buffer is modified in a way that forces a reset.
      
   Now that I know what I know, I can finally modify the shortcuts.xml macro   
   so that it will consistently convert characters copied from web sites.   
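      
   For comparison, the same kind of laundering can be sketched in Python. This
   is not the shortcuts.xml macro, just an illustration of the idea under my
   own assumptions: the replacement table and the function name
   launder_to_ascii are hypothetical, and anything the table does not cover is
   reduced with Unicode NFKD decomposition or replaced by a placeholder.

```python
import unicodedata

# Hypothetical replacement table for characters commonly copied from web pages.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # ellipsis
    "\u00a0": " ",                   # non-breaking space
    "\u200b": "", "\ufeff": "",      # zero-width space, BOM
}


def launder_to_ascii(text: str, placeholder: str = "?") -> str:
    """Convert text to pure ASCII, substituting look-alike characters."""
    text = text.translate(str.maketrans(REPLACEMENTS))
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


if __name__ == "__main__":
    print(launder_to_ascii("\u201cHi\u201d \u2014 caf\u00e9\u2026"))
```

   Characters with no ASCII equivalent come out as the placeholder instead of
   being dropped silently, so it is obvious where something was lost.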
   --   
   The nice thing about Usenet is you get good ideas from everyone.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca