|    alt.comp.os.windows-11    |    Steaming pile of horseshit Windows 11    |    4,969 messages    |
|    Marian to Herbert Kleebauer    |
|    Re: Tutorial: Notepad++ shortcuts.xml ma    |
|    31 Dec 25 11:21:22    |
   
   XPost: alt.comp.os.windows-10, alt.comp.microsoft.windows   
   From: marianjones@helpfulpeople.com   
      
   Herbert Kleebauer wrote:   
   > On 12/31/2025 9:33 AM, Marian wrote:   
   >   
   >> This line has a sneaky Unicode dash � right here.   
   >> This line has curly quotes �like these�.   
   >> This line has a non-breaking space between words.   
   >   
   > In Thunderbird this didn't arrive as valid UTF-8 code.
   >   
   > "dash � right" in hex:   
   >   
   > 64 61 73 68 │ 20 FB 20 72 │ 69 67 68 74   
   >   
   > FB is not a valid UTF-8 lead byte (in the original, pre-RFC 3629
   > scheme it would have started a 5-byte sequence), and the continuation
   > bytes that would have to follow it are missing.
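One quick way to double-check a hex dump like Herbert's is to hand the exact bytes to a strict UTF-8 decoder. The Python sketch below (Python is an illustrative choice, not part of either poster's setup; `first_bad_byte` is a hypothetical helper name) shows that under current UTF-8 rules 0xFB is rejected outright as an invalid start byte, and that a lenient decoder substitutes U+FFFD, the replacement character that shows up as the question-mark glyph in the quoted lines.

```python
def first_bad_byte(raw: bytes):
    """Return (offset, reason) for the first malformed UTF-8 byte, or None."""
    try:
        raw.decode("utf-8")  # strict mode raises on malformed input
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None

# Herbert's hex dump of the quoted "dash ... right", byte for byte.
raw = bytes.fromhex("64 61 73 68 20 FB 20 72 69 67 68 74")

print(first_bad_byte(raw))  # (5, 'invalid start byte')

# A lenient decoder swaps the bad byte for U+FFFD REPLACEMENT CHARACTER.
lenient = raw.decode("utf-8", errors="replace")
print(ascii(lenient))  # 'dash \ufffd right'
```

A correctly encoded en dash (E2 80 93) passes the same check, so the problem lies in the bytes that went out, not in Thunderbird.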
      
   Hi Herbert,   
      
   Happy New Year!   
      
   Thank you for that information. I only find out after I've posted.   
   I don't "see" most of the tofu, but as you can tell, it was there.   
      
   It happens when I forget to convert Unicode to ASCII.
   Here is the original test file that contains the Unicode characters.   
      
   This line is fine.   
   This line has a sneaky Unicode dash – right here.   
   This line has curly quotes “like these”.   
   This line has a non-breaking space between words.   
      
   Bear in mind that pasted web-page text brings in much more than a few
   odd Unicode characters. Unicode is only the container; the real trouble
   comes from the variety of characters inside it, such as zero-width
   spaces and joiners, directional control characters, soft hyphens, etc.
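To see which of those invisible characters a paste actually contains, a per-character report helps. This is a minimal Python sketch (the `non_ascii_report` name is hypothetical, not part of Marian's toolchain) that lists every character above 0x7F with its position, code point, Unicode name and general category. Zero-width spaces and joiners, bidi controls and the soft hyphen all fall in category Cf ("format"), which is why most editors show nothing for them.

```python
import unicodedata

def non_ascii_report(text: str):
    """Return (line, column, code point, name, category) per non-ASCII char."""
    report = []
    # split("\n") rather than splitlines(): splitlines() would treat U+2028,
    # U+2029 and U+0085 as line breaks and silently hide them from the report.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                report.append((lineno, col, f"U+{ord(ch):04X}",
                               unicodedata.name(ch, "<unnamed>"),
                               unicodedata.category(ch)))
    return report

sample = "zero\u200bwidth, soft\u00adhyphen, dash \u2013 here"
for row in non_ascii_report(sample):
    print(*row)
# 1 5 U+200B ZERO WIDTH SPACE Cf
# 1 17 U+00AD SOFT HYPHEN Cf
# 1 31 U+2013 EN DASH Pd
```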
      
   I sent that exactly as it was copied & pasted from gVim.   
   My Usenet "reader" is a bunch of telnet scripts tied to gVim.   
      
   Whatever is in my headers is picked at random from a dictionary lookup,
   so the character encoding declared in the header is static; it does not
   change to match whatever characters end up in the body.
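Because the declared charset never changes, the body has to fit it. A minimal pre-posting check in Python, assuming the header always declares one fixed charset (`DECLARED_CHARSET` and `unencodable` are hypothetical names; substitute whatever the dictionary lookup actually emits), reports every character that charset cannot represent.

```python
import codecs

DECLARED_CHARSET = "us-ascii"  # hypothetical: use whatever the static header declares

def unencodable(body: str, charset: str = DECLARED_CHARSET):
    """Return (offset, char) pairs that `charset` cannot represent."""
    codecs.lookup(charset)  # fail fast (LookupError) on an unknown charset name
    bad = []
    for offset, ch in enumerate(body):
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append((offset, ch))
    return bad

body = "curly \u201cquotes\u201d"
problems = unencodable(body)
if problems:
    print(f"Body does not fit {DECLARED_CHARSET}:",
          [(i, f"U+{ord(c):04X}") for i, c in problems])
```

An empty result means the body is safe to send under the static header; anything else still needs converting first.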
      
   This is why I try to run all the web page comments (which contain funky   
   characters) through a conversion to ASCII prior to posting.   
      
   Here's that same file after being run through this sequence.   
   c:\> type unicode2ascii.bat   
   @echo off   
   :: unicode2ascii.bat   
   :: This batch file runs a PowerShell script that removes all non-ASCII   
   :: characters from unicode.txt and writes the cleaned output to ascii.txt.   
   powershell -NoProfile -ExecutionPolicy Bypass -File unicode2ascii.ps1   
      
   c:\> type unicode2ascii.ps1   
   # unicode2ascii.ps1   
   # This script reads unicode.txt, removes all characters outside the   
   # 7-bit ASCII range (0x00 to 0x7F), and writes the result to ascii.txt.   
      
   Get-Content unicode.txt | ForEach-Object {   
    ($_ -replace '[^\x00-\x7F]', '')   
   } | Set-Content ascii.txt   
      
   c:\> type ascii.txt   
   This line is fine.   
   This line has a sneaky Unicode dash right here.   
   This line has curly quotes like these.   
   This line has a non-breaking space between words.   
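The strip-only rule deletes the dash and the curly quotes outright (leaving a double space where the dash was), and where a non-breaking space sits between two words, deleting it glues them together. If the goal is "keyboard ASCII", mapping common look-alikes to ASCII before stripping keeps more of the text. A minimal Python sketch follows; the mapping table is an illustrative assumption, not the contents of Marian's shortcuts.xml, and NFKD normalization is a swapped-in technique for reducing accented letters to their base letters.

```python
import re
import unicodedata

# Illustrative look-alike table (an assumption, not Marian's shortcuts.xml).
ASCII_EQUIVALENTS = str.maketrans({
    "\u2013": "-",    # EN DASH
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u00a0": " ",    # NO-BREAK SPACE
})

def to_keyboard_ascii(text: str) -> str:
    """Map look-alikes to ASCII, decompose accents, then strip what's left."""
    text = text.translate(ASCII_EQUIVALENTS)
    # NFKD splits "é" into "e" + combining accent and turns "…" into "...".
    text = unicodedata.normalize("NFKD", text)
    # Same rule as unicode2ascii.ps1: drop anything outside 0x00-0x7F, which
    # also removes zero-width characters, bidi controls and soft hyphens.
    return re.sub(r"[^\x00-\x7F]", "", text)

print(to_keyboard_ascii("dash \u2013 right, \u201clike these\u201d, caf\u00e9"))
# dash - right, "like these", cafe
```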
   >   
   >> c:\> type unicode2ascii.bat   
   >> @echo off   
   >>:: unicode2ascii.bat   
   >>:: This batch file runs a PowerShell script that removes all non-ASCII   
   >>:: characters from unicode.txt and writes the cleaned output to ascii.txt.   
   >> powershell -NoProfile -ExecutionPolicy Bypass -File unicode2ascii.ps1   
   >   
   > Wouldn't it be simpler to open the file in Notepad and save it with   
   > ANSI encoding instead of UTF-8?   
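One caveat with the Notepad "ANSI" route, assuming ANSI resolves to Windows-1252 (the usual ANSI code page on Western-locale systems): ANSI is not 7-bit ASCII. The en dash, curly quotes and NBSP all have Windows-1252 code points above 0x7F, so they survive the save as high-bit bytes, while characters with no Windows-1252 equivalent, such as a zero-width space, cannot be stored at all. This Python sketch uses Python's `cp1252` codec as a stand-in for Notepad's save; `ansi_bytes` and `high_bit_bytes` are hypothetical helper names.

```python
def ansi_bytes(text: str, codepage: str = "cp1252") -> bytes:
    """Encode as an 'ANSI' code page; unmappable chars become b'?'."""
    return text.encode(codepage, errors="replace")

def high_bit_bytes(data: bytes):
    """Return the bytes above 0x7F, i.e. the ones that are not 7-bit ASCII."""
    return [f"0x{b:02X}" for b in data if b > 0x7F]

sample = "dash \u2013 right, \u201cquotes\u201d, non\u00a0breaking, zero\u200bwidth"
encoded = ansi_bytes(sample)
print(high_bit_bytes(encoded))  # ['0x96', '0x93', '0x94', '0xA0']
print(b"?" in encoded)          # True: U+200B has no cp1252 equivalent
```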
      
   My use model is to research the bejeezus out of my Usenet posts, so they   
   very often contain funky characters of all sorts due to copy/paste/edit.   
      
   I simply need a quick converter from pasted text to keyboard ASCII.
   This Notepad++ conversion started years ago with a few funky characters;
   my shortcuts.xml has since grown into the behemoth it is today.
      
   Nonetheless, "simpler" and "faster" need to go together.   
   Currently the process is:   
      
   a. Paste results from multiple web sources into a gVim file
   b. Convert to ASCII in Notepad++ using ctrl+b (convert) and ctrl+a (copy)
   c. Edit the converted results into the final Usenet post content
      
   It's just a handful of quick keyboard combinations:
    ctrl+c (to copy the referenced web-page text)
    Run box > n (to bring up Notepad++)
    ctrl+v (to paste the copied funky text into Notepad++)
    ctrl+b (to convert the funky text to ASCII characters)
    ctrl+a (to copy the converted ASCII)
    ctrl+v (to paste into gVim for the final edits)
    ctrl+s (to send off the Usenet post).   
      
   I do this a hundred times a day, all day, every day, so the keystroke
   sequence has to be efficient, but I'm always open to a better, simpler method.
      
   The first thing I tried, years ago, was to do it inside gVim.
    :%s/[^[:ascii:]]//g   
   Mapped to the F5 key, that turned into this, which partially works.   
    nnoremap
(c) 1994, bbs@darkrealms.ca