

   comp.compilers      Compiler construction, theory, etc. (Mod      2,753 messages   


   Message 843 of 2,753   
   Juergen Kahrs to Jim   
   Re: Parsing HTML : I would appreciate ad   
   15 Nov 06 00:11:55   
   
   From: Juergen.KahrsDELETETHIS@vr-web.de   
      
   Jim wrote:   
      
   > I have to parse millions of html documents, and return just the   
   > plaintext/bytes. Many of the html documents contain Japanese   
   > characters and so it will be necessary to read the codepage in the   
   > html header, so the bytes can be read properly.   
      
   Use "lynx -dump". w3m can also do this.   
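
   For a batch of millions of files, the suggestion above could be scripted
   roughly as follows. This is only a sketch, assuming lynx (or w3m) is
   installed and on the PATH and that the documents are local files ending
   in .html. The function names build_dump_command, html_to_text and
   dump_directory are made up for illustration. It leaves charset detection
   to the external tool and does not verify that the tool picks up the
   codepage declared in each HTML header.

```python
import subprocess
from pathlib import Path


def build_dump_command(html_path, tool="lynx"):
    """Return the argv list that renders one local HTML file as plain text.

    Hypothetical helper: only the tool names and their options are real.
    """
    path = str(html_path)
    if tool == "lynx":
        # -dump writes the formatted page to stdout; -nolist omits the link list.
        return ["lynx", "-dump", "-nolist", path]
    if tool == "w3m":
        # -dump writes the formatted page to stdout; -T sets the content type.
        return ["w3m", "-dump", "-T", "text/html", path]
    raise ValueError(f"unsupported tool: {tool}")


def html_to_text(html_path, tool="lynx"):
    """Run the external text browser and return its output as raw bytes."""
    result = subprocess.run(
        build_dump_command(html_path, tool),
        check=True,               # raise if the tool exits with an error
        stdout=subprocess.PIPE,   # capture the rendered text
    )
    return result.stdout


def dump_directory(src_dir, dst_dir, tool="lynx"):
    """Convert every *.html file in src_dir into a .txt file in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(Path(src_dir).glob("*.html")):
        out_file = dst / (html_file.stem + ".txt")
        out_file.write_bytes(html_to_text(html_file, tool))
```

   Calling dump_directory("pages", "text") would then write one .txt file
   per input document. Starting one process per file is simple but not
   cheap at this scale, so running several workers in parallel may be
   worth considering.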
      


