
   comp.compilers      Compiler construction, theory, etc. (Mod      2,753 messages   


   Message 839 of 2,753   
   Daniel Zingaro to you   
   Re: Parsing HTML : I would appreciate ad   
   15 Nov 06 00:10:08   
   
   From: zingard@mcmaster.ca   
      
   Hi,   
      
   A pedagogical XML parser I wrote in Pascal can be found at   
   http://www.cas.mcmaster.ca/~zingard/xmlparser.zip   
      
   HTML can be parsed similarly. ... Of course, this is only if you feel   
   like essentially wasting time on a problem that has been solved   
   over and over before, as John noted =).   
      
   Thanks,   
   Dan   
      
   At 04:31 PM 11/13/2006, you wrote:   
   >The problem to solve.   
   >   
   >I have to parse millions of html documents, and return just the   
   >plaintext/bytes. Many of the html documents contain Japanese   
   >characters and so it will be necessary to read the codepage in the   
   >html header, so the bytes can be read properly. ...   
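
   For illustration, here is a rough sketch of the task quoted above: find
   the charset declared in the HTML header, decode the bytes with it, and
   keep only the text. It is not taken from the Pascal parser linked
   earlier. It uses Python's standard html.parser module, and the names
   sniff_charset, TextExtractor and html_to_text are made up for this
   example. The UTF-8 fallback and the 4096-byte scan window are
   assumptions, not requirements from the original question.

```python
import re
from html.parser import HTMLParser

# Looks for charset=... inside a <meta> tag near the start of the document.
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    match = _META_CHARSET.search(raw[:4096])  # assumed scan window
    return match.group(1).decode("ascii") if match else default


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(raw: bytes) -> str:
    """Decode raw HTML bytes using the declared charset and return plain text."""
    charset = sniff_charset(raw)
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 (an assumption).
        text = raw.decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join(parser.parts)
```

   Something like html_to_text(open(path, "rb").read()) would then be
   called once per document. A real pipeline would probably also want to
   honour a charset given in the HTTP headers and cope with pages that
   declare the wrong encoding, which this sketch does not attempt.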
      
