|    comp.compilers    |    Compiler construction, theory, etc. (Mod    |    2,753 messages    |
|    Message 839 of 2,753    |
|    Daniel Zingaro to you    |
|    Re: Parsing HTML : I would appreciate ad    |
|    15 Nov 06 00:10:08    |
From: zingard@mcmaster.ca

Hi,

A pedagogical XML parser I wrote in Pascal can be found at
http://www.cas.mcmaster.ca/~zingard/xmlparser.zip

HTML can be parsed similarly. ... Of course, this is only if you feel
like essentially wasting time solving a problem that has been solved
over and over before, as John noted =).

Thanks,
Dan

At 04:31 PM 11/13/2006, you wrote:
>The problem to solve.
>
>I have to parse millions of html documents, and return just the
>plaintext/bytes. Many of the html documents contain Japanese
>characters and so it will be necessary to read the codepage in the
>html header, so the bytes can be read properly. ...

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
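Since the quoted problem is one of the "already solved" kind, here is a minimal sketch of the two steps it describes: read the charset declared in the HTML header, decode the bytes with it, then keep only the text. This is not Dan's Pascal parser; it swaps in Python's standard-library `html.parser`. The names `sniff_charset`, `TextExtractor` and `html_to_text` are hypothetical, and the sketch assumes the charset is declared in a `<meta>` tag within the first 4 KB of an ASCII-compatible encoding (e.g. Shift_JIS, EUC-JP, UTF-8). It ignores HTTP `Content-Type` headers and byte-order marks.

```python
import re
from html.parser import HTMLParser

# Assumption: the charset appears as <meta charset="..."> or in
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_CHARSET_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?\s*([\w-]+)', re.IGNORECASE)


def sniff_charset(raw: bytes, default: str = "utf-8", limit: int = 4096) -> str:
    """Return the charset declared near the top of the document, or `default`."""
    m = _CHARSET_RE.search(raw[:limit])
    return m.group(1).decode("ascii").lower() if m else default


class TextExtractor(HTMLParser):
    """Collect character data, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(raw: bytes) -> str:
    """Decode raw HTML bytes using the declared charset, then strip the markup."""
    charset = sniff_charset(raw)
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:  # unknown or misspelled charset name: fall back to UTF-8
        text = raw.decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    # Collapse the whitespace runs left behind by removed tags.
    return " ".join("".join(parser.parts).split())
```

For example, a Shift_JIS page with `content="text/html; charset=Shift_JIS"` is decoded as `shift_jis` before parsing, so the Japanese text comes out intact. A real crawler over millions of pages would also want to honour the HTTP header and fall back to statistical encoding detection when no declaration is present.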
(c) 1994, bbs@darkrealms.ca