Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 1,621 of 1,954    |
|    BerlinBrown to All    |
|    Web document categorization strategies?    |
|    05 Jan 08 02:59:33    |
      From: berlin.brown@gmail.com

      Are there any simplified or established approaches for categorizing
      web documents? For example, let's say I have 100 million URLs and I
      can extract the document and description.

      Are there any simplified approaches for categorizing the data?

      As of now, I am focusing on Bayesian methods, something along the
      lines of this:

      http://gnosis.cx/publish/programming/filtering-spam.html

      The only problem is that you need large sets of spam and ham to group a
      set into a category.

      [ comp.ai is moderated ... your article may take a while to appear. ]

      --- SoupGate-Win32 v1.05
       * Origin: you cannot sedate... all the things you hate (1:229/2)    |
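For anyone following along, the Bayesian approach mentioned in the post can be extended from two classes (spam and ham) to any number of categories. The following is only a minimal sketch of a multinomial naive Bayes categorizer with add-one smoothing, not the method from the linked article. The class name `NaiveBayesCategorizer`, the `tokenize` helper, and the category labels in the usage note are all hypothetical and chosen for illustration. It still needs labeled examples per category, which is the limitation raised above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multi-category naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training docs
        self.vocabulary = set()                  # every word seen during training

    def train(self, category, text):
        """Record one labeled document for the given category."""
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the category with the highest log posterior, or None if untrained."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior: fraction of training documents in this category.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                # Add-one (Laplace) smoothing so unseen words do not zero out the score.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

Usage would look something like `c = NaiveBayesCategorizer()`, then `c.train("sports", "football match score goal team league")` for each labeled document, then `c.classify(description)` on text extracted from a URL. At the scale of 100 million URLs, a sketch like this would likely need a smaller labeled sample for training and a more compact model representation.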
(c) 1994, bbs@darkrealms.ca