
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 1,621 of 1,954   
   BerlinBrown to All   
   Web document categorization strategies?   
   05 Jan 08 02:59:33   
   
   From: berlin.brown@gmail.com   
      
   Are there any simplified or established approaches for categorizing
   web documents? For example, let's say I have 100 million URLs and I
   can extract each document and its description.
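Before any categorizer can run, each page's extracted text has to become a feature vector. A minimal sketch, assuming a simple bag-of-words representation over the description plus body text; the function name `extract_features` and the tiny stopword list are hypothetical illustrations, not part of the original post.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def extract_features(description: str, body: str = "") -> Counter:
    """Turn a page's description and body text into bag-of-words term counts."""
    text = f"{description} {body}".lower()
    # Split on anything that is not a lowercase letter or digit.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Drop stopwords and single-character tokens, then count what remains.
    return Counter(t for t in tokens if t not in STOPWORDS and len(t) > 1)


if __name__ == "__main__":
    feats = extract_features("Learn Python programming: the basics of Python.",
                             "Python tutorial")
    print(feats.most_common(3))
```

Term counts like these are the input a Bayesian classifier works from; at 100 million URLs the counting itself would typically be done in a batch or streaming job rather than in memory.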
      
   Are there any simplified approaches for categorizing the data?   
      
   As of now, I am focusing on Bayesian methods, something along the
   lines of this:
      
   http://gnosis.cx/publish/programming/filtering-spam.html   
      
   The only problem is that you need large labeled sets of spam and ham
   to train the filter before it can group a set into a category.
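The spam/ham filter in the linked article is a two-class case; the same idea extends to N categories. A minimal sketch, assuming a multinomial naive Bayes model with add-one (Laplace) smoothing, which is one common Bayesian approach and not necessarily the one in the linked article. The class name `NaiveBayesCategorizer` and the toy training data are hypothetical. Note that it still needs labeled examples for every category, which is exactly the limitation raised above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial naive Bayes over N categories with add-one smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()               # documents seen per category
        self.word_counts = defaultdict(Counter)   # term counts per category
        self.total_words = Counter()              # total terms per category
        self.vocab = set()                        # all terms seen in training

    def train(self, text: str, category: str) -> None:
        """Add one labeled document to the model."""
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.total_words[category] += len(tokens)
        self.vocab.update(tokens)

    def classify(self, text: str) -> str:
        """Return the category with the highest posterior log-probability."""
        n_docs = sum(self.doc_counts.values())
        if n_docs == 0:
            raise ValueError("classifier has not been trained")
        # Ignore terms never seen in training; they carry no evidence here.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for cat in self.doc_counts:
            # log P(cat) + sum of log P(term | cat); log space avoids underflow.
            score = math.log(self.doc_counts[cat] / n_docs)
            denom = self.total_words[cat] + v
            for t in tokens:
                score += math.log((self.word_counts[cat][t] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best


if __name__ == "__main__":
    nb = NaiveBayesCategorizer()
    nb.train("stock market shares trading prices", "finance")
    nb.train("bank interest rates loans", "finance")
    nb.train("football match goal score team", "sports")
    nb.train("tennis player wins championship match", "sports")
    print(nb.classify("team wins the match"))
```

Add-one smoothing keeps a single unseen term from zeroing out a category's probability, and with only a handful of documents per category the predictions are correspondingly weak, which is why large labeled sets matter.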
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca