Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 80 of 1,954    |
|    Pascal Bourguignon to mike79@iprimus.com.au    |
|    Re: Naive Bayes...    |
|    26 Sep 03 00:41:52    |
From: spam@thalassa.informatimago.com

mike79@iprimus.com.au (mike79) writes:

> Hi all,
>
> I've done quite a bit of research on this topic "Naive Bayes Text
> Classifier" and want to ask whether the following formulas that I am
> using are correct or not. Oh, by the way, I am implementing a Spam
> filter.
>
> Read the spam corpus, and map each word to the number of occurrences,
> and place this in a table.
>
> Do the same for the ham corpus, and place it in another table.
>
> Then, create a third table, this is the probability table. Take a word
> from the spam table and calculate the ratio of the number of
> occurrences in the spam table to the number of occurrences in the (spam
> + ham) table.
>
> For example, just say the word "madam" occurred 10 times in the spam
> corpus, and only 5 times in the ham corpus, the corresponding
> probability for the word madam will be 10/(10+5) = 10/15 = 0.67 i.e.
> 67% probability that the word is coming from a spam email.
>
> Is this the correct way of doing it, or am I doing something wrong?

That sounds perfectly correct.
Did you read http://www.paulgraham.com/paulgraham/naivebayes.html

--
__Pascal_Bourguignon__
http://www.informatimago.com/
Do not adjust your mind, there is a fault in reality.

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to
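
For readers who want to try the three-table procedure described in the quoted post, here is a minimal Python sketch. It is not code from either poster. The function names `count_words` and `spam_probability_table`, the lowercase whitespace tokenizer, and the choice to include words that appear only in the ham table are all assumptions made for illustration.

```python
from collections import Counter


def count_words(messages):
    """Map each word to its number of occurrences across a corpus.

    Assumption: words are lowercased and split on whitespace; the post
    does not say how the text is tokenized.
    """
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spam_probability_table(spam_counts, ham_counts):
    """Build the third table: spam occurrences / (spam + ham occurrences).

    Assumption: words seen only in the ham table are included as well,
    which gives them a ratio of 0.0.
    """
    table = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_n = spam_counts.get(word, 0)
        ham_n = ham_counts.get(word, 0)
        total = spam_n + ham_n
        if total == 0:
            continue  # skip entries with no recorded occurrences
        table[word] = spam_n / total
    return table


# The "madam" example from the post: 10 spam occurrences, 5 ham occurrences.
spam_counts = Counter({"madam": 10})
ham_counts = Counter({"madam": 5})
table = spam_probability_table(spam_counts, ham_counts)
print(round(table["madam"], 2))  # 0.67
```

The sketch uses raw counts exactly as the post describes, so it makes no adjustment for the spam and ham corpora differing in size.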
(c) 1994, bbs@darkrealms.ca