

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 80 of 1,954   
   Pascal Bourguignon to mike79@iprimus.com.au   
   Re: Naive Bayes...   
   26 Sep 03 00:41:52   
   
   From: spam@thalassa.informatimago.com   
      
   mike79@iprimus.com.au (mike79) writes:   
      
   > Hi all,   
   >   
   > I've done quite a bit of research on this topic "Naive Bayes Text   
   > Classifier" and want to ask whether the following formulas that I am   
   > using are correct or not. Oh, by the way, I am implementing a Spam   
   > filter.   
   >   
   > Read the spam corpus, and map each word to the number of occurrences,   
   > and place this in a table.   
   >   
   > Do the same for the ham corpus, and place it in another table.   
   >   
   > Then, create a third table, this is the probability table. Take a word   
   > from the spam table and calculate the ratio of the number of   
   > occurrences in the spam table to the number of occurrences in the (spam   
   > + ham) table.   
   >   
   > For example, just say the word "madam" occurred 10 times in the spam   
   > corpus, and only 5 times in the ham corpus, the corresponding   
   > probability for the word madam will be 10/(10+5) = 10/15 = 0.67 i.e.   
   > 67% probability that the word is coming from a spam email.   
   >   
   > Is this the correct way of doing it, or am I doing something wrong?   
      
   That sounds perfectly correct.   
   Did you read  http://www.paulgraham.com/paulgraham/naivebayes.html ?   
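
   The procedure in the quoted message can be sketched roughly as follows.
   This is only an illustration, not code from the thread: the function
   names, the tokenizer regex, and the toy counts are assumptions made for
   the example. It uses the raw-count ratio exactly as described, which
   reads as a spam probability only if the two corpora are of comparable
   size.

```python
from collections import Counter
import re


def count_words(messages):
    """Map each lowercased word to its number of occurrences in a corpus."""
    counts = Counter()
    for text in messages:
        # Hypothetical tokenizer: runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def spam_ratio_table(spam_counts, ham_counts):
    """For each word seen in either corpus: spam / (spam + ham) occurrences."""
    table = {}
    for word in set(spam_counts) | set(ham_counts):
        s = spam_counts[word]  # Counter returns 0 for unseen words
        h = ham_counts[word]
        table[word] = s / (s + h)
    return table


# Toy counts taken from the example in the post: "madam" appears
# 10 times in the spam corpus and 5 times in the ham corpus.
spam_counts = Counter({"madam": 10})
ham_counts = Counter({"madam": 5})

table = spam_ratio_table(spam_counts, ham_counts)
print(round(table["madam"], 2))  # 0.67
```

   With real data, the two Counter objects would come from calling
   count_words() on the spam and ham corpora respectively.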
      
   --   
   __Pascal_Bourguignon__   
   http://www.informatimago.com/   
   Do not adjust your mind, there is a fault in reality.   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      


