comp.ai

Message 722 of 1,954

Ted Dunning to All

Re: How can I analyse the similarity mea

14 May 05 05:58:30

   From: ted.dunning@gmail.com   
      
   Depending on what you are doing, Pearson's correlation is likely to be
   a *really* bad measure of similarity, because its estimates are very
   unstable when you have only a small number of examples.
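
   The small-sample problem is easy to see directly. The sketch below is
   only an illustration and is not taken from the thread: `pearson` and
   `frac_strong` are hypothetical helper names, and the data are
   independent Gaussian draws, so any correlation it finds is pure noise.
   With two points the coefficient is always +1 or -1, and with a handful
   of points large values still turn up often.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def frac_strong(n, trials=2000, threshold=0.8, seed=0):
    """Fraction of trials in which two independent samples of size n
    show |r| above the threshold (illustrative simulation only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(pearson([1, 2], [10, 3]))  # two distinct points: always +/-1
    print(frac_strong(4))            # unrelated data, yet often "strong"
    print(frac_strong(50))           # rare once there are more examples
```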
      
   In many cases it is better to use measures of anomalous association
   such as the log-likelihood ratio statistic G^2 (I recommended this in
   my 1993 paper in Computational Linguistics) or Fisher's exact test
   (search for Ted Pedersen's work).
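
   For a 2x2 table of co-occurrence counts, both measures are short to
   compute. What follows is a minimal sketch using the standard textbook
   forms, not necessarily the exact formulation in either author's work;
   the function names `g2` and `fisher_right_tail` and the example counts
   are made up for illustration. The table is laid out as k11 (both
   events), k12 and k21 (one event without the other), and k22 (neither).

```python
import math


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio statistic G^2 for a 2x2 contingency table:
    2 * sum(observed * ln(observed / expected)) over non-empty cells."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    obs = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def fisher_right_tail(k11, k12, k21, k22):
    """One-sided Fisher's exact test: probability, with the margins held
    fixed, of a top-left count at least as large as the observed k11."""
    r1 = k11 + k12
    c1 = k11 + k21
    n = k11 + k12 + k21 + k22
    denom = math.comb(n, c1)
    upper = min(r1, c1)
    numer = sum(
        math.comb(r1, x) * math.comb(n - r1, c1 - x)
        for x in range(k11, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    table = (3, 1, 2, 50)  # invented small counts, for illustration only
    print(g2(*table))
    print(fisher_right_tail(*table))
```

   Under independence G^2 is asymptotically chi-squared with one degree
   of freedom, while the Fisher value is an exact tail probability, which
   is why it remains usable when the counts are very small.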
      
   Much better than that, however, is to really analyze what you are   
   trying to do and put a solid probabilistic model underneath it.   If   
   you do that, you can know how reliable your inferences are and avoid   
   the problems with small counts.  See the work of David MacKay for   
   examples of the maximum evidence method.  David Heckerman has a very   
   nice tutorial on Bayesian networks as well.   
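
   As a very small illustration of what an explicit model buys you, the
   sketch below puts a Beta prior on a binomial rate. This is an assumed
   toy model chosen for brevity, not the maximum evidence method and not
   a Bayesian network; `beta_posterior` is a hypothetical name and the
   uniform Beta(1, 1) prior is an assumption. The point is only that the
   posterior spread tells you how far to trust an estimate from few
   counts.

```python
import math


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a binomial rate
    under a Beta(prior_a, prior_b) prior (uniform by default)."""
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Both raw ratios are exactly 1.0, but the evidence behind them differs.
    print(beta_posterior(1, 1))      # one observation: wide posterior
    print(beta_posterior(100, 100))  # many observations: narrow posterior
```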
      
   How about you say what you are trying to do?   