Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 1,659 of 1,954    |
|    Ted Dunning to talse...@gmail.com    |
|    Re: Issues regarding testing of a classi    |
|    03 Feb 08 08:59:02    |
      From: ted.dunning@gmail.com

      On Jan 28, 6:35 pm, talse...@gmail.com wrote:
      > I have a general question; I hope you guys could help me.
      >
      > Suppose I have a classifier A that discriminates between two classes:
      > class W and class B (white balls and black balls, respectively).
      >
      > Suppose I have to run the classifier on a vast set of balls (:= P) in
      > which the distribution of white and black balls is unknown (which
      > means I don't know the a priori probability of getting a white or a
      > black ball to examine).
      >
      > Now I would like to test the classifier. I choose a subset of P (:= N)
      > that consists of N balls and run the experiment to get the ROC curve
      > of the classifier.
      >
      > My question is: what is the best way to set the distribution of white
      > and black balls in N if the distribution of P is unknown? 0.5*N black
      > balls and 0.5*N white balls sounds right, but is it really right? And
      > how would the answer change if the distribution of P can be determined?

      You don't say whether or not you also have an oracle that can
      determine whether a ball is actually black or white.

      Generally, when testing a classifier, it is best to know what the
      correct answer is.

      If you are only asking how to set the prior on the probability of white,
      then complete ignorance of a parameter in [0, 1] is usually represented
      not as a point estimate but as a uniform distribution. Strictly
      speaking, you should then integrate this parameter away to get the
      distribution of answers. An alternative approach is to maximize the
      posterior probability, which is particularly easy in the case of a
      uniform prior, since the prior drops out.
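      [Editor's sketch, not part of the original thread: the two options above
      can be made concrete for a uniform Beta(1,1) prior on p = P(white).
      After observing w white balls in n draws, the posterior is
      Beta(w+1, n-w+1); integrating p away gives Laplace's rule of succession,
      while the MAP/maximum-likelihood estimate is just w/n. Function names
      here are illustrative.]

      ```python
      def posterior_predictive_white(w, n):
          """P(next ball is white) after integrating p away under a
          uniform Beta(1,1) prior: the mean of Beta(w+1, n-w+1),
          i.e. Laplace's rule of succession."""
          return (w + 1) / (n + 2)

      def map_estimate_white(w, n):
          """Mode of the Beta(w+1, n-w+1) posterior. With a uniform
          prior this equals the maximum-likelihood estimate w/n
          (undefined when n == 0)."""
          return w / n
      ```

      Note how the two estimates differ for small samples (e.g. w = 0, n = 2
      gives 0.25 vs 0.0) but converge as n grows, which is why the easy MAP
      shortcut is usually harmless with plenty of data.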
      Also, if your classifier is framed as computing the posterior
      distribution on a mixture distribution, then taking p(white) = 0.5 is
      the same as using a uniform prior distribution.

      [ comp.ai is moderated ... your article may take a while to appear. ]

      --- SoupGate-Win32 v1.05
       * Origin: you cannot sedate... all the things you hate (1:229/2)
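      [Editor's sketch, not part of the original thread: the last point can be
      checked directly with Bayes' rule for a two-class mixture. With
      prior_white = 0.5 the priors cancel, so the posterior depends only on
      the likelihood ratio — exactly the "uniform prior" behavior Ted
      describes. The function name is illustrative.]

      ```python
      def posterior_white(lik_white, lik_black, prior_white=0.5):
          """Bayes' rule for P(white | x) in a two-class mixture,
          given class-conditional likelihoods p(x|white) and p(x|black).
          With prior_white == 0.5 the priors cancel and only the
          likelihood ratio matters."""
          num = prior_white * lik_white
          den = num + (1 - prior_white) * lik_black
          return num / den
      ```

      For example, with equal priors and likelihoods 0.8 vs 0.2 the posterior
      is simply 0.8, whereas a skewed prior would pull it toward the favored
      class.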
(c) 1994, bbs@darkrealms.ca