
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 1,659 of 1,954   
   Ted Dunning to talse...@gmail.com   
   Re: Issues regarding testing of a classifier   
   03 Feb 08 08:59:02   
   
   From: ted.dunning@gmail.com   
      
   On Jan 28, 6:35 pm, talse...@gmail.com wrote:   
   > I have a general question, I hope you guys could help me.   
   >   
   > Suppose I have a classifier A that discriminates between two   
   > classes: W and B (White balls and Black balls, respectively).   
   >   
   > Suppose I have to run the classifier on a vast set of balls (:= P),   
   > in which the distribution of White and Black balls is unknown   
   > (which means I don't know the a priori probability of getting a   
   > white or a black ball to examine).   
   >   
   > Now I would like to test the classifier. I choose a subset of P (:=N)   
   > that consists of N balls and run the experiment to get the ROC curve   
   > of the classifier.   
   >   
   > My question is: What is the best way to set the distribution of   
   > White and Black balls in N if the distribution of P is unknown?   
   > Using 0.5*N Black balls and 0.5*N White balls sounds right, but is   
   > it really? And how would the answer change if the distribution of P   
   > can be determined?   
   >   
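   A quick sketch of the experiment you describe, in Python (the scores,
   labels, and function names are made up for illustration; only the
   threshold-sweep idea is from your setup):

```python
# Sketch: build a ROC curve for a two-class (White/Black) classifier
# by sweeping a decision threshold over its scores on the test set N.
# All data below is invented for illustration.

def roc_points(scores, labels):
    """Return (false-positive-rate, true-positive-rate) pairs.

    labels: 1 = White (positive class), 0 = Black.
    scores: classifier's score for "White"; higher = more White-like.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    # Sweep a threshold through every distinct score, high to low.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(roc_points(scores, labels))
```

   Note the curve is only meaningful if the labels are ground truth,
   which is the oracle question raised below.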
      
   You don't say whether or not you also have an oracle that can   
   determine whether a ball is actually black or white.   
      
   Generally when testing a classifier, it is best to know what the   
   correct answer is.   
      
   If you are only asking how to set the prior on the probability of   
   white, then the usual convention is that no knowledge of a parameter   
   in [0,1] is represented not as a point estimate but as a uniform   
   distribution.  Strictly speaking, you should then integrate this   
   parameter away to get the distribution of the answer.  An   
   alternative approach is to maximize the posterior probability, which   
   is particularly easy in the case of a uniform prior since the prior   
   drops out.   
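   To make that concrete (my numbers, not from the thread): with a
   uniform prior on p = P(White), observing k whites in n draws gives a
   Beta(k+1, n-k+1) posterior.  Integrating p away yields the
   predictive (k+1)/(n+2), Laplace's rule of succession; maximizing the
   posterior instead gives k/n, since the uniform prior cancels:

```python
# Uniform prior on p = P(White); k whites observed out of n draws.

def predictive_p_white(k, n):
    # Posterior predictive after integrating p out under a uniform
    # prior: the mean of the Beta(k+1, n-k+1) posterior.
    return (k + 1) / (n + 2)

def map_p_white(k, n):
    # Posterior mode; with a uniform prior this coincides with the
    # maximum-likelihood estimate.
    return k / n

k, n = 7, 10
print(predictive_p_white(k, n))  # 8/12, about 0.667
print(map_p_white(k, n))         # 0.7
```

   The two estimates disagree most when n is small, which is exactly
   when the choice of prior matters.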
      
   Also, if your classifier is framed as computing the posterior   
   distribution over components of a mixture distribution, then taking   
   p(white) = 0.5 is the same as using a uniform prior over the two   
   classes.   
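   A two-line sketch of that last point (the likelihood values are
   invented for illustration): with p(White) = 0.5, Bayes' rule reduces
   to comparing likelihoods, so the prior contributes nothing.

```python
# Posterior P(White | x) for a two-component mixture via Bayes' rule.

def posterior_white(lik_white, lik_black, p_white=0.5):
    num = p_white * lik_white
    den = num + (1 - p_white) * lik_black
    return num / den

# With equal priors, only the likelihood ratio matters:
print(posterior_white(0.8, 0.2))        # 0.8 / (0.8 + 0.2) = 0.8
# A skewed prior shifts the answer:
print(posterior_white(0.8, 0.2, 0.1))
```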
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca