
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 1,660 of 1,954   
   Daniel Oberhoff to talsegal@gmail.com   
   Re: Issues regarding testing of a classifier   
   04 Feb 08 11:55:11   
   
   From: danielo@phys.ethz.ch   
      
   On 2008-01-29 03:35:32 +0100, talsegal@gmail.com said:   
      
   > Hi all,   
   >   
   > I have a general question, I hope you guys could help me.   
   >   
   > Suppose I have a classifier A that discriminates between two classes:   
   > class W and B (White balls and Black balls, respectively).   
   >   
   > Suppose I have to run the classifier on a vast set of balls (:= P), in   
   > which the distribution of White and Black balls is unknown (Which   
   > means I don't know the a-priori probability of getting a white or a   
   > black ball to examine).   
   >   
   > Now I would like to test the classifier. I choose a subset of P (:=N)   
   > that consists of N balls and run the experiment to get the ROC curve   
   > of the classifier.   
   >   
   > My question is: What is the best way to set the distribution of White   
   > and Black balls in N if the distribution of P is unknown? 0.5*N Black   
   > balls and 0.5*N White balls sounds right, but is it really right?! And   
   > how would the answer change if P can be determined?   
   >   
      
   Hi,   
      
   There is one question remaining: how did you get the classifier?   
   Usually you have a training set and a test set, and the classifier   
   will, besides learning what distinguishes the classes, also estimate   
   the distribution of the classes. I.e. if there are many fewer black   
   than white balls it should learn this and then, if in doubt, choose   
   white, the more common class. This is usually reasonable, unless the   
   training set is artificially biased, in which case measures should be   
   taken to unbias the classifier.   
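
   In a probabilistic setting, "estimating the distribution" means
   learning class priors and folding them into the decision rule. A
   minimal sketch; the 90/10 split, the helper names, and the likelihood
   numbers are all made up for illustration:

```python
# Minimal Bayes-style decision rule with learned class priors.
# The classifier supplies likelihoods p(x | class); the decision
# multiplies them by priors p(class) estimated from training labels.

def learn_priors(labels):
    """Estimate p(class) from a list of training labels ('W' or 'B')."""
    n = len(labels)
    return {c: labels.count(c) / n for c in set(labels)}

def decide(likelihoods, priors):
    """Pick the class maximising p(x | class) * p(class)."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

# Training set with far fewer black balls than white ones.
train_labels = ['W'] * 90 + ['B'] * 10
priors = learn_priors(train_labels)      # {'W': 0.9, 'B': 0.1}

# An ambiguous ball: the likelihoods alone express no preference,
# so the learned prior breaks the tie towards the common class.
ambiguous = {'W': 0.5, 'B': 0.5}
print(decide(ambiguous, priors))         # -> W
```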
      
   But since you only talk about testing, not training, you should keep   
   your test set unbiased, i.e. have black and white balls with equal   
   probability. Otherwise your classifier could get above-random scores   
   simply by being biased, perhaps by chance, towards black or white.   
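
   To see the effect, compare a degenerate classifier that always
   answers white on a balanced versus a skewed set of labelled balls
   (toy numbers, hypothetical helpers):

```python
# A degenerate classifier that is simply biased towards white.
def always_white(ball):
    return 'W'

def accuracy(classifier, test_set):
    """Fraction of (ball, label) pairs classified correctly."""
    hits = sum(1 for ball, label in test_set if classifier(ball) == label)
    return hits / len(test_set)

balanced = [(i, 'W') for i in range(50)] + [(i, 'B') for i in range(50)]
skewed   = [(i, 'W') for i in range(90)] + [(i, 'B') for i in range(10)]

print(accuracy(always_white, balanced))  # 0.5 -- no better than chance
print(accuracy(always_white, skewed))    # 0.9 -- looks good, but is pure bias
```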
      
   To make a more extensive study, you could build a large number of   
   test sets with different class distributions. You can then look at   
   the score distribution, and at the individual scores, to determine   
   if and how your classifier is biased.   
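
   One way to sketch such a study: sweep the class ratio of the test
   set against a fixed classifier and watch the score drift. A score
   that rises with the white fraction betrays a bias towards white. The
   classifier, its error rates, and all numbers here are illustrative
   assumptions:

```python
import random

random.seed(0)

def noisy_biased_classifier(label):
    """Hypothetical classifier: 70% correct, and errs towards 'W' otherwise."""
    return label if random.random() < 0.7 else 'W'

def score(fraction_white, n=10_000):
    """Accuracy on a test set with the given fraction of white balls."""
    n_white = int(n * fraction_white)
    labels = ['W'] * n_white + ['B'] * (n - n_white)
    hits = sum(1 for label in labels if noisy_biased_classifier(label) == label)
    return hits / n

# Sweep the class ratio of the test set and watch the score drift
# upward with the white fraction -- the signature of a 'W' bias.
for fw in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"white fraction {fw:.1f}: score {score(fw):.2f}")
```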
      
   Best   
      
   Daniel   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca