Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 1,660 of 1,954    |
|    Daniel Oberhoff to talsegal@gmail.com    |
|    Re: Issues regarding testing of a classi    |
|    04 Feb 08 11:55:11    |
      From: danielo@phys.ethz.ch

      On 2008-01-29 03:35:32 +0100, talsegal@gmail.com said:

      > Hi all,
      >
      > I have a general question, I hope you guys could help me.
      >
      > Suppose I have a classifier A that discriminates between two classes:
      > class W and B (White balls and Black balls, respectively).
      >
      > Suppose I have to run the classifier on a vast set of balls (:= P), in
      > which the distribution of White and Black balls is unknown (which
      > means I don't know the a-priori probability of getting a white or a
      > black ball to examine).
      >
      > Now I would like to test the classifier. I choose a subset of P (:= N)
      > that consists of N balls and run the experiment to get the ROC curve
      > of the classifier.
      >
      > My question is: What is the best way to set the distribution of White
      > and Black balls in N if the distribution of P is unknown? 0.5*N Black
      > balls and 0.5*N White balls sounds right, but is it really right?! And
      > how would the answer change if P can be determined?

      Hi,

      One question remains: how did you get the classifier? Usually
      you have a training set and a test set. Besides learning what
      distinguishes the classes, the classifier will usually also
      estimate the distribution of the different classes. I.e. if there
      are many fewer black than white balls, it should learn this and
      then, when in doubt, choose white. And this is usually reasonable,
      unless the training set is artificially biased, in which case
      measures should be taken to unbias the classifier.

      But since you only talk about testing, not training, you should
      keep your test set unbiased, i.e. have black and white balls with
      equal probability. Otherwise your classifier could get
      above-chance scores simply by being biased, perhaps by chance,
      towards black or white.

      To make a more extensive study, you could build a large number of
      test sets with different class distributions. You can then look at
      the score distribution, and also at the individual scores, to
      determine if and how your classifier is biased.

      Best

      Daniel

      [ comp.ai is moderated ... your article may take a while to appear. ]

      --- SoupGate-Win32 v1.05
       * Origin: you cannot sedate... all the things you hate (1:229/2)
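Daniel's point about skewed test sets can be made concrete with a small sketch (not from the thread; the trivially biased classifier and the 90/10 split are illustrative assumptions): a classifier that has learned nothing except a bias toward one class scores well above chance on a skewed test set, while a balanced test set exposes it as chance-level.

```python
# Sketch: why a skewed test set can reward a merely biased classifier.
# The "classifier" below is a hypothetical stand-in that always answers
# "white" -- i.e. it encodes a bias and nothing else.

def biased_classifier(ball):
    """Toy classifier that ignores its input and always predicts 'white'."""
    return "white"

def make_test_set(n_white, n_black):
    # Each item is (features, true label); features are irrelevant
    # for this toy, so we use None.
    return [(None, "white")] * n_white + [(None, "black")] * n_black

def accuracy(classifier, test_set):
    correct = sum(1 for ball, label in test_set if classifier(ball) == label)
    return correct / len(test_set)

skewed = make_test_set(n_white=90, n_black=10)    # 90% white balls
balanced = make_test_set(n_white=50, n_black=50)  # equal probability

print(accuracy(biased_classifier, skewed))    # 0.9 -- looks strong, pure bias
print(accuracy(biased_classifier, balanced))  # 0.5 -- chance level, bias exposed
```

Repeating the balanced evaluation over many test sets with different class ratios, as the post suggests, would show the score tracking the fraction of white balls, which is exactly the signature of a biased classifier.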
(c) 1994, bbs@darkrealms.ca