comp.ai: Awaiting the gospel from Sarah Connor (1,955 messages)
Message 1,662 of 1,955
Milind Joshi to talse...@gmail.com
Re: Issues regarding testing of a classi
05 Feb 08 11:33:08
From: milind.a.joshi@gmail.com

On Jan 28, 9:35 pm, talse...@gmail.com wrote:
> Hi all,
>
> I have a general question that I hope you can help me with.
>
> Suppose I have a classifier A that discriminates between two classes:
> class W and class B (white balls and black balls, respectively).
>
> Suppose I have to run the classifier on a vast set of balls (:= P) in
> which the distribution of white and black balls is unknown (which
> means I don't know the a priori probability of getting a white or a
> black ball to examine).
>
> Now I would like to test the classifier. I choose a subset of P (:= N)
> that consists of N balls and run the experiment to get the ROC curve
> of the classifier.
>
> My question is: what is the best way to set the distribution of white
> and black balls in N if the distribution of P is unknown? 0.5*N black
> balls and 0.5*N white balls sounds right, but is it really right? And
> how would the answer change if the distribution of P can be determined?

Hi,

I would say "testing" is probably not the right word for your scenario:
testing is done against a known outcome or a known probability.
What you are trying to do is estimate the probability that your
classifier accurately reflects the real-life distribution without
knowing the real-life distribution.

In essence, you are trying to guess. That could work if you know the
domain well.

Techniques such as sampling a portion of your domain to see what the
probability is are a good idea.

In this case, what could help (if known) is:

1. The probability that your large sample is distributed as you
imagine it to be.

2. The probability that your classifier works well with the class of
problem you described.

3. The probability that you are not placing too high a confidence in
the accuracy of your guess.

Best Regards,
Milind
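The suggestion of sampling a portion of the domain to see what the probability is can be sketched as follows. This is a minimal illustration, not part of the original thread. It assumes you can hand-label a uniformly random sample of balls drawn from P, encodes the labels as the hypothetical strings "W" and "B", and uses the Wilson score interval (chosen here for illustration) to put roughly 95% bounds on the fraction of white balls. It ignores the finite-population correction, which is reasonable when the sample is much smaller than P.

```python
import math
import random


def estimate_white_fraction(labels, z=1.96):
    """Estimate the fraction of white balls from a random, hand-labelled sample.

    labels: sequence of "W"/"B" strings (a hypothetical encoding).
    Returns (point_estimate, lower, upper) using the Wilson score interval;
    z=1.96 corresponds to roughly 95% confidence. Assumes the sample was
    drawn uniformly at random and is small relative to the population.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labelled ball")
    k = sum(1 for label in labels if label == "W")  # number of white balls seen
    p_hat = k / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return p_hat, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical population P with 30% white balls. In practice the true mix
    # is unknown; we only observe the labels of the balls we draw.
    rng = random.Random(0)
    population = ["W"] * 3000 + ["B"] * 7000
    sample = rng.sample(population, 200)  # 200 balls, uniformly, without replacement
    p_hat, lo, hi = estimate_white_fraction(sample)
    print(f"estimated P(white) = {p_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The interval narrows roughly in proportion to 1/sqrt(n), which gives a rough sense of how many balls must be labelled before the estimate is tight.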
(c) 1994, bbs@darkrealms.ca