   comp.ai      Awaiting the gospel from Sarah Connor      1,955 messages   

   Message 1,662 of 1,955   
   Milind Joshi to talse...@gmail.com   
   Re: Issues regarding testing of a classi   
   05 Feb 08 11:33:08   
   
   From: milind.a.joshi@gmail.com   
      
   On Jan 28, 9:35 pm, talse...@gmail.com wrote:   
   > Hi all,   
   >   
   > I have a general question; I hope you guys can help me.
   >   
   > Suppose I have a classifier A that discriminates between two classes:   
   > class W and B (White balls and Black balls, respectively).   
   >   
   > Suppose I have to run the classifier on a vast set of balls (:= P), in
   > which the distribution of White and Black balls is unknown (which
   > means I don't know the a priori probability of getting a white or a
   > black ball to examine).
   >   
   > Now I would like to test the classifier. I choose a subset of P (:= N)
   > that consists of N balls and run the experiment to get the ROC curve
   > of the classifier.
   >   
   > My question is: what is the best way to set the distribution of White
   > and Black balls in N if the distribution of P is unknown? 0.5*N Black
   > balls and 0.5*N White balls sounds right, but is it really right? And
   > how would the answer change if the distribution of P can be determined?
   >   
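   To make "get the ROC curve" concrete, the sketch below computes ROC points from a labelled test set N. It assumes the classifier outputs a score where higher means "more likely Black" and treats Black as the positive class; the function name `roc_points` and the 1/0 label encoding are illustrative, not taken from the post. Because the true positive rate is computed only over Black balls and the false positive rate only over White balls, the Black/White proportion in N does not enter either rate directly; it mainly changes how noisy each axis is.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points of an ROC curve.

    scores: classifier outputs; higher is assumed to mean "more likely Black".
    labels: 1 for a Black ball (positive class, by assumption), 0 for White.
    """
    pos = sum(labels)              # number of Black balls in the test set
    neg = len(labels) - pos        # number of White balls in the test set
    if pos == 0 or neg == 0:
        raise ValueError("the test set needs at least one ball of each colour")

    # Sweep the decision threshold from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Balls with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # TPR uses only Black balls; FPR uses only White balls.
        points.append((fp / neg, tp / pos))
    return points


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    for fpr, tpr in roc_points(scores, labels):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

   Duplicating every White ball in this example (a 1:2 Black/White mix instead of 1:1) yields exactly the same points, which illustrates that the rates themselves do not depend on the class mix.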
      
   Hi,   
      
   I would say "testing" is probably not the right word for your
   scenario: testing is done against a known outcome or a known
   probability.
   What you are trying to do is estimate the probability that your
   classifier accurately reflects the real-life distribution without
   knowing the real-life distribution.
      
   In essence, you are trying to guess. That could work if you know the
   domain well.
      
   Techniques such as sampling a portion of your domain to estimate the
   class probabilities are a good idea.
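   Sampling a portion of the domain can be sketched as follows: draw a simple random sample from P, label it, and report the observed Black fraction with a Wilson score interval. The 0/1 label encoding, the function name, and the choice of the Wilson interval (rather than another binomial interval) are assumptions for illustration. The finite-population correction is ignored, which is reasonable when the sample is a small part of a vast P.

```python
import math
import random


def estimate_black_fraction(population, n, z=1.96, seed=None):
    """Estimate the fraction of Black balls in `population` from a random sample.

    population: sequence of labels, 1 = Black, 0 = White (hypothetical encoding;
                in practice each sampled ball would be inspected by hand).
    n: number of balls to sample.
    z: standard normal quantile; 1.96 gives roughly a 95% interval.
    Returns (estimate, low, high) using the Wilson score interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)   # simple random sample, no replacement
    p_hat = sum(sample) / n

    # Wilson score interval for a binomial proportion.
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical population: 30% Black, 70% White, shuffled.
    population = [1] * 3000 + [0] * 7000
    random.Random(0).shuffle(population)
    est, low, high = estimate_black_fraction(population, n=400, seed=1)
    print(f"estimated Black fraction {est:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

   The interval width shrinks roughly as 1/sqrt(n), so the sample size can be chosen to match how precisely the Black fraction of P needs to be known.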
      
   In this case, it would help to know (if possible):
      
   1. The probability that your large sample is distributed as you   
   imagine it to be   
      
   2. The probability that your classifier works well with the class of   
   problem you described   
      
   3. The probability that you are not placing too much confidence in
   the accuracy of your guess.
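   Points 1 and 2 can be combined in a short calculation. Assuming Black is the positive class and that the TPR/FPR measured on N at some chosen threshold carry over to P (i.e., balls of each colour look the same in N and in P), the expected accuracy on P is pi*TPR + (1 - pi)*(1 - FPR), where pi is the Black fraction of P. Because this is linear in pi, evaluating it at the ends of an interval for pi gives the range of accuracies consistent with that interval, which is one way to avoid placing too much confidence in a single guess. The function names and the example operating point are hypothetical.

```python
def expected_accuracy(tpr, fpr, black_fraction):
    """Expected accuracy on P, given class-conditional rates and the Black fraction.

    A Black ball is classified correctly with probability TPR and a White ball
    with probability 1 - FPR (Black is treated as the positive class).
    """
    return black_fraction * tpr + (1 - black_fraction) * (1 - fpr)


def expected_precision(tpr, fpr, black_fraction):
    """Fraction of balls labelled Black that really are Black, by Bayes' rule."""
    flagged = black_fraction * tpr + (1 - black_fraction) * fpr
    return black_fraction * tpr / flagged if flagged > 0 else float("nan")


def accuracy_range(tpr, fpr, pi_low, pi_high):
    """Accuracy is linear in the Black fraction, so its extremes over
    [pi_low, pi_high] occur at the endpoints."""
    a = expected_accuracy(tpr, fpr, pi_low)
    b = expected_accuracy(tpr, fpr, pi_high)
    return min(a, b), max(a, b)


if __name__ == "__main__":
    # Hypothetical operating point read off an ROC curve.
    tpr, fpr = 0.9, 0.2
    for pi in (0.1, 0.5, 0.9):
        print(f"pi={pi:.1f}  accuracy={expected_accuracy(tpr, fpr, pi):.3f}  "
              f"precision={expected_precision(tpr, fpr, pi):.3f}")
    print("accuracy range for pi in [0.25, 0.35]:",
          accuracy_range(tpr, fpr, 0.25, 0.35))
```

   The TPR and FPR stay fixed in this calculation while accuracy and precision move with pi, which is why knowing (or bounding) the distribution of P matters for how well the classifier will do on it.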
      
   Best Regards,   
   Milind   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

(c) 1994,  bbs@darkrealms.ca