From: ted.dunning@gmail.com   
      
   On Nov 12, 4:49 am, Tim Frink wrote:   
   > ... constructing an appropriate decision tree, I would like   
   > to measure the model's performance.   
      
   You are pretty much on the right track with your leave one out.   
      
   Most people would be happier with 10-fold cross validation, but leave-   
   one-out done well can be just as good.   
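A minimal sketch of the leave-one-out loop, using a toy 1-nearest-neighbour classifier on synthetic 1-D data (both the classifier and the data are stand-ins, not anything from the thread):

```python
import random

def nn_predict(train, x):
    """Predict the label of x with a 1-nearest-neighbour rule (toy classifier)."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def leave_one_out_accuracy(data):
    """Hold out each example in turn, fit on the rest, score the held-out point."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if nn_predict(rest, x) == y:
            correct += 1
    return correct / len(data)

random.seed(0)
# Toy data: class 0 clusters near 0, class 1 clusters near 5.
data = [(random.gauss(0, 1), 0) for _ in range(20)] + \
       [(random.gauss(5, 1), 1) for _ in range(20)]
acc = leave_one_out_accuracy(data)
```

The point is only the loop structure: each example is scored by a model that never saw it.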
      
   What you have to worry about, though, is duplicates or near duplicates   
   in your data set. Those can give you a very unrealistic estimate.   
      
   I have seen this problem, for instance, in news wire classification   
   tasks where many of the documents were small revisions.   
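A crude way to guard against this: collapse exact duplicates after normalization before doing any cross validation. The function names and the normalization here are illustrative; for the "small revisions" case you would want real near-duplicate detection (shingling, minhash) on top of this:

```python
import hashlib

def normalize(doc):
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(doc.lower().split())

def dedupe(docs):
    """Drop exact duplicates after normalization, keeping the first copy.
    Near-duplicate detection would be the next step for revised news stories."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha1(normalize(doc).encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept

docs = ["Stocks rose today.", "stocks  rose today.", "Bonds fell today."]
unique = dedupe(docs)   # the first two collapse to one document
```

If a duplicate of a training example sits in the held-out set, the classifier is effectively tested on data it has memorized, which is what inflates the estimate.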
      
   > So far, I've used a leave-one-out cross validation (due to the small   
   > number of examples in the learning set which is about 400) to evaluate   
   > the accuracy (classification error), i.e. how many examples in the test set   
   > were incorrectly predicted.   
      
   You should not only compute the performance estimate, you should also   
   estimate the error bars on that estimate.   
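For a plain accuracy figure, the simplest error bar is the binomial normal approximation; a sketch (the 360/400 numbers are hypothetical, chosen to match a ~400-example set):

```python
import math

def accuracy_error_bar(correct, n, z=1.96):
    """Normal-approximation confidence interval for an accuracy estimate.
    With n around 400, the binomial standard error is already ~1.5-2.5%."""
    p = correct / n
    se = math.sqrt(p * (1 - p) / n)
    return p, (max(0.0, p - z * se), min(1.0, p + z * se))

p, (lo, hi) = accuracy_error_bar(360, 400)   # e.g. 90% accuracy on 400 examples
```

With 400 examples the interval is already about +/- 3 percentage points, which matters a lot when comparing models whose accuracies differ by less than that.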
      
      
   Something that is often not mentioned is that you may have some   
   symmetries or scaling invariance properties in your problem that will   
   allow you to inflate your data set by replicating data points using   
   these invariants. This can dramatically improve your classification   
   process.   
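The mechanics of that inflation are simple; the hard part is knowing which transforms really preserve the label. A sketch with deliberately hypothetical invariances (sign flip and scaling), which you would replace with whatever symmetries your problem actually has:

```python
def augment(data, transforms):
    """Inflate a data set by applying label-preserving transforms to each point.
    Which invariances actually hold is problem-specific; these are placeholders."""
    out = list(data)
    for x, y in data:
        for t in transforms:
            out.append((t(x), y))
    return out

# Hypothetical invariances: sign flip and doubling leave the label unchanged.
transforms = [lambda x: -x, lambda x: 2 * x]
data = [(1.0, "a"), (3.0, "b")]
bigger = augment(data, transforms)   # 2 originals + 4 transformed copies = 6
```

Note that if you augment, the copies of one original must all go on the same side of any train/test split, or you recreate the duplicate problem described above.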
      
   With your small data set, I think you can also benefit from   
   alternative classifiers that are inherently robust against over-   
   training. Take a look at random forests, SVMs, or Bayesian logistic   
   regression, for instance.   
      
   > I'm not sure if a significance test would provide helpful information.   
   > In my text book, they use a significance test to compare two   
   > different classification algorithms w.r.t. their absolute error   
   > (determined by cross validation).   
      
   You can do this, but I think it is better to just get good estimates   
   of the distribution of your performance estimate. Then you can do all   
   kinds of Monte-Carlo estimates about things like how likely one model   
   is to outperform all others by sampling from the performance   
   estimates. This is generally simpler than getting a non-controversial   
   significance test, especially since you are doing lots of exploratory   
   analysis in a data mining setting. In fact, you can even use the   
   raw cross-validation results to get this estimate, and that can take   
   into account the correlation of the learning algorithms' performance.   
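One concrete way to get that distribution: bootstrap the per-example 0/1 outcomes that cross validation already produced. A sketch (the 360-of-400 outcomes are hypothetical):

```python
import random
import statistics

def bootstrap_accuracy(per_example_correct, n_boot=2000, seed=0):
    """Resample the per-example 0/1 outcomes from cross validation to get
    an empirical distribution over the accuracy estimate itself."""
    rng = random.Random(seed)
    n = len(per_example_correct)
    accs = []
    for _ in range(n_boot):
        sample = [per_example_correct[rng.randrange(n)] for _ in range(n)]
        accs.append(sum(sample) / n)
    return accs

# As if leave-one-out scored 360 hits out of 400.
outcomes = [1] * 360 + [0] * 40
accs = bootstrap_accuracy(outcomes)
spread = statistics.stdev(accs)   # should be close to sqrt(.9 * .1 / 400)
```

The spread of `accs` is the error bar, and the full list can feed any Monte-Carlo question you want to ask about the model.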
      
   So, in my book, it is critical to take the point of significance tests   
   seriously (you don't quite know the performance you will get on unseen   
   data), but the assumptions of significance tests are all about   
   frequentist sampling arguments and you are inherently violating those   
   assumptions with data mining. Also, interpretation of a significance   
   test can be difficult when you want to take actions such as selecting   
   one model of many. I prefer direct estimates of probabilities like   
   P(model 1 is at least 5% better than model 2). That kind of estimate   
   makes it much easier to explain results and motivate action.   
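That kind of probability drops straight out of a paired bootstrap over the two models' per-example outcomes (pairing keeps the correlation between their errors). A sketch with hypothetical outcome vectors, model 1 at 90% and model 2 at 80% on the same 400 cases:

```python
import random

def prob_model1_better(hits1, hits2, margin=0.05, n_boot=2000, seed=1):
    """Monte-Carlo estimate of P(accuracy of model 1 exceeds model 2 by at
    least `margin`), from paired per-example 0/1 cross-validation outcomes.
    Resampling the same indices for both models preserves their correlation."""
    rng = random.Random(seed)
    n = len(hits1)
    wins = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a1 = sum(hits1[i] for i in idx) / n
        a2 = sum(hits2[i] for i in idx) / n
        if a1 - a2 >= margin:
            wins += 1
    return wins / n_boot

# Hypothetical paired outcomes on the same 400 examples.
m1 = [1] * 360 + [0] * 40
m2 = [1] * 320 + [0] * 80
p = prob_model1_better(m1, m2)
```

The result is a single number you can act on ("model 1 beats model 2 by 5+ points with probability p"), with no significance-test machinery to defend.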
      
   Good luck. Post a summary of your results!   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      