From: rif@mit.edu   
      
   "Ted Dunning" writes:   
      
   > rif wrote:   
   > > Certainly it depends heavily on the classifier. I have successfully   
   > > trained linear SVMs with orders of magnitude more features than   
   > > examples. For instance,   
   > >   
   > > Ramaswamy, Tamayo, Rifkin, Mukherjee, Yeang, Angelo, Ladd, Reich,   
   > > Latulippe, Mesirov, Poggio, Gerald, Loda, Lander and
   > > Golub. "Multiclass cancer diagnosis using tumor gene expression
   > > signatures." Proceedings of the National Academy of Sciences, vol. 98,
   > > no. 26, 18 December 2001.   
   > >   
   > > Cheers,   
   > >   
   > > rif   
   > >   
   >   
   > Indeed. If you have widely separated classes, this can be true.
   >   
   > And the bounds on performance given by the VC dimension are much more   
   > informative than any heuristic such as was mentioned.   
   >   
   > This doesn't defeat the curse of dimensionality, sadly.   
   >   
      
   Well, it's not clear to me that it's about widely separated classes
   per se. Certainly widely separated classes are sufficient: if you
   give me one dimension where all the positive points have large
   values and all the negative points have small values, I believe I
   can handle a number of additional random noise dimensions that is
   exponential in the number of data points. Basically, you won't lose
   until there are a bunch of random dimensions that look just as good
   as the true discriminator dimension.
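
A rough way to put numbers on this thought experiment, offered as a sketch
rather than an analysis of any particular classifier: assume each noise
dimension is continuous i.i.d. noise, independent of the labels, and say a
noise dimension "looks just as good" when a single threshold on it splits the
training points perfectly. The function names below are made up for
illustration.

```python
from math import comb


def prob_noise_dim_separates(n_pos: int, n_neg: int) -> float:
    """Chance that one pure-noise feature splits the classes perfectly.

    Assumes continuous i.i.d. noise (no ties), so every ordering of the
    points along the feature is equally likely. Only 2 of the
    C(n_pos + n_neg, n_pos) placements of the positives are perfect
    splits: all positives on top, or all positives on the bottom.
    """
    return 2 / comb(n_pos + n_neg, n_pos)


def noise_dims_for_one_spurious(n_pos: int, n_neg: int) -> float:
    """Noise dimensions needed before one is expected to split perfectly."""
    return comb(n_pos + n_neg, n_pos) / 2


if __name__ == "__main__":
    # Balanced classes; the count grows roughly like 2^n / sqrt(n).
    for n in (10, 20, 40):
        half = n // 2
        print(n, noise_dims_for_one_spurious(half, half))
```

Under these assumptions the number of noise dimensions you can tolerate
before expecting a spurious perfect separator grows exponentially with the
number of data points, which is the scaling claimed above.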
      
   Also, another nice thought experiment is a case where the dimensions   
   are repeated noisy observations of the same phenomenon. In this sort
   of situation, whether your classes are well-separated or not,   
   additional dimensions are actually helpful.   
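
To make the repeated-observations case concrete, here is a minimal sketch
under assumptions that are not in the post itself: the two classes have means
+mu and -mu with equal priors, every dimension is the class mean plus
independent Gaussian noise with standard deviation sigma, and the classifier
simply averages the dimensions and thresholds at zero.

```python
from math import erfc, sqrt


def error_rate(n_dims: int, mu: float, sigma: float) -> float:
    """Error of the 'average the features, threshold at 0' classifier.

    Hypothetical setup: class means are +mu and -mu with equal priors,
    and each feature is the class mean plus independent N(0, sigma^2)
    noise. The average of n_dims features has standard deviation
    sigma / sqrt(n_dims), so the error is Phi(-mu * sqrt(n_dims) / sigma).
    """
    z = mu * sqrt(n_dims) / sigma
    # Phi(-z) written with the complementary error function.
    return 0.5 * erfc(z / sqrt(2))


if __name__ == "__main__":
    # Heavily overlapping classes: mu is a tenth of the noise level.
    for d in (1, 100, 10000):
        print(d, error_rate(d, mu=0.1, sigma=1.0))
```

Even with badly overlapping classes in any single dimension, the error in
this toy model falls steadily as dimensions are added, because averaging
shrinks the noise by a factor of sqrt(n_dims).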
      
   I can't say I have a full understanding of all this, as it's quite   
   complex, but I would certainly not expect to find a rule of thumb that   
   told me, knowing nothing about the problem, how much data I needed   
   with respect to the dimensionality.   
      
   Cheers,   
      
   rif   
      