
   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 1,169 of 1,954   
   rif to Ted Dunning   
   Re: Formula for N = number of training s   
   01 Sep 06 14:55:45   
   
   From: rif@mit.edu   
      
   "Ted Dunning"  writes:   
      
   > rif wrote:   
   > > Certainly it depends heavily on the classifier.  I have successfully   
   > > trained linear SVMs with orders of magnitude more features than   
   > > examples.  For instance,   
   > >   
   > > Ramaswamy, Tamayo, Rifkin, Mukherjee, Yeang, Angelo, Ladd, Reich,
   > > Latulippe, Mesirov, Poggio, Gerald, Loda, Lander and
   > > Golub. "Multiclass cancer diagnosis using tumor gene expression
   > > signatures." Proceedings of the National Academy of Sciences, vol. 98,
   > > no. 26, 18 December 2001.
   > >   
   > > Cheers,   
   > >   
   > > rif   
   > >   
   >   
   > Indeed.  If you have widely separated classes, this can be true.
   >   
   > And the bounds on performance given by the VC dimension are much more   
   > informative than any heuristic such as was mentioned.   
   >   
   > This doesn't defeat the curse of dimensionality, sadly.   
   >   
      
   Well, it's not clear to me that it's about widely separated classes
   per se.  Certainly widely separated classes are sufficient --- if you
   give me one dimension where all the positive points have large values
   and all the negative points have small values, I believe I can handle
   a number of additional random noise dimensions exponential in the
   number of data points --- basically, you won't lose until there are a
   bunch of random dimensions that look just as good as the true
   discriminator dimension.
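   A minimal numpy sketch of this point; the dataset sizes, the +/-5
   separation, and the difference-of-centroids classifier (a crude
   stand-in for a linear SVM) are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 200, 2000

def make_data(n):
    # One informative dimension (index 0), d-1 pure-noise dimensions.
    y = np.repeat([1, -1], n // 2)
    X = rng.normal(size=(n, d))
    X[:, 0] += 5.0 * y          # signal lives entirely in dimension 0
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

# Linear classifier from the difference of class centroids.
mu_pos = Xtr[ytr == 1].mean(axis=0)
mu_neg = Xtr[ytr == -1].mean(axis=0)
w = mu_pos - mu_neg
b = -0.5 * (mu_pos + mu_neg) @ w

acc = (np.sign(Xte @ w + b) == yte).mean()
print(f"test accuracy with 1 signal + {d - 1} noise dims: {acc:.2f}")
```

   Despite having 50x more dimensions than training points, the one
   well-separated dimension dominates the learned weight vector.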
      
   Another nice thought experiment is a case where the dimensions are
   repeated noisy observations of the same phenomenon.  In this sort of
   situation, whether or not your classes are well separated, additional
   dimensions are actually helpful.
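   The repeated-measurements case can be sketched the same way: a +/-1
   signal observed d times under heavy noise, classified by the sign of
   the average across dimensions.  The sample size and noise level here
   are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

def accuracy(d):
    # d noisy repeats of the same +/-1 label, noise sd = 3.
    y = rng.choice([-1, 1], size=n)
    X = y[:, None] + rng.normal(scale=3.0, size=(n, d))
    # Averaging shrinks the noise like 1/sqrt(d); classify by the sign.
    return (np.sign(X.mean(axis=1)) == y).mean()

for d in (1, 10, 100):
    print(f"d = {d:4d}  accuracy = {accuracy(d):.3f}")
```

   Here every added dimension carries the same signal, so accuracy
   improves with d rather than degrading.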
      
   I can't say I have a full understanding of all this, as it's quite   
   complex, but I would certainly not expect to find a rule of thumb that   
   told me, knowing nothing about the problem, how much data I needed   
   with respect to the dimensionality.   
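   For what a VC-style answer (as Ted mentioned) looks like, here is
   one classical PAC sample-size bound, due to Blumer et al. (1989),
   evaluated for linear classifiers, whose VC dimension in R^d is
   d + 1.  The constants are loose and serve only to show the scaling:

```python
import math

def pac_sample_bound(vc_dim, eps=0.1, delta=0.05):
    # Sufficient sample size for error <= eps with prob >= 1 - delta
    # (Blumer et al., 1989); constants are illustrative, not tight.
    return max(
        (4 / eps) * math.log2(2 / delta),
        (8 * vc_dim / eps) * math.log2(13 / eps),
    )

for d in (10, 100, 1000):
    print(f"dims = {d:5d}  bound = {pac_sample_bound(d + 1):.0f}")
```

   The bound grows roughly linearly in the dimension, but it knows
   nothing about separation or redundancy, which is exactly why a
   dimension-only rule of thumb can be wildly pessimistic.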
      
   Cheers,   
      
   rif   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


