
   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 1,169 of 1,954   
   rif to Ted Dunning   
   Re: Formula for N = number of training s   
   01 Sep 06 14:55:45   
   
   From: rif@mit.edu   
      
   "Ted Dunning"  writes:   
      
   > rif wrote:   
   > > Certainly it depends heavily on the classifier.  I have successfully   
   > > trained linear SVMs with orders of magnitude more features than   
   > > examples.  For instance,   
   > >   
   > > Ramaswamy, Tamayo, Rifkin, Mukherjee, Yeang, Angelo, Ladd, Reich,
   > > Latulippe, Mesirov, Poggio, Gerald, Loda, Lander and
   > > Golub. "Multiclass cancer diagnosis using tumor gene expression
   > > signatures." Proceedings of the National Academy of Sciences, vol. 98,
   > > no. 26, 18 December 2001.
   > >   
   > > Cheers,   
   > >   
   > > rif   
   > >   
   >   
   > Indeed.  If you have widely separated classes, this can be true.
   >   
   > And the bounds on performance given by the VC dimension are much more   
   > informative than any heuristic such as was mentioned.   
   >   
   > This doesn't defeat the curse of dimensionality, sadly.   
   >   
      
   Well, it's not clear to me that it's about widely separated classes
   per se.  Certainly widely separated classes are sufficient --- if you
   give me one dimension where all the positive points have large values
   and all the negative points have small values, I believe I can handle
   a number of additional random noise dimensions exponential in the
   number of data points --- basically, you won't lose until there are a
   bunch of random dimensions that look just as good as the true
   discriminator dimension.
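   A minimal numpy sketch of this point; the dataset sizes, the +/-5
   separation, and the difference-of-centroids classifier (a crude
   stand-in for a linear SVM) are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 200, 2000

def make_data(n):
    # One informative dimension (index 0), d-1 pure-noise dimensions.
    y = np.repeat([1, -1], n // 2)
    X = rng.normal(size=(n, d))
    X[:, 0] += 5.0 * y          # signal lives entirely in dimension 0
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

# Linear classifier from the difference of class centroids.
mu_pos = Xtr[ytr == 1].mean(axis=0)
mu_neg = Xtr[ytr == -1].mean(axis=0)
w = mu_pos - mu_neg
b = -0.5 * (mu_pos + mu_neg) @ w

acc = (np.sign(Xte @ w + b) == yte).mean()
print(f"test accuracy with 1 signal + {d - 1} noise dims: {acc:.2f}")
```

   Despite having 50x more dimensions than training points, the one
   well-separated dimension dominates the learned weight vector.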
      
   Another nice thought experiment is a case where the dimensions are
   repeated noisy observations of the same phenomenon.  In this sort of
   situation, whether or not your classes are well separated, additional
   dimensions are actually helpful.
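   The repeated-measurements case can be sketched the same way: a +/-1
   signal observed d times under heavy noise, classified by the sign of
   the average across dimensions.  The sample size and noise level here
   are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

def accuracy(d):
    # d noisy repeats of the same +/-1 label, noise sd = 3.
    y = rng.choice([-1, 1], size=n)
    X = y[:, None] + rng.normal(scale=3.0, size=(n, d))
    # Averaging shrinks the noise like 1/sqrt(d); classify by the sign.
    return (np.sign(X.mean(axis=1)) == y).mean()

for d in (1, 10, 100):
    print(f"d = {d:4d}  accuracy = {accuracy(d):.3f}")
```

   Here every added dimension carries the same signal, so accuracy
   improves with d rather than degrading.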
      
   I can't say I have a full understanding of all this, as it's quite   
   complex, but I would certainly not expect to find a rule of thumb that   
   told me, knowing nothing about the problem, how much data I needed   
   with respect to the dimensionality.   
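   For what a VC-style answer (as Ted mentioned) looks like, here is
   one classical PAC sample-size bound, due to Blumer et al. (1989),
   evaluated for linear classifiers, whose VC dimension in R^d is
   d + 1.  The constants are loose and serve only to show the scaling:

```python
import math

def pac_sample_bound(vc_dim, eps=0.1, delta=0.05):
    # Sufficient sample size for error <= eps with prob >= 1 - delta
    # (Blumer et al., 1989); constants are illustrative, not tight.
    return max(
        (4 / eps) * math.log2(2 / delta),
        (8 * vc_dim / eps) * math.log2(13 / eps),
    )

for d in (10, 100, 1000):
    print(f"dims = {d:5d}  bound = {pac_sample_bound(d + 1):.0f}")
```

   The bound grows roughly linearly in the dimension, but it knows
   nothing about separation or redundancy, which is exactly why a
   dimension-only rule of thumb can be wildly pessimistic.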
      
   Cheers,   
      
   rif   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


