XPost: comp.ai.neural-nets   
   From: rif@mit.edu   
      
   "Kimon Spiliopoulos" writes:   
      
   > > I don't agree that SMO is especially fast. SMO is an excellent   
   > > algorithm because it's easy to code up if you don't have a general   
   > > purpose QP solver. Having done a fair bit of experimentation and   
   > > comparison, I don't feel that it's in general faster than using a   
   > > state-of-the-art QP solver to solve moderate size (say a couple   
   > > hundred) subproblems at each iteration. I'm not sure what would   
   > > happen if you tried to solve for 3 or 4 examples at once.   
   > >   
   >   
   > That's surprising to me, because that is exactly the subproblem size
   > at which I thought SMO would do better. Which QP solver did you use?
   > Surely you don't mean a general purpose one.   
   >   
   > Kimon   
      
   Unfortunately, the situation is quite complex. As part of my PhD
   work, I wrote an SVM solver (SvmFu, no longer supported, in wide use,
   or recommended). SvmFu worked by using SMO to solve subproblems over
   a chunk (a few hundred to a few thousand examples, user-settable).
   Around the same time, Joachims' SvmLight was using a general-purpose
   QP solver to solve subproblems. On some examples SvmFu was faster, on
   some SvmLight was faster, and I could not easily tell which would
   be faster on a given problem without trying. Both were *much* faster
   than the SMO algorithm described in Platt's 1998 paper, which did not
   mention caching of kernel products; caching frequently needed
   kernel products is one of the primary keys to solving SVMs rapidly.
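
   To make the caching point concrete, here is a minimal sketch (Python
   with NumPy) of the kind of LRU cache of kernel rows I mean. The names
   `rbf_row` and `KernelCache` and all parameters are illustrative, not
   code from SvmFu or SvmLight:

```python
from collections import OrderedDict
import numpy as np

def rbf_row(X, i, gamma=0.5):
    """Row i of the RBF kernel matrix against all points in X."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    return np.exp(-gamma * d2)

class KernelCache:
    """LRU cache of kernel rows, bounded by max_rows (illustrative)."""
    def __init__(self, X, max_rows=100, gamma=0.5):
        self.X = X
        self.gamma = gamma
        self.max_rows = max_rows
        self.rows = OrderedDict()

    def row(self, i):
        if i in self.rows:
            self.rows.move_to_end(i)       # mark as recently used
            return self.rows[i]
        if len(self.rows) >= self.max_rows:
            self.rows.popitem(last=False)  # evict least recently used
        r = rbf_row(self.X, i, self.gamma)
        self.rows[i] = r
        return r
```

   The point is simply that an active point's kernel row gets requested
   many times across iterations, so recomputation dominates unless the
   hot rows are kept around.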
      
   In general, if I can easily store and work with O(n^2) memory, I
   expect to be able to solve a subproblem in O(n^3) time using a
   general-purpose QP solver. In SMO, the individual steps are very
   rapid, but I will need to take many more of them, even in the best
   case. If I don't do some sort of chunking, it may take a long time
   just to figure out which step to take next. It's unclear exactly
   where the balance lies. I suspect that at around a few hundred
   points, you will do about equally well solving the whole subproblem
   with a general-purpose QP solver as with SMO. In particular, if the
   dimensionality is quite high, you will spend much of the total time
   just calculating the kernel matrix, and assuming your subproblem
   consists mostly of support vectors, you will need that entire kernel
   matrix, so SMO and a general-purpose solver will both pay this cost.
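
   For reference, the "very rapid individual steps" in SMO are analytic
   updates on a single pair of dual variables. Here is a minimal sketch
   of one such pair update, following Platt's formulation; the function
   name, the fixed bias `b`, and the simplifications are mine, not any
   particular implementation:

```python
import numpy as np

def smo_pair_update(alpha, y, K, i, j, C, b=0.0, tol=1e-12):
    """One analytic SMO step on the pair (i, j) of the C-SVM dual.

    Returns a new alpha vector; alpha is returned unchanged when no
    progress is possible on this pair.
    """
    if i == j:
        return alpha
    # Errors E_k = f(x_k) - y_k, with f(x_k) = sum_m alpha_m y_m K[m, k] + b
    f = (alpha * y) @ K
    Ei, Ej = f[i] + b - y[i], f[j] + b - y[j]
    # Box bounds that preserve alpha_i*y_i + alpha_j*y_j = const
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]  # curvature along the pair
    if H - L < tol or eta < tol:
        return alpha
    a = alpha.copy()
    a[j] = np.clip(alpha[j] + y[j] * (Ei - Ej) / eta, L, H)
    a[i] = alpha[i] + y[i] * y[j] * (alpha[j] - a[j])
    return a
```

   Each step is cheap given the relevant kernel entries, but picking
   *which* pair to update next is where the bookkeeping cost hides.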
      
   I eventually discontinued this line of work, on the grounds that I
   could spend a lot of time digging at the algorithms and might get
   another 30% performance improvement, but I certainly wasn't going to
   get a factor of 10. The modern implementations of SvmLight usually
   give answers in only a small constant multiple of the time required
   to check the correctness of the answer via the KKT conditions, and
   this upper-bounds the performance improvements left on the tree.
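
   That check is itself simple to state; a rough sketch (my own
   illustrative code, using the standard C-SVM optimality conditions)
   also shows why it costs a full pass over the kernel products:

```python
import numpy as np

def kkt_violations(alpha, y, K, b, C, tol=1e-3):
    """Count KKT violations for a candidate C-SVM dual solution.

    For each point, with margin m_i = y_i * f(x_i), optimality requires:
      alpha_i == 0      ->  m_i >= 1
      0 < alpha_i < C   ->  m_i == 1
      alpha_i == C      ->  m_i <= 1
    """
    f = (alpha * y) @ K + b          # O(n^2): touches every kernel product
    m = y * f
    at_lower = alpha <= tol
    at_upper = alpha >= C - tol
    free = ~at_lower & ~at_upper
    bad = ((at_lower & (m < 1 - tol)) |
           (free & (np.abs(m - 1) > tol)) |
           (at_upper & (m > 1 + tol)))
    return int(bad.sum())
```

   Any exact solver implicitly has to establish these conditions, which
   is why the check is a natural yardstick for solver time.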
      
   IIRC, a lot of the more recent work (such as Keerthi's excellent work)   
   isn't actually solving exactly the same problem as the classic SVM.   
   For instance, I think Keerthi often considers a squared hinge loss   
   rather than the original hinge loss. I'm not making any claims that   
   this is better or worse for generalization, just noting that it's
   different.   
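
   For concreteness, the two losses differ only in the exponent on the
   margin violation (a small illustrative sketch, not Keerthi's code):

```python
import numpy as np

def hinge(margins):
    """Classic SVM (L1) hinge loss: max(0, 1 - y*f(x))."""
    return np.maximum(0.0, 1.0 - margins)

def squared_hinge(margins):
    """L2-SVM loss: max(0, 1 - y*f(x))**2 -- differentiable at the
    hinge point, which some solvers exploit."""
    return np.maximum(0.0, 1.0 - margins) ** 2
```

   Both are zero for points beyond the margin; they disagree on how
   hard violations are penalized, so the optimal solutions differ.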
      
   Cheers,   
      
   rif   
      
      