Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 695 of 1,954    |
|    Ted Dunning to All    |
|    Re: Decision Trees    |
|    08 Apr 05 19:13:53    |
   
   From: ted.dunning@gmail.com   
      
   The answer is (as stated twice already) yes, decision trees can solve   
   some linearly inseparable problems. But the trade-off is that a finite   
   decision tree has trouble solving some very simple linearly separable   
   problems.   
      
   First, decision trees whose tests use only a single input variable
   (or even only a small finite number of them) have a difficult time
   with some linearly separable problems. Take, for example, C1 =
   {x, y | x + y < 1}. If the decision tree is composed only of
   statements like
      
    if x_i greaterOrLessThan k then return C   
      
   (which is a common constraint) then for any oblique linear
   classification the decision tree will have to cover the half plane
   with lots of axis-aligned rectangles, the union of which only
   approximates the desired region. If, on
   the other hand, rules like   
      
    if f(x_1 ... x_n) greaterOrLessThan k then return C   
      
   are allowed where f is any function, then obviously the decision tree   
   can perform any deterministic classification, but it becomes   
   essentially impossible to learn from data and thus irrelevant.   
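To make the C1 example concrete, here is a small numpy sketch (mine, purely illustrative, not from the original post): the best possible single axis-parallel split, i.e. a depth-1 decision tree of the "if x_i < k then return C" form, is compared against the oblique rule itself.

```python
import numpy as np

# Sample uniformly from the unit square and label by the oblique rule
# C1 = {x, y | x + y < 1}.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(10_000, 2))
y = X[:, 0] + X[:, 1] < 1.0

# Exhaustively try single-coordinate thresholds: "if x_i < k then return C1".
best_split_acc = 0.0
for axis in (0, 1):
    for k in np.linspace(0.0, 1.0, 101):
        pred = X[:, axis] < k
        # Either side of the split may be assigned either class.
        acc = max(np.mean(pred == y), np.mean(~pred == y))
        best_split_acc = max(best_split_acc, acc)

# The oblique rule is a single linear test and classifies exactly.
linear_acc = np.mean((X[:, 0] + X[:, 1] < 1.0) == y)

print(best_split_acc)   # about 0.75 -- one axis-parallel cut tops out here
print(linear_acc)       # 1.0
```

The best single split lands near k = 0.5 at roughly 75% accuracy; driving the error down further requires stacking more and more rectangles along the diagonal, which is exactly the point above.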
      
   Some decision tree systems learn oblique classification rules, but   
   these systems are not all that widely used, nor are they necessarily   
   such a good idea seeing as how they are neither fish nor fowl; they   
   have the difficulties of neural nets without the advantages. What   
   normal decision trees are particularly good at is learning decision   
   surfaces that are largely parallel to the feature axes. What linear   
   classifier techniques are good at is learning linearly separable   
   surfaces in any orientation whatsoever. Depending on your problem,   
   you might be able to project into an interesting higher dimensional   
   space to use a linear classifier, or you might have something that is   
   ideal for a decision tree. I recommend that you always try both.   
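As a sketch of "try both" (using scikit-learn; the targets and depths here are my own illustrative choices): fit an axis-parallel tree and a linear classifier on one oblique linear target and on one target that is essentially a small decision tree.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(4_000, 2))
y_oblique = X[:, 0] + X[:, 1] < 0          # linearly separable, oblique
y_xor = (X[:, 0] < 0) ^ (X[:, 1] < 0)      # essentially a small decision tree

def fit_both(y):
    """Training accuracy of an axis-parallel tree and a linear classifier."""
    tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X, y)
    lin = LogisticRegression().fit(X, y)
    return tree.score(X, y), lin.score(X, y)

tree_ob, lin_ob = fit_both(y_oblique)
tree_xor, lin_xor = fit_both(y_xor)
print(f"oblique: tree={tree_ob:.3f} linear={lin_ob:.3f}")
print(f"xor:     tree={tree_xor:.3f} linear={lin_xor:.3f}")
```

Each method wins on the problem shaped like itself: the tree nails the XOR-of-axis-tests target that defeats the linear model, while even a depth-6 tree cannot exactly reproduce the oblique boundary the linear model gets for free.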
      
   With a suitable prior that expresses a preference for axis-parallel
   surfaces, multi-layer perceptrons can act very much like decision
   trees. This sort of prior is often implemented using a Cauchy
   mutation operator or a regularizer based on rank statistics.
      
   It is useful to note how the question itself and the illustrative   
   examples tend to frame the discussion one way or the other. The term   
   "linearly separable" and the examples like C1 = {x, y | x + y < 1}   
   inherently assume a particular kind of problem and tend to show up
   the disadvantages of decision trees. Similarly, examples which are
   essentially defined as a decision tree will tend to be easier to
   learn using decision trees.
      
   In practice, I have found that an interesting option is to use a   
   decision tree on a problem and then let something like a logistic   
   regression classifier cheat by getting to see some outputs of the rules   
   used by the decision tree system. You can then turn the tables and let   
   the decision tree system see the output of the logistic regression.   
   You know you are done with this process when the decision tree system   
   ignores everything except your logistic regression output.   
      
   This methodology is very powerful, but is really only useful where you   
   have more data than you know what to do with. In a data poor   
   environment, such a method is nearly certain to cause massive   
   over-fitting.   
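A minimal sketch of the cross-feeding idea above, using scikit-learn on a synthetic target (the target, depths, and choice of leaf membership as the "rule outputs" are all my own illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# A target that is neither purely axis-parallel nor purely linear.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5_000, 2))
y = (X[:, 0] * X[:, 1] > 0) & (X[:, 0] + X[:, 1] < 0.5)

# Round 1: fit a tree, then let logistic regression "cheat" by seeing the
# tree's rule outputs (here, one-hot leaf membership) next to the raw inputs.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
leaf_ids = tree.apply(X)
leaf_feats = (leaf_ids[:, None] == np.unique(leaf_ids)[None, :]).astype(float)
lr = LogisticRegression(max_iter=1000).fit(np.hstack([X, leaf_feats]), y)

# Round 2: turn the tables -- give a fresh tree the logistic output as an
# extra input column. If that tree splits almost only on the last column,
# the process has converged and you stop.
lr_out = lr.predict_proba(np.hstack([X, leaf_feats]))[:, 1:]
tree2 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(
    np.hstack([X, lr_out]), y)
print(tree2.feature_importances_)   # last entry = weight on the LR output
```

Note that this whole loop fits on training data; as the paragraph above warns, with little data the stacked features will simply memorize noise.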
      
   [ comp.ai is moderated. To submit, just post and be patient, or if ]   
   [ that fails mail your article to
|
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca