Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 322 of 1,954    |
|    David Tian to All    |
|    Changing Regression problems to Classif    |
|    11 May 04 20:29:03    |
From: yuanxi80@hotmail.com

Hi,

In my research, I need to change a regression problem into a
classification problem by discretizing all feature attributes and
also the "decision attribute". Discretization involves mapping
continuous values into discrete domains. At the moment, I have applied
two discretization techniques: 1) equal-width discretization, i.e.
dividing each continuous variable's domain into equal-width discrete
intervals chosen by intuition, and 2) k-means clustering to find the
boundaries of the intervals for each variable.

The purpose of changing regression to classification using
discretization in my research is feature selection at the next
stage; I use Rough Set Attribute Reduction (RSAR) for feature
selection, and RSAR only works on discrete datasets. After feature
selection, the original dataset can be reduced to the selected
features and a linear regression model built from them. This should
give a more accurate model, in terms of prediction accuracy, than the
full model built using all features. The overall process is as follows:

Original real data (regression problem) --> Discretization -->
discretized data (classification problem) --> Feature selection (RSAR)
--> selected features --> reduced real original data --> linear
regression model

At the moment, the linear regression model built using the selected
features is poor; its root-mean-squared error is large. This suggests
that the selected features are not representative (significant) of all
the features of the dataset. It also suggests that the discretization
stage is poor; too much information about the frequency of occurrence
of values for each feature might have been lost.

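For concreteness, the two discretization schemes mentioned above might be sketched roughly as follows. This is only an illustration under assumptions, not the code used in the research: the function names (equal_width_edges, kmeans_edges, discretize) are hypothetical, the number of bins is picked by hand, and the k-means step is a plain 1-D Lloyd iteration with quantile-based starting centroids rather than any particular library's implementation.

```python
import numpy as np

def equal_width_edges(x, n_bins):
    """Interior cut points splitting [min(x), max(x)] into n_bins equal-width intervals."""
    x = np.asarray(x, dtype=float)
    return np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]

def kmeans_edges(x, n_bins, n_iter=100):
    """Interior cut points from 1-D k-means: midpoints between sorted centroids."""
    x = np.asarray(x, dtype=float)
    # Assumption: deterministic start, centroids spread over the quantiles of x.
    centers = np.quantile(x, (np.arange(n_bins) + 0.5) / n_bins)
    for _ in range(n_iter):
        # Assign every value to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            x[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n_bins)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centers = np.sort(centers)
    return (centers[:-1] + centers[1:]) / 2.0

def discretize(x, edges):
    """Map continuous values to integer interval labels 0 .. len(edges)."""
    return np.digitize(np.asarray(x, dtype=float), edges)
```

The same functions could be applied column by column to the feature attributes and once more to the decision attribute, so that every variable ends up as an integer interval label before feature selection.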
Is there a measure of the goodness of a discretization in terms of the
information lost from the original data? There are quite a few
different discretization methods, but most of them discretize only the
feature attributes, not the decision attribute. Therefore, what are
the main (most popular) discretization methods for changing regression
problems to classification problems, if there are any? How do I decide
which method is most suitable for my problem?

Thanks a lot and regards,

David

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to
(c) 1994, bbs@darkrealms.ca