
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 322 of 1,954   
   David Tian to All   
   Changing Regression problems to Classif   
   11 May 04 20:29:03   
   
   From: yuanxi80@hotmail.com   
      
   Hi,   
      
   In my research, I need to change a regression problem into a
   classification problem by discretizing all feature attributes and
   also the "decision attribute". Discretization involves mapping
   continuous values into discrete domains. At the moment, I have applied
   two discretization techniques: 1) equal-width discretization,
   dividing each continuous variable's domain into equal-width discrete
   intervals, with the intervals chosen by intuition, and 2) k-means
   clustering to find the interval boundaries for the variables.
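
For concreteness, the two techniques can be sketched as below. This is a minimal illustration under stated assumptions, not the code used in this work: the function names, the number of intervals, the evenly spread starting centres for k-means, and the sample values are all made up for the example.

```python
import bisect


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points of equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def kmeans_edges(values, n_bins, n_iter=100):
    """1-D k-means (Lloyd's algorithm). The cut points are the midpoints
    between neighbouring cluster centres."""
    data = sorted(values)
    # Assumed deterministic start: centres spread evenly over the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * n_bins)]
               for i in range(n_bins)]
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for v in data:
            nearest = min(range(len(centres)),
                          key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    centres = sorted(centres)
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def discretize(values, edges):
    """Map each value to the index of the interval it falls in."""
    return [bisect.bisect_right(edges, v) for v in values]


# Made-up sample with three obvious groups.
values = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
width_labels = discretize(values, equal_width_edges(values, 3))
kmeans_labels = discretize(values, kmeans_edges(values, 3))
```

On well-separated data like this sample both methods give the same labels; they differ when the values are unevenly spread, because equal-width cut points ignore where the values actually lie.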
      
   The purpose of changing regression to classification by
   discretization is feature selection at the next stage: I use Rough
   Set Attribute Reduction (RSAR) for feature selection, and RSAR only
   works on discrete datasets. After feature selection, the original
   dataset can be reduced to the selected features and a linear
   regression model built from them. This should give a more accurate
   model in terms of prediction accuracy than the full model built
   using all features. The overall process is as follows:
      
   Original real data (regression problem) --> Discretization -->
   discretized data (classification problem) --> Feature selection (RSAR)
   --> selected features --> reduced real original data --> linear
   regression model
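
The feature selection step of this process can be sketched as follows. This is only an illustration, assuming a greedy search driven by the rough-set dependency degree (in the style of QuickReduct); whether it matches the RSAR implementation used here is an assumption, and the function names and toy data are hypothetical. The regression step is left out: the returned column indices would simply be used to pick columns from the original real-valued data before fitting.

```python
from collections import defaultdict


def dependency(rows, decisions, attrs):
    """Rough-set dependency degree: the fraction of objects whose values on
    `attrs` determine the decision class without ambiguity."""
    classes_seen = defaultdict(set)
    counts = defaultdict(int)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        classes_seen[key].add(d)
        counts[key] += 1
    consistent = sum(counts[k] for k, ds in classes_seen.items()
                     if len(ds) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, decisions):
    """Add, one at a time, the attribute that raises the dependency degree
    the most, until it equals that of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    selected = []
    while dependency(rows, decisions, selected) < target:
        candidates = [a for a in all_attrs if a not in selected]
        best = max(candidates,
                   key=lambda a: dependency(rows, decisions, selected + [a]))
        selected.append(best)
    return selected


# Toy discretized data: attribute 0 decides the class, attribute 1 is noise,
# attribute 2 is redundant with attribute 0.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 0)]
decisions = [0, 0, 1, 1, 0, 1]
selected = greedy_reduct(rows, decisions)
```

Because the dependency degree is computed from the discretized decision attribute, a coarse or badly placed set of intervals changes which objects count as consistent, and so changes which features are selected.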
      
   At the moment, the linear regression model built using the selected
   features is poor: its root-mean-squared error is large. This suggests
   that the selected features are not representative of the full
   feature set. It also suggests that the discretization stage is
   poor; too much information about the frequency of occurrence of
   values for each feature might have been lost.
      
   Is there a measure of the goodness of a discretization in terms of
   information lost from the original data? There are quite a few
   different discretization methods, but most of them discretize
   only the feature attributes, not the decision attribute. What,
   then, are the main (most popular) discretization methods for
   changing regression problems into classification problems, if
   there are any? How do I decide which method is most suitable for
   my problem?
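
To make the kind of measure being asked about concrete, here is one possible proxy, offered purely as an illustrative assumption and not as an established answer: replace each value by the mean of its interval and report the share of the original variance that survives. The function name and the sample values are hypothetical.

```python
import bisect


def variance_retained(values, edges):
    """Share of the variance of `values` that survives when each value is
    replaced by the mean of its interval (1.0 means nothing is lost)."""
    n = len(values)
    overall_mean = sum(values) / n
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    if total_ss == 0:
        return 1.0  # a constant variable loses nothing
    bins = {}
    for v in values:
        bins.setdefault(bisect.bisect_right(edges, v), []).append(v)
    within_ss = 0.0
    for members in bins.values():
        bin_mean = sum(members) / len(members)
        within_ss += sum((v - bin_mean) ** 2 for v in members)
    return 1.0 - within_ss / total_ss


# Made-up sample: a cut at 6.5 separates the two groups, a cut at 2.5 does not.
sample = [1, 2, 3, 10, 11, 12]
good_cut = variance_retained(sample, [6.5])
bad_cut = variance_retained(sample, [2.5])
```

A score like this compares cut points for a single variable; it says nothing about how well the discretized features still predict the decision attribute, which would need a separate measure.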
      
   Thanks a lot and regards,
      
   David   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca