Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 396 of 1,954    |
|    Eray Ozkural exa to Aleks Jakulin    |
|    Re: what text-classification failures ha    |
|    31 Jul 04 04:35:32    |
From: erayo@bilkent.edu.tr

Hello Aleks,

First, I think we are supposing that the representation is
bag-of-words. Have you seen promising alternative representations
lately?

Some questions about your examples follow:

"Aleks Jakulin" <"a_jakulin@"@hotmail.com> wrote in message
news:<4109fc6b$1@news.unimelb.edu.au>...
> * problems with rare words (it often helps to ignore them)

Don't some frequency reweighting schemes deal adequately with rare
words?

> * problems with imbalanced datasets (which increases the severity of
> 'Siren Pitfall')

Do you think this will be particularly the case for specific metrics
or classification algorithms? For instance, I think local density
estimation (kNN, etc.) would be likely to fail.

> * problems due to bad samples and concept drift (the training data is
> not representative of the population as a whole)

In general, should this be the job of the classifier (i.e. dealing with
missing/wrong attributes), or of a feature selection method (i.e.
carrying the data into a more robust semantic space)?

Regards,

--
Eray Ozkural

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to |
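On the rare-words point, a minimal sketch of the kind of frequency reweighting Eray alludes to: TF-IDF with a document-frequency cutoff, which both downweights very common words and drops words seen in only one document. The toy corpus and the `min_df = 2` cutoff are illustrative choices, not from the thread.

```python
from collections import Counter
import math

# Toy corpus (illustrative, not from the thread)
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are animals",
]

# Document frequency: in how many documents each word appears
df = Counter()
for d in docs:
    df.update(set(d.split()))

n = len(docs)

def tfidf(doc, min_df=2):
    """Bag-of-words TF-IDF; words rarer than min_df are simply dropped."""
    tf = Counter(doc.split())
    # idf = log(n / df) downweights common words; the min_df cut
    # is one crude answer to the "rare words" problem.
    return {w: c * math.log(n / df[w]) for w, c in tf.items() if df[w] >= min_df}

vec = tfidf(docs[0])
# "cat" and "mat" occur in only one document, so they are dropped;
# "the" survives but with a low idf weight.
```

Whether dropping rare words helps depends on the task: in topic classification they are often noise, while in authorship attribution they can carry most of the signal.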
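Eray's guess that kNN fails under class imbalance can be made concrete with a tiny 1-D sketch (the data and `k` are my own toy choices): with a 9:1 imbalance, majority voting among the k nearest points outvotes the minority class even for a query sitting right next to the lone positive example.

```python
# Imbalanced 1-D training set: nine negatives, one positive at x = 10.
train = [(float(x), 0) for x in range(9)] + [(10.0, 1)]

def knn_predict(x, k=3):
    """Plain majority-vote kNN in one dimension."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes > k // 2 else 0

# The query 9.5 is closest to the positive at 10.0, yet with k=3 the
# two next-nearest negatives (8.0, 7.0) outvote it -> predicted 0.
# Only the degenerate k=1 recovers the positive label.
```

This is the "local density" failure mode: the minority class is so sparse that almost every neighbourhood is dominated by the majority, regardless of where the query falls.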
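On the last question, one way to read "carrying the data into a more robust semantic space" is a feature map that normalises away surface variation the classifier never saw in training. A toy sketch, assuming a spam-filtering setting with obfuscated test-time tokens (the vocabulary and normalisation rules are hypothetical):

```python
import re

# Vocabulary learned at training time (hypothetical)
train_vocab = {"cheap", "viagra", "winner"}

def raw_features(doc):
    """Exact-match bag-of-words: brittle under drift/obfuscation."""
    return {w for w in doc.split() if w in train_vocab}

def robust_features(doc):
    """Map tokens into a cruder, more stable space: lowercase, strip digits."""
    norm = {re.sub(r"\d+", "", w.lower()) for w in doc.split()}
    return {w for w in norm if w in train_vocab}

spam = "CHEAP v1agra WINNER"
# Exact matching recovers nothing; the normalised space recovers
# "cheap" and "winner" (though "v1agra" -> "vagra" still escapes).
```

The point is that this robustness lives in the feature map, not the classifier, which is one answer to Eray's division-of-labour question, though a classifier with good smoothing over features can achieve some of the same effect.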
(c) 1994, bbs@darkrealms.ca