Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 396 of 1,954    |
|    Eray Ozkural exa to Aleks Jakulin    |
|    Re: what text-classification failures ha    |
|    31 Jul 04 04:35:32    |
From: erayo@bilkent.edu.tr

Hello Aleks,

First, I think we are supposing that the representation is
bag-of-words. Have you seen promising alternative representations
lately?

Some questions about your examples follow:

"Aleks Jakulin" <"a_jakulin@"@hotmail.com> wrote in message
news:<4109fc6b$1@news.unimelb.edu.au>...
> * problems with rare words (it often helps to ignore them)

Don't some frequency reweighting schemes deal adequately with rare
words?

> * problems with imbalanced datasets (which increases the severity of
> 'Siren Pitfall')

Do you think this will be particularly the case for specific metrics
or classification algorithms? For instance, I think local density
estimation (kNN, etc.) would be likely to fail.

> * problems due to bad samples and concept drift (the training data is
> not representative of the population as a whole)

In general, should this be the job of the classifier (i.e. dealing with
missing/wrong attributes), or of a feature selection method (i.e.
carrying the data into a more robust semantic space)?

Regards,

--
Eray Ozkural

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to |
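On the rare-words point, a minimal sketch of the kind of frequency reweighting Eray alludes to: TF-IDF with a document-frequency cutoff, which both downweights very common words and drops words seen in only one document. The toy corpus and the `min_df = 2` cutoff are illustrative choices, not from the thread.

```python
from collections import Counter
import math

# Toy corpus (illustrative, not from the thread)
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are animals",
]

# Document frequency: in how many documents each word appears
df = Counter()
for d in docs:
    df.update(set(d.split()))

n = len(docs)

def tfidf(doc, min_df=2):
    """Bag-of-words TF-IDF; words rarer than min_df are simply dropped."""
    tf = Counter(doc.split())
    # idf = log(n / df) downweights common words; the min_df cut
    # is one crude answer to the "rare words" problem.
    return {w: c * math.log(n / df[w]) for w, c in tf.items() if df[w] >= min_df}

vec = tfidf(docs[0])
# "cat" and "mat" occur in only one document, so they are dropped;
# "the" survives but with a low idf weight.
```

Whether dropping rare words helps depends on the task: in topic classification they are often noise, while in authorship attribution they can carry most of the signal.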
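Eray's guess that kNN fails under class imbalance can be made concrete with a tiny 1-D sketch (the data and `k` are my own toy choices): with a 9:1 imbalance, majority voting among the k nearest points outvotes the minority class even for a query sitting right next to the lone positive example.

```python
# Imbalanced 1-D training set: nine negatives, one positive at x = 10.
train = [(float(x), 0) for x in range(9)] + [(10.0, 1)]

def knn_predict(x, k=3):
    """Plain majority-vote kNN in one dimension."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes > k // 2 else 0

# The query 9.5 is closest to the positive at 10.0, yet with k=3 the
# two next-nearest negatives (8.0, 7.0) outvote it -> predicted 0.
# Only the degenerate k=1 recovers the positive label.
```

This is the "local density" failure mode: the minority class is so sparse that almost every neighbourhood is dominated by the majority, regardless of where the query falls.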
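On the last question, one way to read "carrying the data into a more robust semantic space" is a feature map that normalises away surface variation the classifier never saw in training. A toy sketch, assuming a spam-filtering setting with obfuscated test-time tokens (the vocabulary and normalisation rules are hypothetical):

```python
import re

# Vocabulary learned at training time (hypothetical)
train_vocab = {"cheap", "viagra", "winner"}

def raw_features(doc):
    """Exact-match bag-of-words: brittle under drift/obfuscation."""
    return {w for w in doc.split() if w in train_vocab}

def robust_features(doc):
    """Map tokens into a cruder, more stable space: lowercase, strip digits."""
    norm = {re.sub(r"\d+", "", w.lower()) for w in doc.split()}
    return {w for w in norm if w in train_vocab}

spam = "CHEAP v1agra WINNER"
# Exact matching recovers nothing; the normalised space recovers
# "cheap" and "winner" (though "v1agra" -> "vagra" still escapes).
```

The point is that this robustness lives in the feature map, not the classifier, which is one answer to Eray's division-of-labour question, though a classifier with good smoothing over features can achieve some of the same effect.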
(c) 1994, bbs@darkrealms.ca