comp.ai

Message 561 of 1,954

Ted Dunning to All

Re: locally weighted regression

17 Jan 05 01:57:03

   From: tdunning@san.rr.com   
      
   Actually, without more information, it is impossible to say what your   
   results mean.   
      
   You need to give just a bit more information such as how many data   
   points you have and whether you can obtain more data to test any models   
   that you create.   
      
   Here are a few scenarios:   
      
   a) you have tens of thousands of data points or more and can get more   
   any time you like.  This occurs often in signal processing   
   applications.  In such a situation, it seems likely that your quadratic
   fit results really do mean something.  To test this without
   mathematics, look at the residuals on the training data and then look
   at the residuals on data that you didn't use in the regression or in
   the selection of regression models.  If the average magnitude of the
   residuals is about the same in both cases (or better yet, the   
   distribution is similar), then you probably have something.   
      
   b) you have hundreds of data points and getting more is difficult or   
   impossible.  Here things become murkier.  You should institute a strict   
   discipline of using only a portion of your data for trying different
   regressions and reserve two other portions: one to test a number of
   regressions and one for evaluating whichever model seems best.  See below for
   references to mathematical techniques that can help you in cases where   
   you can't hold data back.   
      
   c) you have a dozen to a few dozen data points.  This situation is   
   REALLY difficult to deal with.  You probably can't judge between all of   
   the models that you are describing, and unless you luck into a model
   form (usually through deep knowledge of your system) that really works
   incredibly well, you are in a really difficult spot statistically   
   speaking.  You can falsify some regressions with this much data, but it   
   is very difficult to derive models of any complexity that will work for   
   unseen data.   
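
   The hold-out discipline from scenarios (a) and (b) can be sketched
   roughly as follows.  This is only an illustration under stated
   assumptions: the data is a synthetic noisy quadratic, the 50/25/25
   split is an arbitrary choice, and fit_poly, mean_abs_residual and
   columns are hypothetical helper names, not part of any library.

```python
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, stored as an augmented matrix.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef

def mean_abs_residual(coef, xs, ys):
    """Average magnitude of the residuals y - f(x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(c * x ** i for i, c in enumerate(coef))
        total += abs(y - pred)
    return total / len(xs)

def columns(points):
    """Split a list of (x, y) pairs into separate x and y lists."""
    return [p[0] for p in points], [p[1] for p in points]

# Synthetic stand-in data (an assumption): a noisy quadratic.
rng = random.Random(1)
data = []
for _ in range(400):
    x = rng.uniform(-3.0, 3.0)
    data.append((x, 1.0 + 2.0 * x - 0.5 * x * x + rng.gauss(0.0, 0.2)))

# Three portions: fit on the first, choose a model on the second,
# and touch the third only once for the final check.
rng.shuffle(data)
train, select, final = data[:200], data[200:300], data[300:]

fits = {d: fit_poly(*columns(train), d) for d in (1, 2, 3)}
select_err = {d: mean_abs_residual(fits[d], *columns(select)) for d in fits}
best = min(select_err, key=select_err.get)

train_err = mean_abs_residual(fits[best], *columns(train))
final_err = mean_abs_residual(fits[best], *columns(final))
print("chosen degree:", best)
print("mean |residual| on training data:", round(train_err, 3))
print("mean |residual| on held-out data:", round(final_err, 3))
```

   If the last two numbers are close, that is the "you probably have
   something" case.  A held-out figure much larger than the training
   figure suggests the fit is chasing noise.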
      
   If you are up for some serious thinking and are willing to basically roll
   your own regression code, you might take a look at David MacKay's work
   on the evidence method in regression problems.  Using such Bayesian   
   techniques with code written by some random schmoe is pretty difficult,   
   however.   
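
   To give a flavour of what the evidence calculation involves, here is
   a minimal sketch for polynomial regression with a Gaussian prior on
   the weights and Gaussian noise.  It uses the standard closed-form log
   marginal likelihood for that linear-Gaussian model; it is not
   MacKay's own code.  The prior precision alpha and noise precision
   beta are simply fixed at assumed values here, whereas the full method
   also re-estimates them from the data.  The data is synthetic and
   log_evidence is a hypothetical name.

```python
import math
import random

def log_evidence(xs, ts, degree, alpha, beta):
    """Log marginal likelihood of a polynomial model of the given degree.

    Prior: weights ~ N(0, I / alpha).  Noise: N(0, 1 / beta).
    """
    n, m = len(xs), degree + 1
    phi = [[x ** j for j in range(m)] for x in xs]
    # Augmented system [A | b] with A = alpha*I + beta*Phi^T Phi
    # and b = beta*Phi^T t; its solution is the posterior mean.
    aug = [[beta * sum(r[i] * r[j] for r in phi) + (alpha if i == j else 0.0)
            for j in range(m)]
           + [beta * sum(r[i] * t for r, t in zip(phi, ts))]
           for i in range(m)]
    # A is positive definite for alpha > 0, so plain elimination has
    # positive pivots and log det(A) is the sum of their logs.
    log_det = 0.0
    for c in range(m):
        log_det += math.log(aug[c][c])
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    w = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, m))
        w[i] = (aug[i][m] - tail) / aug[i][i]
    # Regularised misfit evaluated at the posterior mean.
    sq_err = sum((t - sum(wj * pj for wj, pj in zip(w, r))) ** 2
                 for r, t in zip(phi, ts))
    misfit = 0.5 * beta * sq_err + 0.5 * alpha * sum(wj * wj for wj in w)
    return (0.5 * m * math.log(alpha) + 0.5 * n * math.log(beta)
            - misfit - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi))

# Synthetic stand-in data (an assumption): 30 points from a noisy quadratic.
rng = random.Random(0)
xs = [-1.0 + 2.0 * i / 29 for i in range(30)]
ts = [1.0 + 2.0 * x - 1.5 * x * x + rng.gauss(0.0, 0.2) for x in xs]

alpha = 1.0          # assumed prior precision
beta = 1.0 / 0.2**2  # assumed noise precision (noise sd 0.2)
evidence = {d: log_evidence(xs, ts, d, alpha, beta) for d in range(5)}
best_degree = max(evidence, key=evidence.get)
for d in sorted(evidence):
    print("degree", d, "log evidence", round(evidence[d], 2))
```

   On data like this the constant and linear models should score far
   below the quadratic, while the higher degrees pay a modest penalty
   for parameters they do not need.  That built-in penalty is what lets
   the evidence compare models without holding data back.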
      
   Good luck.   
      