comp.ai

Ted Dunning to All

Re: Please help me with my Data Mining p

15 Mar 06 09:19:02

   From: ted.dunning@gmail.com   
      
   It sounds to me like your data are being reduced to average outcomes.   
   In fact, what you want is the feed and growth of each individual   
   chicken so that you can determine what the variability of the outcome   
   is likely to be.  Having the average or total result severely limits   
   what you can figure out from your data.   
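
   To make the point about averages concrete, here is a minimal sketch.
   The two flocks and their per-bird weight gains are invented numbers,
   and summarize() is a hypothetical helper, not anything from your
   system.  Both flocks have the same mean gain, so an average-only
   record cannot tell them apart, yet their variability is very
   different.

```python
from statistics import mean, stdev

# Invented per-chicken weight gains (kg) for two flocks; illustration only.
flock_a = [1.9, 2.0, 2.1, 2.0, 2.0]
flock_b = [1.0, 3.0, 2.5, 1.5, 2.0]


def summarize(gains):
    """Return the mean and sample standard deviation of individual gains."""
    return {"mean": mean(gains), "stdev": stdev(gains)}


summary_a = summarize(flock_a)
summary_b = summarize(flock_b)

# The means agree, so an averaged outcome hides the difference;
# only the individual records reveal the spread.
print(summary_a)
print(summary_b)
```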
      
   In particular, if you really are collecting 1-2GB of data as system inputs
   and only getting a few numbers out as outcomes, you aren't going to be   
   able to come up with anything very interesting in a retrospective   
   study, even if you can collect hundreds of samples.  What you really   
   need to do is (a) get more refined and individual output parameters,   
   (b) simplify your input data and (c) build and test preliminary models   
   on real systems.   
      
   For (a), you need to look at your data and see if you can find more   
   than a few outcomes that have interestingly different inputs.  This may   
   not be possible in your case.   
      
   For (b), you should look hard at your data and see how much variation   
   there really is in the input and how much the different inputs are   
   related.  For instance, heating cost and outside temperature should be   
   pretty closely related to interior temperature.  If so, you don't need   
   to keep all the inputs around.  You should summarize with one input or   
   with a new synthetic input that incorporates all available data.   
   Similarly, if you have an input that never varies significantly, then   
   you might as well ignore it for now.   
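
   A rough sketch of that kind of input pruning follows.  The column
   names, the sample values and both thresholds are assumptions made up
   for illustration, and prune_inputs() is a hypothetical helper.  It
   drops inputs that barely vary and inputs that are almost perfectly
   correlated with one already kept; it does not build the synthetic
   combined input mentioned above.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def prune_inputs(columns, min_rel_spread=0.01, max_corr=0.95):
    """Return the names of inputs worth keeping.

    columns maps an input name to its list of observed values.
    Both thresholds are arbitrary assumptions; tune them to your data.
    """
    kept = []
    for name, values in columns.items():
        m = mean(values)
        spread = pstdev(values)
        # Spread relative to the mean; fall back to absolute spread if mean is 0.
        rel = spread / abs(m) if m != 0 else spread
        if rel < min_rel_spread:
            continue  # input never varies significantly
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue  # nearly redundant with an input already kept
        kept.append(name)
    return kept


# Invented example data, one value per observation period.
example = {
    "outside_temp": [0, 5, 10, 15, 20],
    "heating_cost": [100, 80, 60, 40, 20],    # tracks outside_temp exactly
    "light_hours": [12.0, 12.0, 12.0, 12.0, 12.0],  # never varies
    "feed_kg": [3.1, 2.7, 3.4, 2.9, 3.3],
}
kept_inputs = prune_inputs(example)
print(kept_inputs)
```

   Which member of a correlated pair survives depends only on column
   order here; in practice you would keep whichever is cheaper or more
   reliable to measure.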
      
   For (c), you have to figure out what range of models is possible given
   the data you have and which alternative models are likely.  Then you
   need to split that model space into equal probability chunks and run
   some tests.  This will take time, but it is the only way to
   disambiguate most of the effects you are likely to see.   
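
   One possible reading of "equal probability chunks", sketched under
   strong assumptions: the model space is reduced to a single effect
   size, and your belief about it is a normal distribution.  The prior
   parameters and the helper equal_probability_chunks() are hypothetical.
   Each chunk holds the same share of prior probability, and its median
   is offered as a candidate setting to test.

```python
from statistics import NormalDist


def equal_probability_chunks(prior, k):
    """Split the prior into k intervals of equal probability mass.

    Returns a list of (low, high, representative) tuples, where the
    representative is the prior median within that interval.
    """
    chunks = []
    for i in range(k):
        # inv_cdf is undefined at 0 and 1, so the outer edges are infinite.
        low = prior.inv_cdf(i / k) if i > 0 else float("-inf")
        high = prior.inv_cdf((i + 1) / k) if i < k - 1 else float("inf")
        representative = prior.inv_cdf((i + 0.5) / k)
        chunks.append((low, high, representative))
    return chunks


# Assumed prior: effect centred on 0 with unit spread (illustration only).
prior = NormalDist(mu=0.0, sigma=1.0)
chunks = equal_probability_chunks(prior, 4)
for low, high, rep in chunks:
    print(f"[{low:.3f}, {high:.3f}) -> test near {rep:.3f}")
```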
      
      