Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 965 of 1,954    |
|    Ted Dunning to All    |
|    Re: Please help me with my Data Mining p    |
|    15 Mar 06 09:19:02    |
From: ted.dunning@gmail.com

It sounds to me like your data are being reduced to average outcomes. In fact, what you want is the feed and growth of each individual chicken so that you can determine what the variability of the outcome is likely to be. Having only the average or total result severely limits what you can figure out from your data.

In particular, if you really are collecting 1-2GB of data as system inputs and only getting a few numbers out as outcomes, you aren't going to be able to come up with anything very interesting in a retrospective study, even if you can collect hundreds of samples. What you really need to do is (a) get more refined and individual output parameters, (b) simplify your input data and (c) build and test preliminary models on real systems.

For (a), you need to look at your data and see if you can find more than a few outcomes that have interestingly different inputs. This may not be possible in your case.

For (b), you should look hard at your data and see how much variation there really is in the inputs and how much the different inputs are related. For instance, heating cost and outside temperature should be pretty closely related to interior temperature. If so, you don't need to keep all of those inputs around. You should summarize them with one input or with a new synthetic input that incorporates all available data. Similarly, if you have an input that never varies significantly, then you might as well ignore it for now.

For (c), you have to figure out what range of models is possible given the data you have and which alternative models are likely. Then you need to split that model space into equal-probability chunks and run some tests. That will take time, but it will be the only way to disambiguate most of the effects you are likely to see.

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to
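
As a rough illustration of the screening described in (b), here is a minimal Python sketch. It is not from the original post: the function name screen_inputs, the two thresholds, and the sample numbers are all made up for illustration, and a real data set would need thresholds chosen by looking at the data. It drops any input whose spread is tiny relative to its mean, and any input that is almost perfectly correlated with one already kept. Note that a relative-spread test is a crude choice for quantities that can sit near zero (such as temperatures in Celsius), so treat it as a starting point only.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_inputs(columns, min_rel_spread=0.01, max_abs_corr=0.95):
    """Return (kept, dropped); dropped maps each removed input to a reason.

    Both thresholds are arbitrary assumptions for this sketch.
    """
    kept, dropped = [], {}
    for name, values in columns.items():
        # An input that never varies significantly carries no information yet.
        scale = max(abs(mean(values)), 1e-12)
        if pstdev(values) / scale < min_rel_spread:
            dropped[name] = "nearly constant"
            continue
        # An input that closely tracks one already kept is redundant.
        partner = next(
            (k for k in kept if abs(pearson(columns[k], values)) > max_abs_corr),
            None,
        )
        if partner is not None:
            dropped[name] = f"tracks {partner}"
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical per-batch measurements, invented purely for this example.
columns = {
    "outside_temp": [2.0, 5.0, 9.0, 14.0, 18.0, 21.0],
    "heating_cost": [30.0, 26.5, 21.0, 15.5, 10.0, 7.0],
    "feed_kg": [1.1, 1.4, 1.0, 1.5, 1.2, 1.3],
    "lighting_hours": [16.0, 16.0, 16.0, 16.0, 16.0, 16.0],
}

kept, dropped = screen_inputs(columns)
print("kept:", kept)
print("dropped:", dropped)
```

This version simply discards a redundant input in favor of the first one seen. Building the "new synthetic input" mentioned in (b), for example by averaging standardized copies of the related inputs or taking a leading principal component, would be a natural next step but is left out here.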
(c) 1994, bbs@darkrealms.ca