   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

   Message 960 of 1,954   
   dataminer101 to All   
   Please help me with my Data Mining probl   
   14 Mar 06 23:46:16   
   
   From: dataminer101@yahoo.com   
      
   Hi Guys,   
      
   I would appreciate your help with my data mining (DM) project. I am
   working on an agricultural project whose objective is to find the
   "optimal values" of the operating conditions that affect the outcome
   (the amount of meat produced, i.e. the weight) of an animal
   production (broiler chickens in my case). To do so, I have to use
   historical data from previous productions as my training dataset.  A
   production cycle typically lasts around 44 days.  For each
   production, a data acquisition system stores the
   real-time and historical data of hundreds of parameters. These   
   parameters represent sensor measurements of all the operating   
   conditions (current temperature, set point temperature, humidity,
   static pressure, etc.), and these are what I refer to as the inputs.
   The operating costs and the production outcome are what I refer to as   
   outputs.  The operating cost is indirectly computed from parameters   
   like water consumption, feed consumption, heater/cooling runtimes, and   
   lighting runtime; and the outcome of a production is defined by   
   parameters like animal mortality and conversion ratio (amount of feed
   in lb needed to produce 1 lb of meat).  So the main objective of this
   project is to find the set of "optimal daily values" (one value per
   day) for the inputs that would minimize the operating costs and
   conversion ratio outputs.
   The biggest problem I am facing right now is the following:  The   
   historical data that I have in the DB are time series for each measured   
   parameter.  Some of these time series follow some kind of cyclic   
   pattern (e.g. daily water/feed consumption) while others follow an
   increasing/decreasing trend (animal weight, total heater run time,
   total water/feed consumption, etc.).  My goal is to be able to come up
   with a model that suggests a set of curves for the optimal daily values   
   throughout the length of the production cycle, one curve for each   
   measured input/output parameter. This model would allow the farmer to   
   closely monitor his production on a daily basis to make sure his   
   production parameters follow the "optimal curves" suggested by my   
   model.  I have looked at artificial neural networks (ANNs) and I
   think they might be the solution to my problem, since they can model
   problems with multiple inputs and outputs (am I wrong?), but I could
   not figure out a way to model the inputs/outputs as time series (an
   array of values for each parameter). As far as I know, all kinds of
   classifiers accept only single-valued samples.
   One approach would be to create one classifier per day (e.g. for day
   1: extract a single value for each parameter, use these values as a
   training sample, and repeat this for all previous productions to
   construct the training set). The problem with this approach is that
   44 or so classifiers would be constructed (hard to manage all of
   this), and each of the resulting ANNs would be some kind of "typical
   average" of the training data but not necessarily the "optimal
   values" leading to the best production outcome, if I am not mistaken.
   Another approach would be to find a way to feed in the inputs and   
   outputs as time series (an array of 44 daily values for each   
   input/output parameter).  In this case, there would be only one
   resulting ANN, and the training samples would be a set of arrays, one
   for each parameter, as opposed to single daily parameter values in
   the first case.  The problem is, I could not find any classifier that would
   allow me to do that.   
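      
   A minimal sketch of how the two data layouts described above might
   look, assuming the raw sensor series have already been reduced to one
   value per parameter per day. The sizes (50 cycles, 6 parameters) and
   the random placeholder data are hypothetical, and NumPy is only one
   possible tool for this:

```python
import numpy as np

# Hypothetical sizes: 50 past cycles, 44 days each, 6 daily input parameters.
n_cycles, n_days, n_params = 50, 44, 6
rng = np.random.default_rng(0)

# Placeholder data standing in for daily aggregates pulled from the DB.
daily_inputs = rng.normal(size=(n_cycles, n_days, n_params))
outcome = rng.normal(size=n_cycles)  # e.g. one conversion ratio per cycle

# Approach 1: one training set per day, i.e. 44 separate (X, y) pairs,
# each with one row per past cycle and one column per parameter.
per_day_sets = [(daily_inputs[:, d, :], outcome) for d in range(n_days)]

# Approach 2: flatten each cycle's 44-day curves into a single row, so
# one model sees the whole cycle as one sample.
X_flat = daily_inputs.reshape(n_cycles, n_days * n_params)

print(len(per_day_sets), per_day_sets[0][0].shape, X_flat.shape)
```

   In the flattened layout each sample is an ordinary fixed-length
   vector, so any model that accepts single-valued samples could take
   it, but the number of columns (44 x parameters) can easily exceed
   the number of available cycles.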
      
   Another issue that I have is the amount of data. While a single   
   production cycle could represent 1-2GB of data, the length of the   
   production cycle (44 days) makes it difficult to collect historical
   data for hundreds of production cycles, as I could gather data for no
   more than 7 full cycles/year.  Fortunately, a farm can have many production
   units (5-10 barns/site in big sites), so this makes it possible to have   
   40-70 cycles/yr.  My question is: would this be enough to come up with   
   an acceptably accurate model or is it necessary to have hundreds of   
   samples?   
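      
   A quick check of the sample-count arithmetic above, as a sketch. The
   cycle and barn counts are the ones given in this post; the target of
   200 cycles is a hypothetical figure standing in for "hundreds of
   samples":

```python
import math

cycles_per_barn_per_year = 7    # upper bound stated in the post
barns_low, barns_high = 5, 10   # barns per site at big sites

cycles_low = cycles_per_barn_per_year * barns_low
cycles_high = cycles_per_barn_per_year * barns_high

# Hypothetical target, not from the post.
target_cycles = 200
years_worst = math.ceil(target_cycles / cycles_low)
years_best = math.ceil(target_cycles / cycles_high)

print(cycles_low, cycles_high, years_worst, years_best)
```

   This gives roughly the range quoted above per site and year, and
   suggests several years of data from one site before reaching a few
   hundred cycles.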
      
   Thanks for taking the time to read this lengthy post. I really
   appreciate your help and thank you in advance.   
      
   Cheers.   
      

(c) 1994,  bbs@darkrealms.ca