

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 816 of 1,954   
   Greg Heath to Ted Dunning   
   Re: Analog data in neural networks   
   27 Oct 05 01:09:01   
   
   XPost: comp.ai.neural-nets   
   From: heath@alumni.brown.edu   
      
   Ted Dunning wrote:   
   > Actually, centering and scaling aren't all that big of a deal; most NN   
   > training algorithms can handle a few mis-scaled or offset inputs pretty   
   > reasonably.  It is a good idea to scale and offset, but it isn't all   
   > that terribly important.   
      
   I have to disagree. I've been burned more than a few times by   
   inputting raw, unscaled, uncentered data. Now, I standardize   
   automatically.   
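
   [A minimal sketch of that standardization step — column-wise z-scoring
   with NumPy; the function name and the guard for constant columns are
   illustrative, not from the post:]

```python
import numpy as np

def standardize(X, eps=1e-12):
    # Column-wise z-scoring: subtract each input's mean, divide by its std.
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma < eps, 1.0, sigma)  # guard against constant columns
    return (X - mu) / sigma

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
Z = standardize(X)  # every column now has zero mean and unit variance
```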
      
   > It is generally more helpful to use the empirical distribution of the   
   > inputs to non-linearly rescale inputs to more conventional   
   > distributions.  This particularly helps when inputs have multiple size   
   > scales, but the nature of the scales is not particularly clear.  The   
   > idea is to map each input through its empirical distribution to a   
   > resulting (approximately) uniformly distributed value, and then to   
   > convert that value to an approximately normally distributed one   
   > using an inverse normal cumulative distribution.   
   >   
   > Once input redistribution is done, it can also be helpful to transform   
   > all inputs by first clustering the data and then using the distances to   
   > all of the cluster centroids instead of (or in addition to) the   
   > original inputs.  These distances may best be described in terms of   
   > probabilities by assuming each cluster is itself normally distributed.   
   > EM based clustering algorithms are particularly good for this.   
   >   
   > These are non-linear transformations which can be difficult to   
   > characterize theoretically, but the practical import in certain   
   > problems can be substantial.   
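
   [The quoted recipe — empirical-CDF redistribution followed by the
   inverse normal CDF, plus distances to cluster centroids as extra
   inputs — might be sketched as below. This assumes NumPy/SciPy; all
   names are illustrative, and the centroids would come from a separate
   EM or k-means fit:]

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_to_normal(x):
    # Empirical CDF -> ~Uniform(0,1), then inverse normal CDF -> ~Normal(0,1).
    x = np.asarray(x, dtype=float)
    n = len(x)
    u = (rankdata(x) - 0.5) / n   # the 0.5 offset keeps u strictly inside (0, 1)
    return norm.ppf(u)

def centroid_distances(X, centroids):
    # Euclidean distance from every sample to every cluster centroid;
    # these columns can replace or augment the original inputs.
    diffs = X[:, None, :] - centroids[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))

rng = np.random.default_rng(0)
x = rng.lognormal(size=1000)     # a severely skewed input
z = rank_to_normal(x)            # approximately standard normal, same ordering

X = rng.normal(size=(50, 3))
C = rng.normal(size=(4, 3))      # stand-in for centroids from an EM/k-means fit
D = centroid_distances(X, C)     # shape (50, 4): one distance column per cluster
```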
      
   When inputs are severely skewed, nonlinear scaling is highly   
   recommended. However, I can't remember ever having to do this.   
      
   Experts over in sci.stat.consult frown on automatically trying to   
   "normalize" inputs (and I agree). Their emphasis is on trying   
   to deal with nonnormal errors. Since normal inputs don't guarantee   
   normal errors, some consider normalizing the inputs a distracting   
   waste of time.   
      
   I don't remember exactly what Warren says in the FAQ, but I think   
   that his advice tends to agree with mine (actually, it's vice versa).   
      
   My best advice is to use scatter plots, clustering and PCA to   
   get familiar with the nature of the beast before doing anything.   
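
   [For the PCA part of that look-before-you-leap advice, a bare-bones
   sketch via SVD on centered data — NumPy only; the function name is
   ours:]

```python
import numpy as np

def pca_scores(X, k=2):
    # Project centered data onto its top-k principal components (via SVD).
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T              # coordinates in PC space
    var = S ** 2 / (len(X) - 1)         # variance along each component
    return scores, var

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
scores, var = pca_scores(X, k=2)        # scatter-plot `scores` to eyeball structure
```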
      
   Hope this helps.   
      
   Greg   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca