From: tims@despammed.com   
      
   On Mon, 17 Jul 2006 01:45:32 GMT, "Dephased"    
   wrote:   
      
   >Hello everyone,   
   >   
   >I have a dataset of observations (about 100 000 observations). Each   
   >observation gives me the state of 30 discrete variables at a given   
   >time.   
   >   
   >I would like to know if there exists any "distance" that could tell me   
   >"how far" an observation is from another? I am not trying to get the   
   >distance between two variables but rather between two "vectors" which   
   >are made of the observations of 30 different variables at a given time.   
   >   
   >I read up a bit on the subject but I must admit I am confused with all   
   >the possible measures and what they achieve: chi square, euclidean,   
   >mahalanobis...   
   >   
   >   
   >Thanks in advance for the help you may give me !   
   >   
      
   I would use the "distance between two points" formula and treat the 30
   variables as dimensions.
      
   Just as in two dimensions dist=SQRT( (x2-x1)^2 + (y2-y1)^2 )
   and three dimensions dist=SQRT( (x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2 )   
   you can extrapolate that out to 30 dimensions.   
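
   Rather than writing out all 30 terms, the same formula can be put in
   a loop. Here is a minimal sketch in Python; the function name and the
   two sample observations are made up for illustration, and it assumes
   each observation is a plain list of numbers.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two observations of equal length."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    # Sum the squared differences over every dimension, then take the root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical observations of 30 variables each.
obs1 = [0] * 30
obs2 = [1] * 30
print(euclidean_distance(obs1, obs2))  # square root of 30
```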
      
   You end up with a big formula. You can "normalize" the components of
   the vectors by rescaling each one into a common range (say 1-100) so
   that each component of the vector carries equal weight. After that,
   you could weight them by importance, for example.
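
   A rough sketch of the normalizing and weighting step, again in
   Python. It assumes min-max rescaling into 1-100 (one possible reading
   of "using ratios") and a hand-picked weight per variable. The function
   names, sample rows, and weights are all hypothetical.

```python
import math

def scale_columns(rows, lo=1.0, hi=100.0):
    """Min-max rescale each variable (column) into the range [lo, hi]."""
    scaled_columns = []
    for col in zip(*rows):
        cmin, cmax = min(col), max(col)
        span = cmax - cmin
        if span == 0:
            # A constant variable cannot be rescaled; map it to lo.
            scaled_columns.append([lo] * len(col))
        else:
            scaled_columns.append(
                [lo + (v - cmin) * (hi - lo) / span for v in col]
            )
    # Transpose back so each inner list is one observation again.
    return [list(row) for row in zip(*scaled_columns)]

def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference multiplied by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Three hypothetical observations of two variables on very different scales.
rows = [[0, 10], [5, 20], [10, 40]]
scaled = scale_columns(rows)

# Equal weights: both variables count the same after rescaling.
print(weighted_distance(scaled[0], scaled[2], [1, 1]))
# Assumed importance: the second variable counts four times as much.
print(weighted_distance(scaled[0], scaled[2], [1, 4]))
```

   A weight of 0 drops a variable from the distance entirely, and larger
   weights make differences in that variable count for more.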
      
   Hope that helps,   
      
   Tim   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      