comp.programming
Programming issues that transcend langua
57,431 messages
Message 56,870 of 57,431
Tim Rentsch to Ben Bacarisse
Re: Another little puzzle
31 Dec 22 06:42:15
   From: tr.17687@z991.linuxsc.com   
      
   Ben Bacarisse  writes:   
      
   > Ben Bacarisse  writes:   
   >   
   >> Tim Rentsch  writes:   
   >   
   > A couple of further thoughts only one of which is directed at you   
   > Tim (at the end)...   
   >   
   > Given the general conception of a mean (rather than any other kind of   
   > summary statistic) as minimising the sum of squares of some "distance"   
   > metric:   
   >   
   >   Sum_{i=1,n} difference(A, t(i))^2   
   >   
   > we can characterise the two contenders by what distance is being   
   > minimised.  For the less well discussed "conventional average" (to use   
   > your terminology) we are minimising the sum of squares of the shorter   
   > arc lengths between A and the t(i).   
   >   
   > For the "vector average", we convert the t(i) to unit vectors u(i) and   
   > we calculate the mean of the u(i) to get a vector m.  The "average", A,   
   > is just the direction of this vector -- another point on the unit   
   > circle.  In this case we are minimising the sum of squares of the   
   > /chord/ lengths between A and the t(i).   
      
   I think of this approach differently.  I take the time values   
   t(i) as being unit masses on the unit circle, and calculate the   
   center of mass.  As long as the center of mass is not the origin   
   we can project it from the origin to find a corresponding time   
   value on the unit circle (which in my case is done implicitly by   
   using atan2()).   
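
   A minimal sketch of that center-of-mass calculation, in Python.  It
   assumes the time values are hours of the day in [0, 24); the name
   vector_average and the 1e-12 tolerance are illustrative choices, not
   anything taken from this thread.

```python
import math

def vector_average(hours):
    """Center-of-mass average of times of day given as hours in [0, 24).

    Returns None when there is no input or when the center of mass is
    (numerically) at the origin, since no direction is defined then.
    """
    if not hours:
        return None
    # Treat each time value as a unit mass on the unit circle.
    angles = [2 * math.pi * h / 24 for h in hours]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    # Assumed tolerance for "the center of mass is the origin".
    if math.hypot(x, y) < 1e-12:
        return None
    # atan2 gives the direction of (x, y), which is the projection
    # from the origin onto the unit circle, done implicitly.
    return (math.atan2(y, x) * 24 / (2 * math.pi)) % 24
```

   For example, vector_average([6]) is 6 up to rounding, and
   vector_average([0, 12]) returns None because the two masses balance
   at the origin.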
      
   > The mean vector itself (which may not lie on the unit circle) minimises   
   > the sum of the squares of the length of the vector differences m-u(i):   
   >   
   >   Sum_{i=1,n} |m - u(i)|^2   
      
   That's true but I thought of it just as the average of the time   
   value "masses" on the unit circle.   
      
   > and any other vector along the same line as m (i.e. c*m for real c)   
   > minimises   
   >   
   >   Sum_{i=1,n} |c*m - u(i)|^2   
      
   I'm not sure what you're saying here.  Only the one point (m in   
   your terminology) minimizes the sum of squares of distances.  How   
   do other points on the same line qualify?   
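
   A quick numerical check of this point, as a sketch.  Expanding the
   sum with |u(i)| = 1 and Sum u(i) = n*m gives

     Sum_{i=1,n} |c*m - u(i)|^2  =  n*|m|^2*(c-1)^2 + n*(1 - |m|^2)

   so for m not at the origin the only minimum along the line is at
   c = 1.  The code below assumes time values in hours in [0, 24); the
   function name is hypothetical.

```python
import math

def sum_sq_to_scaled_mean(hours, c):
    """Sum over i of |c*m - u(i)|^2, with m the mean of the unit vectors u(i)."""
    # Convert each time value to a unit vector on the circle.
    us = [(math.cos(2 * math.pi * h / 24), math.sin(2 * math.pi * h / 24))
          for h in hours]
    n = len(us)
    # Mean vector m (the center of mass of the unit masses).
    mx = sum(x for x, _ in us) / n
    my = sum(y for _, y in us) / n
    # Squared distance from the scaled point c*m to every u(i).
    return sum((c * mx - x) ** 2 + (c * my - y) ** 2 for x, y in us)
```

   Evaluating this for a few values of c on any sample whose mean is not
   the origin shows the sum growing on both sides of c = 1.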
      
   > This includes, of course, m projected out to the unit circle.   
   >   
   > This distinction between arc lengths and chord lengths helps to   
   > visualise where these averages differ, and why the conventional   
   > average may seem more intuitive.   
      
   Interesting perspective.  I wouldn't call them chord lengths   
   because I think of a chord as being between two points both on   
   the same circle, and the center of mass is never on the unit   
   circle (not counting the case when all the time values are the   
   same).  Even so it's an interesting way to view the distinction.   
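
   One way to see the chord reading, as a sketch: if A is restricted to
   the unit circle, then each |A - u(i)| is a chord, and

     Sum_{i=1,n} |A - u(i)|^2  =  2n - 2n*(A . m)

   which is smallest when A points in the direction of m.  The code
   below compares a brute-force grid search with the atan2 direction.
   It works in radians, and the function names and the grid size of
   3600 are illustrative assumptions.

```python
import math

def chord_sum_sq(theta, angles):
    """Sum of squared chord lengths from the circle point at theta to each angle."""
    return sum((math.cos(theta) - math.cos(a)) ** 2 +
               (math.sin(theta) - math.sin(a)) ** 2 for a in angles)

def best_on_grid(angles, steps=3600):
    """Angle on an evenly spaced grid that gives the smallest chord sum."""
    grid = [2 * math.pi * k / steps for k in range(steps)]
    return min(grid, key=lambda th: chord_sum_sq(th, angles))

def mean_direction(angles):
    """Direction of the mean vector, as an angle in [0, 2*pi)."""
    y = sum(math.sin(a) for a in angles)
    x = sum(math.cos(a) for a in angles)
    return math.atan2(y, x) % (2 * math.pi)
```

   On a sample such as [0.3, 1.1, 2.0] the two results agree to within
   one grid step.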
      
   > Incidentally, I found another book on statistics on spheres, and that   
   > gets the average over in a few paragraphs.  It states, without   
   > considering any alternatives, that the average is the vector average.  I   
   > can't find anyone using or citing the arc-length minimising average,   
   > despite it being natural on a sphere to find the mid-point of, say, a   
   > set of points on the Earth's surface.   
      
   Do you mind my asking, which book was that?   
      
   Now that I think about it, finding the point that minimizes the   
   great circle distances squared would be at least computationally   
   unpleasant.  But it could be relevant in some contexts (thinking   
   in particular of astronautics, or, dare I say it, rocket science).   
      
   > Tim, my best shot at calculating this other average sorts the points to   
   > find the widest gap.  I suspect your algorithm is similar since you say   
   > it is O(n log n).   
      
   Your instinct that I sorted the time values is right.  If you   
   think a little more I think you will see that the widest gap need   
   not have any particular relationship to where the cut point is.   
   No further elaboration for now so as not to be a spoiler (and I   
   know you like to try figuring things out yourself).   
      