From: ben.usenet@bsb.me.uk   
      
   Tim Rentsch writes:   
      
   > Ben Bacarisse writes:   
   >   
   >> Tim Rentsch writes:   
   >>   
   >>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>   
   >>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>   
   >>>>> Given n times of the 24-hour day, print their average.   
   >>>>> For example, the average of "eight o'clock" and   
   >>>>> "ten o'clock" (n=2) would be "nine o'clock".   
   >>>>> (You can choose any representation, for example "HH:MM"   
   >>>>> or "seconds since midnight".)   
   >>>>   
   >>>> Thanks for all replies!   
   >>>>   
   >>>> I waited a few days before answering to allow   
   >>>> sufficient time to think about the problem.   
   >>>>   
   >>>> There were not enough tests written and run. As a result,   
   >>>> the puzzle has not yet been solved (unless I have overlooked   
   >>>> a contribution or misworded expectations).   
   >>>>   
   >>>> So, here are two possible test cases.   
   >>>>   
   >>>> average( 23.5, 1.5 )== 0.5   
   >>>> average( 11.5, 13.5 )== 12.5   
   >>>>   
   >>>> (I use hours as units, so "0.5" means, "half past midnight".)   
   >>>>   
   >>>> I hope that these test cases encode sensible expectations   
   >>>> for an average of two times on a 24-hour clock in the spirit   
   >>>> of the example given in the OP, which was, "the average of   
   >>>> eight o'clock and ten o'clock would be nine o'clock", since   
   >>>> these test cases just have rotated that example by 3.5 and   
   >>>> 15.5 hours.   
   >>>>   
   >>>> I believe that I have not seen an algorithm so far in this   
   >>>> thread that would pass these tests.   
   >>>   
   >>> As before, the problem is underspecified.   
   >>   
   >> Some remarks not specifically in reply to you, Tim...   
   >   
   > I didn't really say very much. Your comments are a lot more   
   > interesting.   
   >   
   >> The input is a collection, t(n), of n > 1 numbers in [0, 24). The   
   >> average should be a number, A, in [0, 24) that minimises   
   >>   
   >> Sum_{i=1,n} distance(A, t(i))   
   >>   
   >> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms   
   >> of variance). So far, this is just what an average is. The key point   
   >> is what is the distance (or difference) whose sum (or sum of squares)   
   >> we want to minimise? For times, I would say it is the length of the   
   >> shorter arc round an imaginary 24-hour clock face.   
   >   
   > Minimizing the square of the distance (aka length of the shorter arc)   
   > is one metric, and I think a reasonable one. (Minimizing the absolute   
   > value of the distance gives a result more like the median than the   
   > mean, IIANM.)   
   >   
   >> The problem has a natural interpretation in terms of angles. Whatever   
   >> the circular quantity is, convert the values to unit vectors round a   
   >> circle. For times of day, just scale [0, 24) to [0, 2*pi). The   
   >> average is then just the direction of the average vector, converted   
   >> back to a time of day.   
   >   
   > Averaging the "time unit vectors" is another plausible metric.   
   >   
   >> Sometimes that vector has zero length, and the average is undefined,   
   >> but otherwise the length of the vector gives an indication of the   
   >> variability of the data.   
   >>   
   >> Why do I consider this a reasonable interpretation of the problem?   
   >> Well, given a list of times of day when a train is observed to pass   
   >> some station, the circular 24-hour-time average should be our best   
   >> estimate of the scheduled time.   
   >>   
   >> Obviously there are other possible readings of the problem, but I was   
   >> not able to justify any of them as useful for any real-world   
   >> applications. This is a case where I hope I am wrong and there /are/   
   >> other circular averages with practical interpretations.   
   >   
   > A nice analysis.   
      
   A generous appraisal, thanks. I was trying to be vague enough not to
   put my foot in it, but was too specific with the formula.
      
   > Some observations (all of which I believe are correct, but no   
   > guarantees are offered).   
   >   
   > The "vector average" metric is easier to calculate and gives an   
   > explicit indication when the data produces a degenerate result.   
   >   
   > Considering only cases that do not produce a degenerate result:   
   >   
   > (1) the "conventional average" metric and the "vector average"   
   > metric can produce different results when there are more than two   
   > data points. (For one or two data points I think they are always   
   > the same.)   
   >   
   > (2) the "vector average" metric can be calculated in O(n). The   
   > "conventional average" metric can be calculated in O(n*n).   
      
   By which I presume you mean and /not/ in O(n). (Big O is just an upper   
   bound.)   
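   For reference, here is a single-pass sketch of the vector average in
   C (function and variable names are mine, not anything proposed
   upthread; the degenerate-vector cutoff is an arbitrary choice):

   ```c
   #include <assert.h>
   #include <math.h>
   #include <stdio.h>

   /* Circular mean of n times in [0, 24): map each time to a unit
      vector, sum the vectors, and take the direction of the sum.
      Returns -1.0 when the sum is (near) zero and no average exists. */
   double circular_mean(const double *t, int n)
   {
       const double pi = acos(-1.0);
       double x = 0.0, y = 0.0;
       for (int i = 0; i < n; i++) {
           double a = t[i] * (2.0 * pi / 24.0);   /* hours -> radians */
           x += cos(a);
           y += sin(a);
       }
       if (hypot(x, y) < 1e-9 * n)
           return -1.0;                           /* degenerate case */
       double h = atan2(y, x) * (24.0 / (2.0 * pi));
       return fmod(h + 24.0, 24.0);               /* normalise to [0, 24) */
   }

   int main(void)
   {
       double a[] = { 23.5, 1.5 };
       double b[] = { 11.5, 13.5 };
       assert(fabs(circular_mean(a, 2) - 0.5)  < 1e-9);
       assert(fabs(circular_mean(b, 2) - 12.5) < 1e-9);
       puts("ok");
       return 0;
   }
   ```

   It passes Stefan's two test cases, and the single loop makes the
   O(n) claim concrete.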
      
   > (3) when the times are all reasonably close together, the   
   > "conventional average" metric seems more likely to produce a result   
   > corresponding to what most people expect. Another way to say this   
   > is that the "vector average" metric will sometimes surprise people.   
      
   I'm sorry to be obtuse, but what is the "conventional average"? The
   name makes it sound trivial, but the quadratic time makes me certain
   that it isn't.
      
   My "conventional average" algorithm (which is not well thought out) was   
   to (a) rotate the data set to avoid the 23/0 boundary (not always   
   possible), (b) take the arithmetic mean, and then (c) rotate the result   
   back. E.g. [23,0,1] -> [0,1,2] by adding one, and the average is   
   mean[0,1,2] - 1 = 0.   
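   Spelled out as code, one reading of that rotate-then-mean idea is:
   sort the times, find the largest gap round the clock, rotate so that
   gap straddles midnight, take the plain arithmetic mean, and rotate
   back. A rough C sketch (my names; and as noted, it has no good
   answer when the data are spread evenly round the circle):

   ```c
   #include <assert.h>
   #include <math.h>
   #include <stdio.h>
   #include <stdlib.h>

   static int cmp_double(const void *p, const void *q)
   {
       double a = *(const double *)p, b = *(const double *)q;
       return (a > b) - (a < b);
   }

   /* Rotate the data away from the 23/0 boundary, average, rotate back. */
   double rotated_mean(const double *t, int n)
   {
       double *s = malloc(n * sizeof *s);
       for (int i = 0; i < n; i++) s[i] = t[i];
       qsort(s, n, sizeof *s, cmp_double);

       /* Largest gap between consecutive sorted times, counting the
          wrap-around gap from s[n-1] back round to s[0]. */
       double best_gap = 24.0 - s[n-1] + s[0];
       double shift = -s[0];               /* rotate that gap onto midnight */
       for (int i = 1; i < n; i++) {
           double gap = s[i] - s[i-1];
           if (gap > best_gap) {
               best_gap = gap;
               shift = -s[i];
           }
       }

       double sum = 0.0;
       for (int i = 0; i < n; i++)
           sum += fmod(s[i] + shift + 48.0, 24.0);
       free(s);
       return fmod(sum / n - shift + 48.0, 24.0);
   }

   int main(void)
   {
       double a[] = { 23.0, 0.0, 1.0 };
       double b[] = { 23.5, 1.5 };
       assert(fabs(rotated_mean(a, 3) - 0.0) < 1e-9);
       assert(fabs(rotated_mean(b, 2) - 0.5) < 1e-9);
       puts("ok");
       return 0;
   }
   ```

   On the [23,0,1] example it rotates to [0,1,2], averages to 1, and
   rotates back to 0, as above.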
      
   --   
   Ben.   
      