From: tr.17687@z991.linuxsc.com   
      
   Ben Bacarisse writes:   
      
   > Tim Rentsch writes:   
   >   
   >> Ben Bacarisse writes:   
   >>   
   >>> Tim Rentsch writes:   
   >>>   
   >>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>   
   >>>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>>   
   >>>>>> Given n times of the 24-hour day, print their average.   
   >>>>>> For example, the average of "eight o'clock" and   
   >>>>>> "ten o'clock" (n=2) would be "nine o'clock".   
   >>>>>> (You can choose any representation, for example "HH:MM"   
   >>>>>> or "seconds since midnight".)   
   >>>>>   
   >>>>> Thanks for all replies!   
   >>>>>   
   >>>>> I waited a few days before answering to allow   
   >>>>> sufficient time to think about the problem.   
   >>>>>   
   >>>>> There were not enough tests written and run. As a result,   
   >>>>> the puzzle has not yet been solved (unless I have overlooked   
   >>>>> a contribution or misworded expectations).   
   >>>>>   
   >>>>> So, here are two possible test cases.   
   >>>>>   
   >>>>> average(23.5, 1.5) == 0.5
   >>>>> average(11.5, 13.5) == 12.5
   >>>>>   
   >>>>> (I use hours as units, so "0.5" means, "half past midnight".)   
   >>>>>   
   >>>>> I hope that these test cases encode sensible expectations   
   >>>>> for an average of two times on a 24-hour clock in the spirit   
   >>>>> of the example given in the OP, which was, "the average of   
   >>>>> eight o'clock and ten o'clock would be nine o'clock", since   
   >>>>> these test cases just have rotated that example by 3.5 and   
   >>>>> 15.5 hours.   
   >>>>>   
   >>>>> I believe that I have not seen an algorithm so far in this   
   >>>>> thread that would pass these tests.   
   >>>>   
   >>>> As before, the problem is underspecified.   
   >>>   
   >>> Some remarks not specifically in reply to you, Tim...   
   >>   
   >> I didn't really say very much. Your comments are a lot more   
   >> interesting.   
   >>   
   >>> The input is a collection, t(n), of n > 1 numbers in [0, 24). The   
   >>> average should be a number, A, in [0, 24) that minimises   
   >>>   
   >>> Sum_{i=1,n} distance(A, t(i))   
   >>>   
   >>> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms   
   >>> of variance). So far, this is just what an average is. The key point   
   >>> is what is the distance (or difference) whose sum (or sum of squares)   
   >>> we want to minimise? For times, I would say it is the length of the   
   >>> shorter arc round an imaginary 24-hour clock face.   
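   [The objective described here can be written down directly. A minimal
   Python sketch follows; the names circ_dist and cost are mine, not
   from the thread, and 24.0 is the period of the clock.]

```python
def circ_dist(a, b, period=24.0):
    """Length of the shorter arc between times a and b on a 24-hour clock."""
    d = abs(a - b) % period
    return min(d, period - d)

def cost(a, times, period=24.0):
    """Sum of squared shorter-arc distances; the average minimises this."""
    return sum(circ_dist(a, t, period) ** 2 for t in times)

# Stefan's test case [23.5, 1.5]: the candidate 0.5 beats e.g. 12.5
print(cost(0.5, [23.5, 1.5]))    # 2.0
print(cost(12.5, [23.5, 1.5]))   # 242.0
```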
   >>   
   >> Minimizing the square of the distance (aka length of the shorter arc)   
   >> is one metric, and I think a reasonable one. (Minimizing the absolute   
   >> value of the distance gives a result more like the median than the   
   >> mean, IIANM.)   
   >>   
   >>> The problem has a natural interpretation in terms of angles. Whatever   
   >>> the circular quantity is, convert the values to unit vectors round a   
   >>> circle. For times of day, just scale [0, 24) to [0, 2*pi). The   
   >>> average is then just the direction of the average vector, converted   
   >>> back to a time of day.   
   >>   
   >> Averaging the "time unit vectors" is another plausible metric.   
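   [The vector average is easy to sketch. Assuming Python, and treating
   a near-zero resultant vector as the degenerate case Ben mentions:]

```python
import math

def vector_average(times, period=24.0):
    """Circular mean: sum unit vectors, return the direction of the sum.

    Returns None when the summed vector is (near) zero, i.e. the
    degenerate case where the average is undefined.
    """
    x = sum(math.cos(2 * math.pi * t / period) for t in times)
    y = sum(math.sin(2 * math.pi * t / period) for t in times)
    if math.hypot(x, y) < 1e-12:
        return None
    return math.atan2(y, x) * period / (2 * math.pi) % period

print(vector_average([8.0, 10.0]))   # 9.0
print(vector_average([23.5, 1.5]))   # 0.5
print(vector_average([0.0, 12.0]))   # None (opposite times cancel)
```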
   >>   
   >>> Sometimes that vector has zero length, and the average is undefined,   
   >>> but otherwise the length of the vector gives an indication of the   
   >>> variability of the data.   
   >>>   
   >>> Why do I consider this a reasonable interpretation of the problem?   
   >>> Well, given a list of times of day when a train is observed to pass   
   >>> some station, the circular 24-hour-time average should be our best   
   >>> estimate of the scheduled time.   
   >>>   
   >>> Obviously there are other possible readings of the problem, but I was   
   >>> not able to justify any of them as useful for any real-world   
   >>> applications. This is a case where I hope I am wrong and there /are/   
   >>> other circular averages with practical interpretations.   
   >>   
   >> A nice analysis.   
   >   
   > A generous appraisal, thanks. I was trying to be vague enough not to   
   > put my foot in it, but was too specific with the formula.
   >   
   >> Some observations (all of which I believe are correct, but no   
   >> guarantees are offered).   
   >>   
   >> The "vector average" metric is easier to calculate and gives an   
   >> explicit indication when the data produces a degenerate result.   
   >>   
   >> Considering only cases that do not produce a degenerate result:   
   >>   
   >> (1) the "conventional average" metric and the "vector average"   
   >> metric can produce different results when there are more than two   
   >> data points. (For one or two data points I think they are always   
   >> the same.)   
   >>   
   >> (2) the "vector average" metric can be calculated in O(n). The   
   >> "conventional average" metric can be calculated in O(n*n).   
   >   
   > By which I presume you mean and /not/ in O(n). (Big O is just an   
   > upper bound.)   
      
   What I mean is I have an algorithm that runs in time proportional   
   to n*n. I suspect that there is an algorithm with better big-O   
   behavior (probably O( n * log n )), but I don't actually have one   
   to show, so I simply gave the best big-O result that I knew.   
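   [One O(n*n) scheme that fits this description, offered as a guess at
   the idea rather than the algorithm meant above: try each data point
   as the place to cut the circle, unwrap the data onto a line, take the
   ordinary mean, and keep the candidate with the least cost.]

```python
def circ_dist(a, b, period=24.0):
    """Shorter-arc distance between two times on a circular clock."""
    d = abs(a - b) % period
    return min(d, period - d)

def conventional_average(times, period=24.0):
    """Minimise the sum of squared shorter-arc distances in O(n*n).

    For each of the n cuts of the circle at a data point, unwrap all
    times into [cut, cut + period), take the arithmetic mean, and score
    the candidate; the cheapest candidate wins.
    """
    best_a, best_cost = None, float("inf")
    for cut in times:
        unwrapped = [t if t >= cut else t + period for t in times]
        a = sum(unwrapped) / len(unwrapped) % period
        c = sum(circ_dist(a, t, period) ** 2 for t in times)
        if c < best_cost:
            best_a, best_cost = a, c
    return best_a

print(conventional_average([23.5, 1.5]))   # 0.5
print(conventional_average([11.5, 13.5]))  # 12.5
```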
      
   >> (3) when the times are all reasonably close together, the   
   >> "conventional average" metric seems more likely to produce a result   
   >> corresponding to what most people expect. Another way to say this   
   >> is that the "vector average" metric will sometimes surprise people.   
   >   
   > I'm sorry to be obtuse, but what is the "conventional average"? The   
   > name makes it sound trivial, but the quadratic time makes me certain   
   > that it isn't.
      
   Sorry, I meant to refer to your formulation of average   
      
    A that minimizes { Sum_{i=1,n} difference(A, t(i))^2 }   
      
   where 'difference' means the shorter arc length. This formula   
   matches the result for 'mean' on real numbers.   
      
   > My "conventional average" algorithm (which is not well thought out) was   
   > to (a) rotate the data set to avoid the 23/0 boundary (not always   
   > possible), (b) take the arithmetic mean, and then (c) rotate the result   
   > back. E.g. [23,0,1] -> [0,1,2] by adding one, and the average is   
   > mean[0,1,2] - 1 = 0.   
      
   Yes, if you know where to split the cycle then the answer can be   
   found in O(n) time. But how can we figure out where to split the   
   cycle? (Incidentally I got that part wrong the first way I tried   
   to do it.)   
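   [For reference, Ben's steps (a)-(c) are trivial once the offset is
   known; choosing that offset automatically is exactly the open
   question. A sketch with the offset supplied by hand:]

```python
def rotated_mean(times, offset, period=24.0):
    """Rotate the data by a known offset, take the plain mean, rotate back.

    This is Ben's (a)-(c); it gives the right answer only when the
    offset moves all the data away from the wrap-around boundary.
    """
    shifted = [(t + offset) % period for t in times]
    return (sum(shifted) / len(shifted) - offset) % period

print(rotated_mean([23, 0, 1], offset=1))  # 0.0
```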
      