... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.programming
Programming issues that transcend langua
57,431 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 56,814 of 57,431
Tim Rentsch to Ben Bacarisse
Re: Another little puzzle
24 Dec 22 06:21:40
   From: tr.17687@z991.linuxsc.com   
      
   Ben Bacarisse  writes:   
      
   > Tim Rentsch  writes:   
   >   
   >> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>   
   >>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>   
   >>>> Given n times of the 24-hour day, print their average.   
   >>>> For example, the average of "eight o'clock" and   
   >>>> "ten o'clock" (n=2) would be "nine o'clock".   
   >>>> (You can choose any representation, for example "HH:MM"   
   >>>> or "seconds since midnight".)   
   >>>   
   >>>   Thanks for all replies!   
   >>>   
   >>>   I waited a few days before answering to allow   
   >>>   sufficient time to think about the problem.   
   >>>   
   >>>   There were not enough tests written and run.  As a result,   
   >>>   the puzzle has not yet been solved (unless I have overlooked   
   >>>   a contribution or misworded expectations).   
   >>>   
   >>>   So, here are two possible test cases.   
   >>>   
   >>> average( 23.5,  1.5 )==  0.5   
   >>> average( 11.5, 13.5 )== 12.5   
   >>>   
   >>>   (I use hours as units, so "0.5" means, "half past midnight".)   
   >>>   
   >>>   I hope that these test cases encode sensible expectations   
   >>>   for an average of two times on a 24-hour clock in the spirit   
   >>>   of the example given in the OP, which was, "the average of   
   >>>   eight o'clock and ten o'clock would be nine o'clock", since   
   >>>   these test cases just have rotated that example by 3.5 and   
   >>>   15.5 hours.   
   >>>   
   >>>   I believe that I have not seen an algorithm so far in this   
   >>>   thread that would pass these tests.   
   >>   
   >> As before, the problem is underspecified.   
   >   
   > Some remarks not specifically in reply to you, Tim...   
      
   I didn't really say very much.  Your comments are a lot more   
   interesting.   
      
   > The input is a collection, t(n), of n > 1 numbers in [0, 24).  The   
   > average should be a number, A, in [0, 24) that minimises   
   >   
   >   Sum_{i=1,n} distance(A, t(i))   
   >   
   > (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms   
   > of variance).  So far, this is just what an average is.  The key point   
   > is what is the distance (or difference) whose sum (or sum of squares)   
   > we want to minimise?  For times, I would say it is the length of the   
   > shorter arc round an imaginary 24-hour clock face.   
      
   Minimizing the square of the distance (aka length of the shorter arc)   
   is one metric, and I think a reasonable one.  (Minimizing the absolute   
   value of the distance gives a result more like the median than the   
   mean, IIANM.)   
      
   > The problem has a natural interpretation in terms of angles.  Whatever   
   > the circular quantity is, convert the values to unit vectors round a   
   > circle.  For times of day, just scale [0, 24) to [0, 2*pi).  The   
   > average is then just the direction of the average vector, converted   
   > back to a time of day.   
      
   Averaging the "time unit vectors" is another plausible metric.   
      
   > Sometimes that vector has zero length, and the average is undefined,   
   > but otherwise the length of the vector gives an indication of the   
   > variability of the data.   
   >   
   > Why do I consider this a reasonable interpretation of the problem?   
   > Well, given a list of times of day when a train is observed to pass   
   > some station, the circular 24-hour-time average should be our best   
   > estimate of the scheduled time.   
   >   
   > Obviously there are other possible readings of the problem, but I was   
   > not able to justify any of them as useful for any real-world   
   > applications.  This is a case where I hope I am wrong and there /are/   
   > other circular averages with practical interpretations.   
      
   A nice analysis.   
      
   Some observations (all of which I believe are correct, but no   
   guarantees are offered).   
      
   The "vector average" metric is easier to calculate and gives an   
   explicit indication when the data produces a degenerate result.   
      
   Considering only cases that do not produce a degenerate result:   
      
   (1) the "conventional average" metric and the "vector average"   
   metric can produce different results when there are more than two   
   data points.  (For one or two data points I think they are always   
   the same.)   
      
   (2) the "vector average" metric can be calculated in O(n).  The   
   "conventional average" metric can be calculated in O(n*n).   
      
   (3) when the times are all reasonably close together, the   
   "conventional average" metric seems more likely to produce a result   
   corresponding to what most people expect.  Another way to say this   
   is that the "vector average" metric will sometimes surprise people.   
      
   Incidentally, I had implemented a method that is isomorphic to   
   the "vector average" metric (at least I think it is), and I was   
   surprised to discover that it gave different results than the   
   "conventional average" metric in some cases.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]