From: tr.17687@z991.linuxsc.com   
      
   Ben Bacarisse writes:   
      
   > Tim Rentsch writes:   
   >   
   >> Ben Bacarisse writes:   
   >>   
   >>> Tim Rentsch writes:   
   >>>   
   >>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>   
   >>>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>>   
   >>>>>> Given n times of the 24-hour day, print their average.   
   >>>>>> For example, the average of "eight o'clock" and   
   >>>>>> "ten o'clock" (n=2) would be "nine o'clock".   
   >>>>>> (You can choose any representation, for example "HH:MM"   
   >>>>>> or "seconds since midnight".)   
   >>>>>   
   >>>>> Thanks for all replies!   
   >>>>>   
   >>>>> I waited a few days before answering to allow   
   >>>>> sufficient time to think about the problem.   
   >>>>>   
   >>>>> There were not enough tests written and run. As a result,   
   >>>>> the puzzle has not yet been solved (unless I have overlooked   
   >>>>> a contribution or misworded expectations).   
   >>>>>   
   >>>>> So, here are two possible test cases.   
   >>>>>   
   >>>>> average(23.5, 1.5) == 0.5
   >>>>> average(11.5, 13.5) == 12.5
   >>>>>   
   >>>>> (I use hours as units, so "0.5" means, "half past midnight".)   
   >>>>>   
   >>>>> I hope that these test cases encode sensible expectations   
   >>>>> for an average of two times on a 24-hour clock in the spirit   
   >>>>> of the example given in the OP, which was, "the average of   
   >>>>> eight o'clock and ten o'clock would be nine o'clock", since   
   >>>>> these test cases just have rotated that example by 3.5 and   
   >>>>> 15.5 hours.   
   >>>>>   
   >>>>> I believe that I have not seen an algorithm so far in this   
   >>>>> thread that would pass these tests.   
   >>>>   
   >>>> As before, the problem is underspecified.   
   >>>   
   >>> Some remarks not specifically in reply to you, Tim...   
   >>   
   >> I didn't really say very much. Your comments are a lot more   
   >> interesting.   
   >>   
   >>> The input is a collection, t(n), of n > 1 numbers in [0, 24). The   
   >>> average should be a number, A, in [0, 24) that minimises   
   >>>   
   >>> Sum_{i=1,n} distance(A, t(i))   
   >>>   
   >>> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms   
   >>> of variance). So far, this is just what an average is. The key point   
   >>> is what is the distance (or difference) whose sum (or sum of squares)   
   >>> we want to minimise? For times, I would say it is the length of the   
   >>> shorter arc round an imaginary 24-hour clock face.   
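   [The objective described here can be written down directly. A minimal
   Python sketch follows; the names circ_dist and cost are mine, not
   from the thread, and 24.0 is the period of the clock.]

```python
def circ_dist(a, b, period=24.0):
    """Length of the shorter arc between times a and b on a 24-hour clock."""
    d = abs(a - b) % period
    return min(d, period - d)

def cost(a, times, period=24.0):
    """Sum of squared shorter-arc distances; the average minimises this."""
    return sum(circ_dist(a, t, period) ** 2 for t in times)

# Stefan's test case [23.5, 1.5]: the candidate 0.5 beats e.g. 12.5
print(cost(0.5, [23.5, 1.5]))    # 2.0
print(cost(12.5, [23.5, 1.5]))   # 242.0
```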
   >>   
   >> Minimizing the square of the distance (aka length of the shorter arc)   
   >> is one metric, and I think a reasonable one. (Minimizing the absolute   
   >> value of the distance gives a result more like the median than the   
   >> mean, IIANM.)   
   >>   
   >>> The problem has a natural interpretation in terms of angles. Whatever   
   >>> the circular quantity is, convert the values to unit vectors round a   
   >>> circle. For times of day, just scale [0, 24) to [0, 2*pi). The   
   >>> average is then just the direction of the average vector, converted   
   >>> back to a time of day.   
   >>   
   >> Averaging the "time unit vectors" is another plausible metric.   
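   [The vector average is easy to sketch. Assuming Python, and treating
   a near-zero resultant vector as the degenerate case Ben mentions:]

```python
import math

def vector_average(times, period=24.0):
    """Circular mean: sum unit vectors, return the direction of the sum.

    Returns None when the summed vector is (near) zero, i.e. the
    degenerate case where the average is undefined.
    """
    x = sum(math.cos(2 * math.pi * t / period) for t in times)
    y = sum(math.sin(2 * math.pi * t / period) for t in times)
    if math.hypot(x, y) < 1e-12:
        return None
    return math.atan2(y, x) * period / (2 * math.pi) % period

print(vector_average([8.0, 10.0]))   # 9.0
print(vector_average([23.5, 1.5]))   # 0.5
print(vector_average([0.0, 12.0]))   # None (opposite times cancel)
```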
   >>   
   >>> Sometimes that vector has zero length, and the average is undefined,   
   >>> but otherwise the length of the vector gives an indication of the   
   >>> variability of the data.   
   >>>   
   >>> Why do I consider this a reasonable interpretation of the problem?   
   >>> Well, given a list of times of day when a train is observed to pass   
   >>> some station, the circular 24-hour-time average should be our best   
   >>> estimate of the scheduled time.   
   >>>   
   >>> Obviously there are other possible readings of the problem, but I was   
   >>> not able to justify any of them as useful for any real-world   
   >>> applications. This is a case where I hope I am wrong and there /are/   
   >>> other circular averages with practical interpretations.   
   >>   
   >> A nice analysis.   
   >   
   > A generous appraisal, thanks. I was trying to be vague enough not to   
   > put my foot in it, but was too specific with the formula.
   >   
   >> Some observations (all of which I believe are correct, but no   
   >> guarantees are offered).   
   >>   
   >> The "vector average" metric is easier to calculate and gives an   
   >> explicit indication when the data produces a degenerate result.   
   >>   
   >> Considering only cases that do not produce a degenerate result:   
   >>   
   >> (1) the "conventional average" metric and the "vector average"   
   >> metric can produce different results when there are more than two   
   >> data points. (For one or two data points I think they are always   
   >> the same.)   
   >>   
   >> (2) the "vector average" metric can be calculated in O(n). The   
   >> "conventional average" metric can be calculated in O(n*n).   
   >   
   > By which I presume you mean and /not/ in O(n). (Big O is just an   
   > upper bound.)   
      
   What I mean is I have an algorithm that runs in time proportional   
   to n*n. I suspect that there is an algorithm with better big-O   
   behavior (probably O( n * log n )), but I don't actually have one   
   to show, so I simply gave the best big-O result that I knew.   
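   [One O(n*n) scheme that fits this description, offered as a guess at
   the idea rather than the algorithm meant above: try each data point
   as the place to cut the circle, unwrap the data onto a line, take the
   ordinary mean, and keep the candidate with the least cost.]

```python
def circ_dist(a, b, period=24.0):
    """Shorter-arc distance between two times on a circular clock."""
    d = abs(a - b) % period
    return min(d, period - d)

def conventional_average(times, period=24.0):
    """Minimise the sum of squared shorter-arc distances in O(n*n).

    For each of the n cuts of the circle at a data point, unwrap all
    times into [cut, cut + period), take the arithmetic mean, and score
    the candidate; the cheapest candidate wins.
    """
    best_a, best_cost = None, float("inf")
    for cut in times:
        unwrapped = [t if t >= cut else t + period for t in times]
        a = sum(unwrapped) / len(unwrapped) % period
        c = sum(circ_dist(a, t, period) ** 2 for t in times)
        if c < best_cost:
            best_a, best_cost = a, c
    return best_a

print(conventional_average([23.5, 1.5]))   # 0.5
print(conventional_average([11.5, 13.5]))  # 12.5
```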
      
   >> (3) when the times are all reasonably close together, the   
   >> "conventional average" metric seems more likely to produce a result   
   >> corresponding to what most people expect. Another way to say this   
   >> is that the "vector average" metric will sometimes surprise people.   
   >   
   > I'm sorry to be obtuse, but what is the "conventional average"? The   
   > name makes it sound trivial, but the quadratic time makes me certain   
   > that it isn't.
      
   Sorry, I meant to refer to your formulation of average   
      
    A that minimizes { Sum_{i=1,n} difference(A, t(i))^2 }   
      
   where 'difference' means the shorter arc length. This formula   
   matches the result for 'mean' on real numbers.   
      
   > My "conventional average" algorithm (which is not well thought out) was   
   > to (a) rotate the data set to avoid the 23/0 boundary (not always   
   > possible), (b) take the arithmetic mean, and then (c) rotate the result   
   > back. E.g. [23,0,1] -> [0,1,2] by adding one, and the average is   
   > mean[0,1,2] - 1 = 0.   
      
   Yes, if you know where to split the cycle then the answer can be   
   found in O(n) time. But how can we figure out where to split the   
   cycle? (Incidentally I got that part wrong the first way I tried   
   to do it.)   
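   [For reference, Ben's steps (a)-(c) are trivial once the offset is
   known; choosing that offset automatically is exactly the open
   question. A sketch with the offset supplied by hand:]

```python
def rotated_mean(times, offset, period=24.0):
    """Rotate the data by a known offset, take the plain mean, rotate back.

    This is Ben's (a)-(c); it gives the right answer only when the
    offset moves all the data away from the wrap-around boundary.
    """
    shifted = [(t + offset) % period for t in times]
    return (sum(shifted) / len(shifted) - offset) % period

print(rotated_mean([23, 0, 1], offset=1))  # 0.0
```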
      