From: ben.usenet@bsb.me.uk   
      
   Mike Terry writes:   
      
   > On 24/12/2022 14:21, Tim Rentsch wrote:   
   >> Ben Bacarisse writes:   
   >>   
   >>> Tim Rentsch writes:   
   >>>   
   >>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>   
   >>>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >>>>>   
   >>>>>> Given n times of the 24-hour day, print their average.   
   >>>>>> For example, the average of "eight o'clock" and   
   >>>>>> "ten o'clock" (n=2) would be "nine o'clock".   
   >>>>>> (You can choose any representation, for example "HH:MM"   
   >>>>>> or "seconds since midnight".)   
   >>>>>   
   >>>>> Thanks for all replies!   
   >>>>>   
   >>>>> I waited a few days before answering to allow   
   >>>>> sufficient time to think about the problem.   
   >>>>>   
   >>>>> There were not enough tests written and run. As a result,   
   >>>>> the puzzle has not yet been solved (unless I have overlooked   
   >>>>> a contribution or misworded expectations).   
   >>>>>   
   >>>>> So, here are two possible test cases.   
   >>>>>   
   >>>>> average( 23.5, 1.5 )== 0.5   
   >>>>> average( 11.5, 13.5 )== 12.5   
   >>>>>   
   >>>>> (I use hours as units, so "0.5" means, "half past midnight".)   
   >>>>>   
   >>>>> I hope that these test cases encode sensible expectations   
   >>>>> for an average of two times on a 24-hour clock in the spirit   
   >>>>> of the example given in the OP, which was, "the average of   
   >>>>> eight o'clock and ten o'clock would be nine o'clock", since   
   >>>>> these test cases just have rotated that example by 3.5 and   
   >>>>> 15.5 hours.   
   >>>>>   
   >>>>> I believe that I have not seen an algorithm so far in this   
   >>>>> thread that would pass these tests.   
   >>>>   
   >>>> As before, the problem is underspecified.   
   >>>   
   >>> Some remarks not specifically in reply to you, Tim...   
   >> I didn't really say very much. Your comments are a lot more   
   >> interesting.   
   >>   
   >>> The input is a collection, t(n), of n > 1 numbers in [0, 24). The   
   >>> average should be a number, A, in [0, 24) that minimises   
   >>>   
   >>> Sum_{i=1,n} distance(A, t(i))   
   >>>   
   >>> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms   
   >>> of variance). So far, this is just what an average is. The key point   
   >>> is what is the distance (or difference) whose sum (or sum of squares)   
   >>> we want to minimise? For times, I would say it is the length of the   
   >>> shorter arc round an imaginary 24-hour clock face.   
   >> Minimizing the square of the distance (aka length of the shorter arc)   
   >> is one metric, and I think a reasonable one. (Minimizing the absolute   
   >> value of the distance gives a result more like the median than the   
   >> mean, IIANM.)   
   >   
   > Yes - perhaps Ben was thinking of aking   
   >   
   > Sum_{i=1,n} signed_distance(A, t(i)) = 0   
   >   
   > rather than minimising the (absolute) distance. That would match up   
   > with minimising the sum of squares of the distance (mean). Or he just   
   > meant the median as you say, which is different from the mean. Both   
   > have plausible "average" properties,   
      
   Well, I was trying to be deliberately vague (hence my talking about "an   
   average" rather than mean, median or whatever) but I was no vague enough   
   with the formula.   
      
   I wasn't thinking of a zero net distance because some metrics (though   
   probably not applicable in this situation) don't always allow for a zero   
   sum. I should have referred to minimising some kind distance sum, be it   
   the absolute value of a signed sum or, as in the solution I was   
   proposing, the sum of directional differences (i.e. full vectors).   
      
   > and meet my "minimal" criteria   
   > for what I think anyone would require of an "average" applying to a   
   > modular arithmetic:   
   >   
   > - if all x_i are within 5 minutes of each other [ALLOWING FOR WRAPPING   
   > AROUND THE END OF THE MODULO LOOP] then the "average" will be within   
   > those same 5 minutes. NOT e.g. 12 hours or 4 hours or whatever away.   
      
   And when the x_i are widely spaced the average might be considered   
   nearly meaningless. If you have {0, 8, 16} hours then what is the   
   "average"? Are there cases where every set of circular values has a   
   meaningful "average"? If not, you really need more than a single number   
   answer.   
      
   (These are not questions to you, specifically, I'm interested to know of   
   interpretations that go beyond the way I see things.)   
      
   > - if all x_i are translated by a fixed offset like 1 hour, the average   
   > will translate by the same offset, i.e. it maintains the same "modulo"   
   > relation with the translated x_i.   
   >   
   > hmm, thinking now I'll add the similar:   
   >   
   > - if all x_i are "reflected" about a chosen time, then the new average   
   > will be the reflected time of the original average.   
   >   
   > (those aren't intended to be a definition of a "modular average", just   
   > to exclude suggestions that blatently lack the right symmetry for such   
   > an average.)   
   >   
   >>> The problem has a natural interpretation in terms of angles. Whatever   
   >>> the circular quantity is, convert the values to unit vectors round a   
   >>> circle. For times of day, just scale [0, 24) to [0, 2*pi). The   
   >>> average is then just the direction of the average vector, converted   
   >>> back to a time of day.   
   >> Averaging the "time unit vectors" is another plausible metric.   
   >   
   > Yes I think this one would work best in most situations, given that   
   > the OP didn't have a precise mathematical requirement for his solution   
   > - probably he wants something that just works reasonably as expected.   
   > (It's my favourite answer.)   
   >   
   > Another way of thinking of this approach is "balancing" the 24-hour   
   > modular circle, where we've put unit weights at each of the times x_i.   
   > E.g. we look for a balancing line through the centre of the circle.   
      
   Is that the same? For {0, 8, 16} any of the three lines through 0, 8 or   
   16 balances, but the vector sum interpretation is undefined. And for   
   {0, 6, 12, 18} there are two balance lines, neither of which seems to be   
   a natural answer for the average.   
      
   (Also, every balance line gives two answers, so how do you pick?)   
      
   > [Note times on the circle go from 1-24, not 1-12 like a normal clock.]   
   > Also, thinking like this, it's equivalent to (conventional) averaging   
   > of the sin of the angular differences from the average (balancing)   
   > point, since the sin will give the perpendicular distance of the   
   > weight from the balancing line. So it still fits with the general   
   > scheme of minimising distance(x_i, v)^2, but with a slightly different   
   > metric (distance function) in use.   
   >   
   > As sin(x) is close to x when x is small, the vector average solution   
   > would match closely with the mean (by arc length) solution when the   
   > times are close together, but diverge more noticeably when the times   
   > are spread out...   
      
   --   
   Ben.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|