... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.ai
Awaiting the gospel from Sarah Connor
1,954 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 1,456 of 1,954
Ted Dunning to jasonjeffreyjo...@gmail.com
Re: How to order from noisy comparisons?
16 Jul 07 11:16:17
   From: ted.dunning@gmail.com   
      
   On Jul 13, 6:20 pm, jasonjeffreyjo...@gmail.com wrote:   
   > Assume we have a set of unordered items, e.g. [e,c,a,d,f,b].  There   
   > exists some true ordering (e.g. [a,b,c,d,e,f]) but it is unknown to   
   > us.  The only queries we can make are item-to-item comparisons (e.g.   
   > Should item a be before item c?).  The yes/no answers we receive will   
   > be noisy (e.g. the same question may result in yes answers some of the   
   > time and no other times) but can be assumed to be correct more often   
   > than incorrect.  For simplicity, assume the items to be compared are   
   > chosen randomly (i.e. we don't have to/aren't able to specify a policy   
   > for choosing items to compare).   
   >   
   > The ultimate goal is to reorder our unordered set to resemble the   
   > unknowable true ordering as best as possible given the information   
   > received so far.  What is the procedure for reordering the items after   
   > each successive comparison?   
   >   
   > I'm hoping someone can point me to a specific algorithm that does   
   > exactly this.  (I'm assuming this is a solved problem.  I'm just not   
   > familiar enough with the terminology to find it.)   
   >   
   > Can someone point me to a paper or textbook example?   
   >   
      
   This is a pretty simple problem if you assume that all of the answers   
   are perturbed from the correct state independently of all other   
   answers and approach the problem from a Bayesian viewpoint.   
      
   Assume that you have some unknown probability of incorrect answers   
   from which the individual probabilities of erroneous answers are   
   drawn.  Initially, you have no knowledge of any comparison order.   
   Each answer gives you some information about that comparison (and   
   others, because the answers are not independent).  When stated this   
   way, your problem is to compute the posterior distribution on   
   orderings.  When one ordering dominates the distribution, you can take   
   that as the correct answer.   
      
   Given a prior on the probability of erroneous answers (you stated such   
   a prior, for instance when you said that the answers were more correct   
   than not), you can define the prior distribution of orderings (all   
   permutations equal is easy) which in turn defines the distribution of   
   true answers.  The observed answers are dependent on the true answers   
   and the probability of error.   
      
   Given this much, you can write a pretty simple MCMC sampler that   
   samples from the posterior distribution of orderings and error   
   probabilities given the answers you have so far.  This process usually   
   has some gotchas in that you can pose the problem in a way that causes   
   some real difficulties, but I think (based on 30 seconds of thought)   
   that using the underlying ordering as the hidden state makes the   
   sampler pretty simple.  Using the true answers would probably make the   
   problem pretty hard because the interdependencies between answers are   
   pretty tricky.   
      
   Note that the MCMC approach gives you pretty good estimates of how   
   much certainty you can have about partial answers.  Imagine, for   
   instance, that the items are colored and you want to know not only   
   what the underlying order is, but whether all of the blue items come   
   before the red items.  You might well know this long before you know   
   the actual ordering and you would be able to derive good probabilistic   
   estimates of the quality of your answer as well.   
      
   Note also that this formulation can give you good heuristic guidance   
   as to what question to ask next to tighten down the final answer as   
   quickly as possible.  One simple way to approximate the best question   
   is to simple find the pair about which you have the most uncertainty   
   and ask about them.  A slightly better answer is to estimate the   
   contribution all of the answers might give you in terms of the entropy   
   of the distribution of sequences (or whatever question you might want   
   to answer).   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]