Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 1,456 of 1,954    |
|    Ted Dunning to jasonjeffreyjo...@gmail.com    |
|    Re: How to order from noisy comparisons?    |
|    16 Jul 07 11:16:17    |
From: ted.dunning@gmail.com

On Jul 13, 6:20 pm, jasonjeffreyjo...@gmail.com wrote:
> Assume we have a set of unordered items, e.g. [e,c,a,d,f,b]. There
> exists some true ordering (e.g. [a,b,c,d,e,f]) but it is unknown to
> us. The only queries we can make are item-to-item comparisons (e.g.
> Should item a be before item c?). The yes/no answers we receive will
> be noisy (e.g. the same question may result in yes answers some of the
> time and no other times) but can be assumed to be correct more often
> than incorrect. For simplicity, assume the items to be compared are
> chosen randomly (i.e. we don't have to/aren't able to specify a policy
> for choosing items to compare).
>
> The ultimate goal is to reorder our unordered set to resemble the
> unknowable true ordering as best as possible given the information
> received so far. What is the procedure for reordering the items after
> each successive comparison?
>
> I'm hoping someone can point me to a specific algorithm that does
> exactly this. (I'm assuming this is a solved problem. I'm just not
> familiar enough with the terminology to find it.)
>
> Can someone point me to a paper or textbook example?

This is a pretty simple problem if you assume that all of the answers
are perturbed from the correct state independently of all other
answers and approach the problem from a Bayesian viewpoint.

Assume that you have some unknown probability of incorrect answers
from which the individual probabilities of erroneous answers are
drawn. Initially, you have no knowledge of any comparison order.
Each answer gives you some information about that comparison (and
others, because the answers are not independent). When stated this
way, your problem is to compute the posterior distribution on
orderings.
When one ordering dominates the distribution, you can take that as the
correct answer.

Given a prior on the probability of erroneous answers (you stated such
a prior, for instance, when you said that the answers were more often
correct than not), you can define the prior distribution of orderings
(all permutations equal is easy), which in turn defines the
distribution of true answers. The observed answers are dependent on
the true answers and the probability of error.

Given this much, you can write a pretty simple MCMC sampler that
samples from the posterior distribution of orderings and error
probabilities given the answers you have so far. This process usually
has some gotchas in that you can pose the problem in a way that causes
some real difficulties, but I think (based on 30 seconds of thought)
that using the underlying ordering as the hidden state makes the
sampler pretty simple. Using the true answers as the hidden state
would probably make the problem pretty hard because the
interdependencies between answers are pretty tricky.

Note that the MCMC approach gives you pretty good estimates of how
much certainty you can have about partial answers. Imagine, for
instance, that the items are colored and you want to know not only
what the underlying order is, but whether all of the blue items come
before the red items. You might well know this long before you know
the actual ordering, and you would be able to derive good
probabilistic estimates of the quality of your answer as well.

Note also that this formulation can give you good heuristic guidance
as to which question to ask next to tighten down the final answer as
quickly as possible. One simple way to approximate the best question
is to simply find the pair about which you have the most uncertainty
and ask about them.
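The sampler described above might look roughly like this. This is only a sketch under my own assumptions, not Ted's code: I use a fixed, known error probability `p_err` (his version would also sample it), an adjacent-swap Metropolis proposal, and helper names (`sample_orderings`, `prob_before`) that are invented for illustration.

```python
import math
import random

def sample_orderings(items, answers, p_err=0.2, n_steps=5000, seed=0):
    """Metropolis sampler over orderings of `items`.

    `answers` is a list of (a, b) pairs, each one noisy evidence that
    a comes before b; repeats and contradictions are expected.
    `p_err` is the assumed probability that any single answer is
    flipped (held fixed here for simplicity).
    Returns post-burn-in samples from the posterior over orderings.
    """
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)

    def log_lik(order):
        pos = {x: i for i, x in enumerate(order)}
        # each answer agrees with this ordering w.p. 1 - p_err
        return sum(math.log(1.0 - p_err) if pos[a] < pos[b]
                   else math.log(p_err)
                   for a, b in answers)

    cur = log_lik(order)
    samples = []
    for step in range(n_steps):
        i = rng.randrange(len(order) - 1)
        order[i], order[i + 1] = order[i + 1], order[i]  # propose swap
        new = log_lik(order)
        if math.log(rng.random()) < new - cur:
            cur = new                                    # accept
        else:
            order[i], order[i + 1] = order[i + 1], order[i]  # reject
        if step >= n_steps // 4:                         # drop burn-in
            samples.append(tuple(order))
    return samples

def prob_before(samples, a, b):
    """Posterior probability that a precedes b, from the samples."""
    return sum(1 for s in samples if s.index(a) < s.index(b)) / len(samples)
```

With enough answers the samples concentrate on one ordering, and `prob_before` (or any similar statistic over the samples) gives exactly the kind of partial-answer certainty mentioned above, e.g. whether all blue items precede all red ones.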
A slightly better answer is to estimate the contribution each
candidate question would make toward reducing the entropy of the
distribution of orderings (or of whatever other question you actually
want answered).

[ comp.ai is moderated ... your article may take a while to appear. ]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
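The entropy heuristic at the end of the message could be approximated from the same kind of posterior samples. Again a sketch under my own assumptions: the function name `expected_info_gain` is hypothetical, and `p_err` is treated as fixed and known.

```python
import math
from collections import Counter

def _entropy(weights):
    """Entropy (nats) of a distribution given as unnormalized weights."""
    total = sum(weights)
    return -sum(w / total * math.log(w / total) for w in weights if w > 0)

def expected_info_gain(samples, a, b, p_err=0.2):
    """Expected entropy reduction of the ordering distribution from
    asking "does a come before b?", given posterior samples."""
    counts = Counter(samples)
    n = sum(counts.values())
    h0 = _entropy(counts.values())
    gain = 0.0
    for ans in (True, False):      # the two possible noisy replies
        # reweight each sampled ordering by the likelihood of this reply
        w = [c * ((1 - p_err) if (o.index(a) < o.index(b)) == ans
                  else p_err)
             for o, c in counts.items()]
        p_ans = sum(w) / n         # predictive probability of this reply
        gain += p_ans * (h0 - _entropy(w))
    return gain
```

Asking about the pair with the largest expected gain generalizes the "most uncertain pair" rule; to target a different question (e.g. "do all blues precede all reds?"), replace the entropy over full orderings with the entropy of that question's answer.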
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca