|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 912 of 1,954    |
|    Ted Dunning to All    |
|    Re: Algorithm/Theory help: Patterns, com    |
|    08 Feb 06 23:24:16    |
   
   From: ted.dunning@gmail.com   
      
   Marina,   
      
   Your second paragraph makes an important point, but in a fashion which   
   I think can lead to error. Essentially what you are saying is that if   
   you can't justify an assumption of unequal weighting, you have to   
   assume equal weighting. I think that you have to assume you don't   
   know.   
      
   I do think that this assumption of equal weighting is implicit in this
   approach, but there is an important ambiguity in what we might mean
   by "all edits are equally important".
      
   In the Olympics example, the first meaning would be "golds, silvers and
   bronzes are equally important". For another example, consider the
   mechanical errors people make when they type: transpositions are much
   more common than deletions or substitutions. In a spelling program, if
   one candidate correction requires a transposition, we should therefore
   prefer it to an alternative that requires deletions or substitutions.
   Similarly, it is important in the typing example to recognize that
   substitutions are much more common than you would expect if you thought
   of them solely as concatenated deletions and insertions.
      
   Wittkowski implicitly agrees that there is an assumption of equal   
   weighting. He does this by saying that the fact that golds are more   
   important should be considered. Thus my assertion of an assumption on   
   his part.   
      
   A second meaning (and more the one I meant) is that not all
   substitutions are equally costly, even among themselves. To take typing
   again, it is common to strike a % in place of a $ because they are
   neighbors and people have little practice with them (other than Perl
   programmers). "a" and "f" are confused much less often, however, because
   they are not proximal and typists have lots of practice on them. Mirror
   image substitutions are also relatively common ("d" for "k").
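   
   To make the two meanings concrete, here is a minimal sketch of an edit
   distance with unequal weights, both between kinds of edit and between
   individual substitutions. Everything in it is an assumption made for
   illustration: the cost values are invented rather than measured from
   typing data, and the names (CONFUSABLE, sub_cost,
   weighted_edit_distance) are hypothetical. It uses the restricted
   "optimal string alignment" form of transposition, which is one choice
   among several.

```python
from typing import Callable

# Hypothetical costs, invented for illustration (not estimated from data).
INSERT_COST = 1.0
DELETE_COST = 1.0
TRANSPOSE_COST = 0.5    # assumes transpositions are the most common slip
DEFAULT_SUB_COST = 0.9  # assumes a substitution is cheaper than delete + insert

# Pairs assumed to be easily confused: keyboard neighbours, mirror images.
CONFUSABLE = {frozenset("%$"): 0.3, frozenset("dk"): 0.4}


def sub_cost(a: str, b: str) -> float:
    """Cost of typing b where a was intended (symmetric in this sketch)."""
    if a == b:
        return 0.0
    return CONFUSABLE.get(frozenset((a, b)), DEFAULT_SUB_COST)


def weighted_edit_distance(
    source: str, target: str, sub: Callable[[str, str], float] = sub_cost
) -> float:
    """Optimal-string-alignment distance with per-operation costs."""
    n, m = len(source), len(target)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + DELETE_COST
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + INSERT_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + DELETE_COST,
                d[i][j - 1] + INSERT_COST,
                d[i - 1][j - 1] + sub(source[i - 1], target[j - 1]),
            )
            # Adjacent transposition: "eh" typed where "he" was intended.
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]
                    and source[i - 1] != source[i - 2]):
                best = min(best, d[i - 2][j - 2] + TRANSPOSE_COST)
            d[i][j] = best
    return d[n][m]


def uniform_sub(a: str, b: str) -> float:
    """The 'all substitutions are equal' assumption, made explicit."""
    return 0.0 if a == b else 1.0
```

   With these made-up costs, a typed "kog" is closer to "dog" than to
   "fog", whereas passing uniform_sub as the sub argument makes the two
   candidates tie. The tie is the equal-weighting assumption at work, not
   the absence of an assumption.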
      
   Now, to get back to your second paragraph.   
      
   If you ignore the unequal probabilities of different kinds of edits and   
   implicitly assume equal probabilities, you may well get good results.   
   This is, however, in spite of your assumption and not because you made   
   no assumption. If the rest of your system is strong enough, you may   
   have very low error rates even if the assumption is very, very wrong,   
   but you shouldn't claim that you aren't making the assumption just   
   because the overall performance of your system is good. Thus, it may   
   well be that making the equal probability assumption as Wittkowski does
   is a reasonable and practical thing to do.   
      
   It is also important (in theory, but not always in practice) to   
   distinguish between an assumption of equal likelihood and an assumption   
   of an uninformative prior such as a uniform distribution. Where you   
   have alternative attacks on a problem, some of which make certain   
   assumptions and some of which do not, you can choose between them by   
   using mixtures of uninformative priors. The virtue of this sort of
   approach is that it gives you the benefit of using more detail if using
   it is justified, and it gives you the baseline effectiveness of simpler
   approaches if it isn't, either because there is no difference or because
   there is not enough data to estimate the differences.
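   
   One possible reading of this, sketched under stated assumptions: treat
   "all edit types are equally likely" as one model, treat "the edit type
   probabilities are unknown, with a uniform Dirichlet prior" as a second
   model, and weight the two by how well each explains the observed counts.
   The function names and the 50/50 prior over the two models are
   hypothetical choices for illustration, not something taken from the
   thread.

```python
import math
from typing import List, Sequence


def log_evidence_equal(counts: Sequence[int]) -> float:
    """Log probability of the observed sequence if every type has p = 1/k."""
    k, n = len(counts), sum(counts)
    return -n * math.log(k)


def log_evidence_free(counts: Sequence[int]) -> float:
    """Log marginal probability of the sequence under a Dirichlet(1,...,1)
    prior on the type probabilities (uniform over the simplex)."""
    k, n = len(counts), sum(counts)
    return (math.lgamma(k)
            + sum(math.lgamma(c + 1) for c in counts)
            - math.lgamma(n + k))


def weight_of_free_model(counts: Sequence[int], prior_free: float = 0.5) -> float:
    """Posterior weight of the unequal-probability model in the mixture."""
    log_a = math.log(1.0 - prior_free) + log_evidence_equal(counts)
    log_b = math.log(prior_free) + log_evidence_free(counts)
    top = max(log_a, log_b)  # subtract the max to avoid underflow
    a, b = math.exp(log_a - top), math.exp(log_b - top)
    return b / (a + b)


def predictive(counts: Sequence[int], prior_free: float = 0.5) -> List[float]:
    """Mixture estimate of the probability of each edit type."""
    k, n = len(counts), sum(counts)
    w = weight_of_free_model(counts, prior_free)
    return [(1.0 - w) / k + w * (c + 1) / (n + k) for c in counts]
```

   With no data the two models get equal weight and the prediction is
   uniform. With a few evenly spread counts such as [3, 3, 3] the weight
   moves toward the simpler equal-probability model, and with strongly
   skewed counts such as [90, 5, 5] it moves almost entirely to the model
   that estimates the differences.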
      
   [ comp.ai is moderated. To submit, just post and be patient, or if ]   
   [ that fails mail your article to