Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 897 of 1,954    |
|    Ted Dunning to All    |
|    Re: Algorithm/Theory help: Patterns, com    |
|    28 Jan 06 09:45:37    |
      From: ted.dunning@gmail.com

      > This is an interesting problem and you seem to have a good
      > understanding of it, but I think you may be trying to avoid the
      > intrinsic complexity of the problem.

      I absolutely am trying to avoid the intrinsic complexity. The key
      question is whether the simplified system works well enough on real
      problems. Sometimes a bit of tweaking is required (such as using
      different-size n-grams or doing a post-pass with dynamic programming).
      Other times, the simplified system is just not usable.

      > Working on this problem has made me realize (again) that I cannot rely on a
      > simple dictionary lookup for my own AI project.

      This is almost always true for practical systems. Handling unknown
      words well is very powerful.

      > (The methods that can be used to evaluate syntactic transformations can also
      > be applied to conceptual relations since conceptual relations can be
      > stored as strings or fields, and so a working solution to this problem
      > is really significant.)

      Consider carefully whether you can find an interesting measure that
      can be approximated using indexing. The key is to have a measure that
      has false positives, but no (or very few) false negatives. Since you
      can retrieve candidates very fast (your approximation is index based),
      you can use this as a first pass to find examples where you should
      spend the time to compute a better approximation.

      This is the general approach taken, for instance, by genetic sequence
      search systems. N-gram indices are used to find candidate regions for
      detailed matching, and then dynamic programming methods are used to
      score in detail. This approach can be used much more generally than
      just these applications.
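      The two-pass scheme described above can be sketched in a few lines of
      Python. This is a minimal illustration, not anyone's production code:
      a character n-gram index retrieves candidates cheaply (false positives
      allowed, few false negatives), and exact Levenshtein distance, computed
      by dynamic programming, rescores only those candidates. The word list
      and query are made up for the example.

      ```python
      from collections import defaultdict

      def ngrams(s, n=3):
          """All character n-grams of a string."""
          return {s[i:i + n] for i in range(len(s) - n + 1)}

      def build_index(words, n=3):
          """Map each n-gram to the set of words containing it."""
          index = defaultdict(set)
          for w in words:
              for g in ngrams(w, n):
                  index[g].add(w)
          return index

      def candidates(index, query, n=3):
          """First pass: any word sharing an n-gram with the query.
          Cheap and index-based; over-generates but rarely misses."""
          found = set()
          for g in ngrams(query, n):
              found |= index.get(g, set())
          return found

      def edit_distance(a, b):
          """Second pass: exact Levenshtein distance by dynamic programming."""
          prev = list(range(len(b) + 1))
          for i, ca in enumerate(a, 1):
              cur = [i]
              for j, cb in enumerate(b, 1):
                  cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                                 prev[j - 1] + (ca != cb)))
              prev = cur
          return prev[-1]

      words = ["pattern", "patter", "matter", "lantern", "xyzzy"]
      index = build_index(words)
      hits = candidates(index, "patern")          # "xyzzy" is filtered out here
      best = min(hits, key=lambda w: edit_distance("patern", w))
      ```

      Only the survivors of the cheap pass ever reach the quadratic-time
      distance computation, which is exactly the economy the genetic-sequence
      search systems mentioned above rely on.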
      > I have always rejected strong scoring
      > systems as being too arbitrary and too simplistic, and I thought that
      > making multiple evaluations of derived characteristics would produce so
      > many additional complications that it would complicate the problem
      > rather than solve it, but now I see that a more flexible and complex
      > scoring system could be both feasible and powerful.

      Indeed. And if your scores are probabilistic in nature, you can even
      reason about your results and the degree to which you have a good
      approximation. Keep this up!

      > ... Any means to
      > simplify problems like the one you are considering could be helpful,
      > but simplifying the problem too much could severely limit its
      > effectiveness.

      You can quantify this by measuring false positive and false negative
      rates for your scores. False positives cost you compute time to get
      rid of, and false negatives mean that you miss real solutions. You can
      also look at this in terms of error distributions on scores. This
      allows you to compute useful (sometimes) bounds on actual performance.

      Good luck!

      [ comp.ai is moderated. To submit, just post and be patient, or if ]
      [ that fails mail your article to |
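      The false-positive/false-negative bookkeeping suggested above is easy
      to make concrete. A small hypothetical sketch (the sets here are
      invented for illustration): compare what a cheap first-pass filter
      returns against known ground truth over some universe of items.

      ```python
      def error_rates(cheap_hits, true_matches, universe):
          """Compare a fast first-pass filter against ground truth.
          False positives cost compute in the second pass;
          false negatives are real solutions silently missed."""
          cheap_hits, true_matches = set(cheap_hits), set(true_matches)
          false_pos = cheap_hits - true_matches
          false_neg = true_matches - cheap_hits
          negatives = set(universe) - true_matches
          fp_rate = len(false_pos) / len(negatives) if negatives else 0.0
          fn_rate = len(false_neg) / len(true_matches) if true_matches else 0.0
          return fp_rate, fn_rate

      # Hypothetical data: the filter over-generates one item, misses none.
      fp_rate, fn_rate = error_rates(
          cheap_hits={"a", "b", "c"},
          true_matches={"a", "b"},
          universe={"a", "b", "c", "d", "e"},
      )
      ```

      A near-zero false-negative rate is the property that makes the cheap
      score safe to use as a prefilter; the false-positive rate then tells
      you how much second-pass compute you are paying for that safety.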
(c) 1994, bbs@darkrealms.ca