
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 897 of 1,954   
   Ted Dunning to All   
   Re: Algorithm/Theory help: Patterns, com   
   28 Jan 06 09:45:37   
   
   From: ted.dunning@gmail.com   
      
   > This is an interesting problem and you seem to have a good   
   > understanding of it, but I think you may be trying to avoid the   
   > intrinsic complexity of the problem.   
      
   I absolutely am trying to avoid the intrinsic complexity.  The key   
   question is whether the simplified system works well enough on real   
   problems.  Sometimes a bit of tweaking is required (such as using   
   different size n-grams or doing a post-pass with dynamic programming).   
   Other times, the simplified system is just not usable.   
      
   > Working on this problem has made me realize (again) that I cannot rely on a   
   > simple dictionary lookup for my own AI project.   
      
   This is almost always true for practical systems.  Handling unknown   
   words well is very powerful.   
      
      
      
   > (The   
   > methods that can be used to evaluate syntactic transformations can also   
   > be applied to conceptual relations since conceptual relations can be   
   > stored as strings or fields, and so a working solution to this problem   
   > is really significant.)   
      
   Consider carefully whether you can find an interesting measure that
   can be approximated using indexing.  The key is to have a measure that
   has false positives, but no (or very few) false negatives.  Since you
   can retrieve candidates very fast (your approximation is index-based),
   you can use this as a first pass to get candidates on which you should
   spend the time to compute a better score.
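   A minimal sketch of that two-pass idea, with invented names and toy
   data: a cheap character n-gram index nominates candidates (false
   positives allowed, false negatives rare), and an expensive measure
   would then be applied only to the survivors.

```python
# Illustrative sketch, not the poster's actual system.
from collections import defaultdict

def ngrams(s, n=3):
    # Character n-grams of a string.
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def build_index(corpus):
    # Inverted index: n-gram -> set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(corpus):
        for g in ngrams(text):
            index[g].add(doc_id)
    return index

def candidates(index, query, min_shared=2):
    # First pass: any document sharing enough n-grams with the query
    # is a candidate for the (expensive) second-pass scoring.
    counts = defaultdict(int)
    for g in ngrams(query):
        for doc_id in index[g]:
            counts[doc_id] += 1
    return {d for d, c in counts.items() if c >= min_shared}

corpus = ["pattern matching", "pattern recognition", "dynamic programming"]
idx = build_index(corpus)
print(sorted(candidates(idx, "pattern patching")))  # [0, 1]
```

   The threshold `min_shared` trades false positives against false
   negatives: lower it and you miss less, at the cost of rescoring more.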
      
   This is the general approach taken, for instance, by genetic sequence
   search systems.  N-gram indices are used to find candidate regions for
   detailed matching, and then dynamic programming methods are used to
   score in detail.  This approach can be applied much more generally
   than just these applications.
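   The second-pass scoring mentioned above might look like this: once
   seeds have nominated a candidate region, a dynamic programming
   alignment scores it in detail.  This is a tiny Smith-Waterman-style
   local alignment; the scoring values are illustrative assumptions.

```python
# Hedged sketch of detailed DP scoring, not any specific system's code.
def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    # Best local alignment score between strings a and b,
    # computed row by row to keep memory at O(len(b)).
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(local_align_score("GATTACA", "GATTACA"))  # 14: seven matches * 2
print(local_align_score("GATTACA", "CCCGGG"))   # much lower
```

   The cheap index pass finds where to look; the DP pass decides how
   good the match really is.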
      
      
   > I have always rejected strong scoring   
   > systems as being too arbitrary and too simplistic, and I thought that   
   > making multiple evaluations of derived characteristics would produce so   
   > many additional complications that it would complicate the problem   
   > rather than solve it, but now I see that a more flexible and complex   
   > scoring system could be both feasible and powerful.   
      
   Indeed.  And if your scores are probabilistic in nature, you can even
   reason about your results and the degree to which you have a good
   approximation.  Keep this up!
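   One hedged way to make scores probabilistic, with entirely invented
   data: calibrate raw scores against a labeled held-out sample, so each
   score bucket maps to an empirical P(true match | score).  That
   probability is what lets you reason about how good the approximation
   is at any cutoff.

```python
# Illustrative calibration sketch; names and data are made up.
from collections import defaultdict

def calibrate(scored_labels, bucket=1.0):
    # Map score buckets to empirical P(match | score in bucket).
    hits, totals = defaultdict(int), defaultdict(int)
    for score, is_match in scored_labels:
        b = int(score // bucket)
        totals[b] += 1
        hits[b] += is_match
    return {b: hits[b] / totals[b] for b in totals}

# Toy held-out sample: (raw score, 1 if a true match else 0).
sample = [(0.5, 0), (1.2, 0), (1.8, 1), (2.5, 1), (2.9, 1), (0.7, 0)]
probs = calibrate(sample)
print(probs)  # bucket 0 -> 0.0, bucket 1 -> 0.5, bucket 2 -> 1.0
```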
      
      
   > ... Any means to   
   > simplify problems like the one you are considering could be helpful,   
   > but simplifying the problem too much could severely limit its   
   > effectiveness.   
      
   You can quantify this by measuring false positive and false negative
   rates for your scores.  False positives cost you compute time to get
   rid of, and false negatives mean that you miss real solutions.  You
   can also look at this in terms of error distributions on scores.
   This allows you to compute useful (sometimes) bounds on actual
   performance.
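   The measurement described above can be sketched directly: compare the
   first pass's candidate set against ground truth on a sample.  All
   names and numbers here are illustrative.

```python
# Hedged sketch of measuring first-pass quality against ground truth.
def fp_fn_rates(predicted, actual, universe):
    # False positives: candidates that are not true matches.
    fp = len(predicted - actual)
    # False negatives: true matches the first pass missed.
    fn = len(actual - predicted)
    negatives = len(universe - actual)
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / len(actual) if actual else 0.0
    return fp_rate, fn_rate

universe = set(range(10))
actual = {1, 2, 3, 4}          # true matches
predicted = {1, 2, 3, 5, 6}    # cheap filter's candidates
fp, fn = fp_fn_rates(predicted, actual, universe)
print(fp, fn)  # 2 of 6 negatives leaked through; 1 of 4 matches missed
```

   The false positive rate prices the wasted second-pass work; the false
   negative rate prices the solutions you will never see.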
      
   Good luck!   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca