home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 894 of 1,954   
   jbromer@isp.com to Scott Smith   
   Re: Algorithm/Theory help: Patterns, com   
   26 Jan 06 01:03:03   
   
   Scott Smith wrote:   
   > I've been working on this interesting AI problem. I'm looking through a   
   > collection of variable length strings, comparing against a "master" string   
   > and looking for common features (i.e. "patterns"). So, as an example...   
   >   
   > Master string: "ABCDEF"   
   >   
   > ...ABCDEF...       exact match   
   > ...ABDCEF...       transposition   
   > ...ABDEF...         deletion   
   > ...ABC...DEF...    insertion   
   > ...DBEAFC...       unordered   
   > ...A...BDCF...      combination of the above   
   >   
   > These are the simplest types of patterns; there will probably be more.   
   >   
   > As you can imagine, any given comparison can come up with multiple possible   
   > match candidates, and that's where I'm having difficulty. I could invent an   
   > arbitrary set of rules for scoring potential matches, but I'm thinking that   
   > there might be some way to calculate a 'distance' between the master string   
   > and a potential pattern (something that doesn't involve arbitrary weights   
   > that I have to define).   
   >   
   > For example, the following seems reasonably obvious:   
   >   
   > - An exact match is closest (100%)   
   > - A single insertion scores lower   
   > - A double insertion scores lower still   
   > - ...etc   
   >   
   > Unfortunately it becomes less clear what the relative scoring would be   
   > between things like deletions and transpositions, much less combinations. I   
   > realize that any domain-specific knowledge would trump these general rules   
   > (for example, in a strand of DNA, a transposition might always score lower   
   > than a deletion), but I'm thinking that there may be some generic set of   
   > rules to start with.   
   >   
   > Any comments or suggestions would be greatly appreciated   
   >   
   > -Scott   
   >   
      
   This is an interesting problem and you seem to have a good   
   understanding of it, but I think you may be trying to avoid the   
   intrinsic complexity of the problem.  Working on this problem has made   
   me realize (again) that I cannot rely on a simple dictionary lookup for   
   my own AI project.  I really have to define a more complicated system   
   that can evaluate a number of different characteristics of a phrase   
   that is to be compared against the entries of this dictionary.  (The   
   methods that can be used to evaluate syntactic transformations can also   
   be applied to conceptual relations since conceptual relations can be   
   stored as strings or fields, and so a working solution to this problem   
   is really significant.)  However, while thinking about this problem   
   today, I realized that I could use a set of functions to evaluate and   
   score the various characteristics of the phrases that are to be entered   
   into the dictionary and use the signatures from these scores in the   
   comparisons.  This means that a comparison might be made fairly quickly   
   even using relatively complicated functions to evaluate multiple   
   characteristics of a phrase.  I have always rejected strong scoring   
   systems as being too arbitrary and too simplistic, and I thought that   
   making multiple evaluations of derived characteristics would produce so   
   many additional complications that it would complicate the problem   
   rather than solve it, but now I see that a more flexible and complex   
   scoring system could be both feasible and powerful.  So although I am   
   going with the invention and definition of rules that you would like to   
   avoid, the usefulness of the method that I have been thinking of can be   
   increased greatly through the use of multiple scoring systems similar   
   to the one that you seem to be looking for.   
      
   Computer programs are complicated because they go through different   
   stages and they branch according to conditionals.  Any means to   
   simplify problems like the one you are considering could be helpful,   
   but simplifying the problem too much could severely limit its   
   effectiveness.  Although the use of multiple scoring systems would seem   
   to defeat the wish to simplify the problem because it can make the   
   problem more complicated, it still can be used to categorize data that   
   the system will encounter in useful ways, and that could constitute a   
   major step forward.   
      
   Sometimes simplification is like breathing.  The lungs have to be able   
   to expand as well as contract.   
      
   Jim Bromer   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca