Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 894 of 1,954    |
|    jbromer@isp.com to Scott Smith    |
|    Re: Algorithm/Theory help: Patterns, com    |
|    26 Jan 06 01:03:03    |
      Scott Smith wrote:       > I've been working on this interesting AI problem. I'm looking through a       > collection of variable length strings, comparing against a "master" string       > and looking for common features (i.e. "patterns"). So, as an example...       >       > Master string: "ABCDEF"       >       > ...ABCDEF... exact match       > ...ABDCEF... transposition       > ...ABDEF... deletion       > ...ABC...DEF... insertion       > ...DBEAFC... unordered       > ...A...BDCF... combination of the above       >       > These are the simplest types of patterns; there will probably be more.       >       > As you can imagine, any given comparison can come up with multiple possible       > match candidates, and that's where I'm having difficulty. I could invent an       > arbitrary set of rules for scoring potential matches, but I'm thinking that       > there might be some way to calculate a 'distance' between the master string       > and a potential pattern (something that doesn't involve arbitrary weights       > that I have to define).       >       > For example, the following seems reasonably obvious:       >       > - An exact match is closest (100%)       > - A single insertion scores lower       > - A double insertion scores lower still       > - ...etc       >       > Unfortunately it becomes less clear what the relative scoring would be       > between things like deletions and transpositions, much less combinations. I       > realize that any domain-specific knowledge would trump these general rules       > (for example, in a strand of DNA, a transposition might always score lower       > than a deletion), but I'm thinking that there may be some generic set of       > rules to start with.       >       > Any comments or suggestions would be greatly appreciated       >       > -Scott       >              This is an interesting problem and you seem to have a good       understanding of it, but I think you may be trying to avoid the       intrinsic complexity of the problem. Working on this problem has made       me realize (again) that I cannot rely on a simple dictionary lookup for       my own AI project. I really have to define a more complicated system       that can evaluate a number of different characteristics of a phrase       that is to be compared against the entries of this dictionary. (The       methods that can be used to evaluate syntactic transformations can also       be applied to conceptual relations since conceptual relations can be       stored as strings or fields, and so a working solution to this problem       is really significant.) However, while thinking about this problem       today, I realized that I could use a set of functions to evaluate and       score the various characteristics of the phrases that are to be entered       into the dictionary and use the signatures from these scores in the       comparisons. This means that a comparison might be made fairly quickly       even using relatively complicated functions to evaluate multiple       characteristics of a phrase. I have always rejected strong scoring       systems as being too arbitrary and too simplistic, and I thought that       making multiple evaluations of derived characteristics would produce so       many additional complications that it would complicate the problem       rather than solve it, but now I see that a more flexible and complex       scoring system could be both feasible and powerful. So although I am       going with the invention and definition of rules that you would like to       avoid, the usefulness of the method that I have been thinking of can be       increased greatly through the use of multiple scoring systems similar       to the one that you seem to be looking for.              Computer programs are complicated because they go through different       stages and they branch according to conditionals. Any means to       simplify problems like the one you are considering could be helpful,       but simplifying the problem too much could severely limit its       effectiveness. Although the use of multiple scoring systems would seem       to defeat the wish to simplify the problem because it can make the       problem more complicated, it still can be used to categorize data that       the system will encounter in useful ways, and that could constitute a       major step forward.              Sometimes simplification is like breathing. The lungs have to be able       to expand as well as contract.              Jim Bromer              [ comp.ai is moderated. To submit, just post and be patient, or if ]       [ that fails mail your article to |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca