|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 902 of 1,954    |
|    Marina Sapir to Scott Smith    |
|    Re: Algorithm/Theory help: Patterns, com    |
|    01 Feb 06 01:19:18    |
From: marina@sapir.us

Scott Smith wrote:
> I've been working on this interesting AI problem. I'm looking through a
> collection of variable length strings, comparing against a "master" string
> and looking for common features (i.e. "patterns"). So, as an example...
>
>
> Unfortunately it becomes less clear what the relative scoring would be
> between things like deletions and transpositions, much less combinations. I
> realize that any domain-specific knowledge would trump these general rules
> (for example, in a strand of DNA, a transposition might always score lower
> than a deletion), but I'm thinking that there may be some generic set of
> rules to start with.
>
> Any comments or suggestions would be greatly appreciated
>
> -Scott
>

I like your idea to avoid arbitrary scoring! You can do multidimensional
ranking, which is designed exactly to avoid setting any arbitrary weights
and the like.
I used it in my paper

Sapir M., Verbel D., Kotsianti A., and Saidi O. 2005. Live Logic:
Method for approximate knowledge discovery and decision making. In:
Dominik Slezak et al. (eds), Rough sets, fuzzy sets, data mining, and
granular computing. LNAI 3641, pp. 532-540.

with reference to K. Wittkowski's work,

Wittkowski, K.M., Lee, E., Nussbaum, R., Chamian, F.N., Krueger,
J.G. 2004. Combining several ordinal measures in clinical studies.
Stat. Med., 23, pp. 1579-1592.

Essentially, it can work like this. You list all the patterns of change,
such as permutations, insertions and so on. Suppose you have n different
patterns. Then you code each string by a new string of length n, where
in the i-th position you put the number of times the i-th pattern of
change occurs in this string. Then you compare pairs of code-strings.
Suppose you have code-strings a and b. String a is "larger" than string b
if every position in a is equal to or larger than the corresponding
position in code-string b, with a strict inequality in some position. Now,
for each code-string you calculate what Wittkowski calls the mU-stat: the
difference between the number of code-strings larger and the number of
code-strings lower than the given one. This mU-stat is a single
characteristic of the code-string, which lets you rank the strings and
compare them by distance from the original string.

Marina Sapir
http://sapir.us
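To make the procedure above concrete, here is a minimal sketch in Python. It is only an illustration of the description in this post, not code from either cited paper. The function names (dominates, mu_stats), the choice of three change patterns (insertions, deletions, transpositions) and the sample counts are all assumptions made up for the example. The sign follows the wording above: number of larger code-strings minus number of lower ones.

```python
from typing import List, Sequence


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if code-string a is "larger" than b: every position of a is
    >= the matching position of b, and at least one is strictly greater."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def mu_stats(codes: Sequence[Sequence[int]]) -> List[int]:
    """For each code-string, count the code-strings larger than it minus
    the code-strings lower than it (sign convention taken from the post)."""
    stats = []
    for a in codes:
        larger = sum(dominates(b, a) for b in codes)
        lower = sum(dominates(a, b) for b in codes)
        stats.append(larger - lower)
    return stats


# Hypothetical code-strings: (insertions, deletions, transpositions)
# counted for each candidate string relative to the master string.
codes = [
    (0, 0, 0),  # identical to the master
    (1, 0, 0),
    (1, 1, 0),
    (0, 2, 0),
    (2, 1, 1),
]

print(mu_stats(codes))  # [4, 1, -1, -1, -3]
```

With this sign convention, a higher value means more of the other code-strings have at least as many changes in every position, so the string sits closer to the master. Pairs such as (1, 1, 0) and (0, 2, 0) are incomparable and do not count for or against each other, which is how the method avoids putting a relative weight on deletions versus insertions.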