

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 902 of 1,954   
   Marina Sapir to Scott Smith   
   Re: Algorithm/Theory help: Patterns, com   
   01 Feb 06 01:19:18   
   
   From: marina@sapir.us   
      
   Scott Smith wrote:   
   > I've been working on this interesting AI problem. I'm looking through a   
   > collection of variable length strings, comparing against a "master" string   
   > and looking for common features (i.e. "patterns"). So, as an example...   
   >   
      
   >   
   > Unfortunately it becomes less clear what the relative scoring would be   
   > between things like deletions and transpositions, much less combinations. I   
   > realize that any domain-specific knowledge would trump these general rules   
   > (for example, in a strand of DNA, a transposition might always score lower   
   > than a deletion), but I'm thinking that there may be some generic set of   
   > rules to start with.   
   >   
   > Any comments or suggestions would be greatly appreciated   
   >   
   > -Scott   
   >   
      
   I like your idea of avoiding arbitrary scoring! You can use multidimensional
   ranking, which is designed exactly to avoid setting arbitrary weights
   and the like.
   I used it in my paper
      
   Sapir M., Verbel D., Kotsianti A., and Saidi O. 2005. Live Logic:   
   Method for approximate knowledge discovery and decision making. In:   
   Dominik Slezak et al (eds), Rough sets, fuzzy sets, data mining, and   
   granular computing. LNAI 3641, pp. 532-540.
      
   with reference to K. Wittkowski's work,
      
   Wittkowski, K.M., Lee, E., Nussbaum, R., Chamian, F.N., Krueger,
   J.G. 2004.  Combining several ordinal measures in clinical studies.   
   Stat. Med., 23, pp. 1579 -- 1592.   
      
   Essentially, it can work like this. You list all the patterns of change:
   permutations, insertions and so on. Suppose you have n different
   patterns. Then you code each string as a new string of length n, where
   the i-th position holds the number of times the i-th pattern of change
   occurs in this string. Then you compare pairs of code-strings. Suppose
   you have code-strings a and b. String a is "larger" than string b if
   every position in a is equal to or larger than the corresponding
   position in code-string b, with a strict inequality in at least one
   position. Now, for each code-string you calculate what Wittkowski calls
   the mU-stat: the difference between the number of code-strings larger
   and the number of code-strings lower than the given one. This mU-stat
   is a single characteristic of each code-string, which allows you to
   rank them and compare them by distance from the original string.
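
   A minimal sketch of the procedure described above, in Python. The
   function names, the choice of three change patterns, and the sample
   counts are illustrative assumptions, not taken from the cited papers.
   The sign convention follows the wording of this post (larger minus
   lower), so a code-string with fewer changes gets a higher score.

```python
from typing import List, Sequence


def is_larger(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a >= b in every position, with a strict > somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def mu_scores(codes: List[Sequence[int]]) -> List[int]:
    """For each code-string: (# code-strings larger) - (# code-strings lower)."""
    scores = []
    for a in codes:
        larger = sum(is_larger(b, a) for b in codes)  # strings that dominate a
        lower = sum(is_larger(a, b) for b in codes)   # strings a dominates
        scores.append(larger - lower)
    return scores


# Hypothetical counts of (insertions, deletions, transpositions)
# for five strings compared against the master string.
codes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 1)]
scores = mu_scores(codes)
print(scores)  # [4, 1, 1, -2, -4]
```

   Note that (1, 0, 0) and (0, 1, 0) are not comparable, so neither
   counts as larger or lower than the other, and they end up with the
   same score. No weight for insertion versus deletion was needed.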
      
   Marina Sapir   
   http://sapir.us   
      
