
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 1,038 of 1,954   
   Dmitry A. Kazakov to Market Theory   
   Re: Algorithms/code for discovering patt   
   12 May 06 09:40:03   
   
   XPost: comp.programming   
   From: mailbox@dmitry-kazakov.de   
      
   On Fri, 12 May 2006 01:09:12 GMT, Market Theory wrote:   
      
   > pete wrote:   
   >> Market Theory wrote:   
   >>>   
   >>> A variant of the General Poster's problem.   
   >>>   
   >>> Discover patterns of consecutive symbols in a stream. The patterns are   
   >>> not known in advance, so this is a statistical anomaly detection   
   >>> problem. The stream consists of mostly random symbols over a known   
   >>> finite alphabet. The expected frequencies/probabilities of the random   
   >>> symbols may be known in advance, or inferred from the stream. The   
   >>> minimum and maximum length of patterns of interest are specified as   
   >>> parameters.   
   >>>   
   >>> eg   
   >>>   
   >>> wnckbvmnunixwjgemblsmznhauiwriewpklgnkwnbwnbwknqjowjunixjowrgnblsmknashwiw   
   >>>   
   >>> has the pattern "unix"   
   >>   
   >> ... and also "blsm"   
   >   
   > Well spotted! but what was your algorithm?   
      
   I bet it was a dictionary created from the sample. Then the observed   
   frequency of each token in the dictionary was compared with its expected   
   frequency, estimated from the frequencies of the token's individual symbols.   
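
A minimal sketch of that dictionary idea, under stated assumptions: symbols are treated as independent, their probabilities are inferred from the stream itself, and tokens seen only once are ignored. The function name score_ngrams and the observed/expected ratio used for ranking are illustrative choices, not something given in the thread.

```python
from collections import Counter
from math import prod


def score_ngrams(stream, min_len, max_len):
    """Rank substrings by observed count versus the count expected
    if every symbol were drawn independently (an assumption)."""
    n = len(stream)
    # Symbol probabilities inferred from the stream itself.
    sym_prob = {s: c / n for s, c in Counter(stream).items()}
    scored = []
    for k in range(min_len, max_len + 1):
        positions = n - k + 1
        # The "dictionary": every substring of length k with its count.
        counts = Counter(stream[i:i + k] for i in range(positions))
        for token, observed in counts.items():
            if observed < 2:  # hypothetical cutoff: single sightings are noise
                continue
            expected = positions * prod(sym_prob[s] for s in token)
            scored.append((observed / expected, token, observed))
    scored.sort(reverse=True)  # most surprising tokens first
    return scored


sample = ("wnckbvmnunixwjgemblsmznhauiwriewpklgnkwnbwnbwknqjowjunix"
          "jowrgnblsmknashwiw")
for ratio, token, observed in score_ngrams(sample, 4, 4):
    print(token, observed, round(ratio, 1))
```

On a sample this short the ratio is only a rough ranking aid; a proper significance test would need a longer stream.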
      
   A more realistic problem statement would also take into account   
   correlations between consecutive symbols. Then "blsm" would get a much   
   lower expected frequency, because of its four consonants in a row.   
   (Assuming a natural language source, rather than a cat walking on the   
   keyboard.) That would rank "blsm" far ahead of "unix"! (:-))   
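
One way to sketch the correlation idea is a first-order (bigram) model of the background source. Everything specific here is an assumption made for illustration: the tiny reference text is made up, the add-one smoothing is one choice among many, and the names train_bigrams and token_probability are hypothetical.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def train_bigrams(reference):
    """Count symbols and within-word symbol pairs in a reference text."""
    singles, pairs = Counter(), Counter()
    for word in reference.lower().split():
        letters = [c for c in word if c in ALPHABET]
        singles.update(letters)
        pairs.update(zip(letters, letters[1:]))
    return singles, pairs


def token_probability(token, singles, pairs):
    """P(token) = P(s1) * product of P(s_i | s_(i-1)), add-one smoothed."""
    v = len(ALPHABET)
    p = (singles[token[0]] + 1) / (sum(singles.values()) + v)
    for a, b in zip(token, token[1:]):
        # Number of times symbol a was followed by any symbol.
        followers = sum(c for (x, _), c in pairs.items() if x == a)
        p *= (pairs[(a, b)] + 1) / (followers + v)
    return p


# Made-up stand-in for a natural-language background source.
reference = "unix is a unique system with nice tools and a mix of units"
singles, pairs = train_bigrams(reference)
p_unix = token_probability("unix", singles, pairs)
p_blsm = token_probability("blsm", singles, pairs)
print(p_unix > p_blsm)
```

With equal observed counts, the token with the smaller expected probability is the more surprising one, which is why a model like this would put "blsm" ahead of "unix".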
      
   --   
   Regards,   
   Dmitry A. Kazakov   
   http://www.dmitry-kazakov.de   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca