comp.ai

Message 908 of 1,954

Ted Dunning to All

Re: Algorithm/Theory help: Patterns, com

07 Feb 06 00:20:18

   From: ted.dunning@gmail.com   
      
   I haven't spent enough time to really get through what you are talking   
   about with your partial ordering of strings, but it looks like you are   
   roughly doing the following:   
      
   a) defining an edit distance tuple in a standard way, but not reducing   
   it to a single scalar distance measure as would typically be done   
      
   b) defining a partial order on edit distance tuples based on total   
   domination of all elements of the tuple.   
      
   c) looking at the distribution of the number of points in this partial   
   ordering.   
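
   To make (a) and (b) concrete, here is a minimal sketch of how I read
   them.  It assumes the tuple is (substitutions, insertions, deletions)
   counted along one minimum-total-edit alignment; the names EditTuple,
   edit_tuple and dominated_by are made up for illustration and are not
   taken from your post.

```python
from typing import NamedTuple


class EditTuple(NamedTuple):
    """Edit counts by type (assumed components, chosen for illustration)."""
    substitutions: int
    insertions: int
    deletions: int


def edit_tuple(a: str, b: str) -> EditTuple:
    """Count edits by type along one minimum-total-edit alignment of a -> b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[EditTuple(0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = EditTuple(0, 0, i)      # delete everything in a[:i]
    for j in range(1, cols):
        table[0][j] = EditTuple(0, j, 0)      # insert everything in b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            diag, up, left = table[i - 1][j - 1], table[i - 1][j], table[i][j - 1]
            if a[i - 1] != b[j - 1]:
                diag = diag._replace(substitutions=diag.substitutions + 1)
            candidates = [
                diag,
                up._replace(deletions=up.deletions + 1),
                left._replace(insertions=left.insertions + 1),
            ]
            # Minimise the total number of edits; on ties the first candidate
            # wins, which is itself an arbitrary choice.
            table[i][j] = min(candidates, key=sum)
    return table[-1][-1]


def dominated_by(u: EditTuple, v: EditTuple) -> bool:
    """True if u is no larger than v in every component."""
    return all(x <= y for x, y in zip(u, v))


def comparable(u: EditTuple, v: EditTuple) -> bool:
    """Tuples are ordered only if one dominates the other in all components."""
    return dominated_by(u, v) or dominated_by(v, u)
```

   Note that one substitution, EditTuple(1, 0, 0), and one insertion,
   EditTuple(0, 1, 0), are incomparable under this order, which is what
   makes it partial rather than total.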
      
   You claim that this avoids arbitrary assumptions.  I would differ in   
   that assessment and claim that several points in this process   
   incorporate assumptions structurally rather than explicitly.   
      
   In particular, you are presuming that all edits are equally important.   
   A good example of where this can break down is normal text.  Substitution   
   of whitespace is often completely unimportant.  This means that editing   
   any non-empty string of spaces and tabs to any other string has nearly   
   zero impact on the meaning of the string.  Likewise, substitution in a   
   Unicode text of any of the glyphs that look like "A" shouldn't be   
   considered very important.  Substituting other characters can be vastly   
   more important.   
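
   A rough sketch of the point, assuming one simply folds the
   "unimportant" edits away before measuring distance.  The LOOKALIKES
   table is a tiny hand-picked stand-in (Greek Alpha and Cyrillic A
   folded to Latin "A"), not a real confusables list, and the function
   names are made up for illustration.

```python
import re

# Hypothetical lookalike table: Greek Alpha (U+0391) and Cyrillic A (U+0410)
# are treated as the Latin capital "A".
LOOKALIKES = {"\u0391": "A", "\u0410": "A"}


def normalize(text: str) -> str:
    """Collapse runs of spaces/tabs to one space and fold lookalike glyphs."""
    text = re.sub(r"[ \t]+", " ", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


def levenshtein(a: str, b: str) -> int:
    """Plain unit-cost edit distance, where every edit counts the same."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def meaning_aware_distance(a: str, b: str) -> int:
    """Edit distance after discarding edits assumed to carry no meaning."""
    return levenshtein(normalize(a), normalize(b))
```

   With this, levenshtein("a  b", "a\tb") counts two edits while
   meaning_aware_distance("a  b", "a\tb") counts none, yet changing
   "cat" to "cot" still costs one edit under both.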
      
   Another presumption is that the distribution that you are looking at is   
   sufficiently constant as to have interesting aggregate properties.   
   This assumption is also often violated, although coming up with an   
   example is harder in the few seconds I am willing to spend on the   
   matter.   
      
   It IS true that looking at distributions of distances (however you care   
   to define them) is a very interesting task.  In fact, that is exactly   
   why spectral clustering is interesting.  This may be the   
   (pragmatically) universal truth that you are after, but it isn't what   
   you describe.   
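
   For reference, a minimal sketch of spectral clustering driven purely
   by pairwise distances.  Everything specific here is an assumption
   made for illustration: the Gaussian kernel, the sigma value, the
   normalized Laplacian, the two-way split, and the use of NumPy.

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Plain unit-cost edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def spectral_bipartition(items, sigma=2.0):
    """Split items into two groups using the Fiedler vector of the
    normalized graph Laplacian built from pairwise edit distances."""
    n = len(items)
    dist = np.array([[levenshtein(x, y) for y in items] for x in items], dtype=float)
    # Turn distances into similarities with a Gaussian kernel (assumed choice).
    affinity = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(n) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


words = ["apple", "apples", "applet", "zebra", "zebras", "zebrax"]
labels = spectral_bipartition(words)
```

   Which group gets label 0 and which gets label 1 depends on the sign
   of the eigenvector, so only the grouping itself is meaningful.  The
   result also depends on the distance function you plug in, so the
   choice of distance is still an assumption.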
      
   It is specifically NOT true that you can define a universally useful   
   metric in the specific way that you describe.  Nor is it true that your   
   method is assumption-free.   
      
      