Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 908 of 1,954    |
|    Ted Dunning to All    |
|    Re: Algorithm/Theory help: Patterns, com    |
|    07 Feb 06 00:20:18    |
From: ted.dunning@gmail.com

I haven't spent enough time to really get through what you are talking about with your partial ordering of strings, but it looks like you are roughly doing the following:

a) defining an edit distance tuple in a standard way, but not reducing it to a single scalar distance measure as would typically be done

b) defining a partial order on edit distance tuples based on total domination of all elements of the tuple.

c) looking at the distribution of the number of points in this partial ordering.

You claim that this avoids arbitrary assumptions. I would differ in that assessment and claim that several points in this process incorporate assumptions structurally rather than explicitly.

In particular, you are presuming that all edits are equally important. A good example of where this can break down is normal text. Substitution of whitespace is often completely unimportant. This means that editing any non-empty string of spaces and tabs to any other such string has nearly zero impact on the meaning of the string. Likewise, substitution in a Unicode text of any of the glyphs that look like "A" shouldn't be considered very important. Substituting other characters can be vastly more important.

Another presumption is that the distribution that you are looking at is sufficiently constant as to have interesting aggregate properties. This assumption is also often violated, although coming up with an example is harder in the few seconds I am willing to spend on the matter.

It IS true that looking at distributions of distances (however you care to define them) is a very interesting task. In fact, that is exactly why spectral clustering is interesting. This may be the (pragmatically) universal truth that you are after, but it isn't what you.

It is specifically NOT true that you can define a universally useful metric in the specific way that you describe. Nor is it true that your method is assumption-free.

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to
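For readers who want something concrete, steps (a) and (b) in the message above, and the whitespace objection, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original poster's method: the tuple layout (insertions, deletions, substitutions), the choice of one minimum-length alignment, and the helper names edit_tuple, dominates and collapse_whitespace are all hypothetical.

```python
import re
from typing import Tuple

# Assumed tuple layout: (insertions, deletions, substitutions).
EditTuple = Tuple[int, int, int]


def edit_tuple(a: str, b: str) -> EditTuple:
    """Count the edits on one minimum-length alignment turning a into b."""
    # table[i][j] holds the edit tuple for a[:i] -> b[:j].
    table = [[(0, 0, 0)] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        table[i][0] = (0, i, 0)  # delete every character of a[:i]
    for j in range(1, len(b) + 1):
        table[0][j] = (j, 0, 0)  # insert every character of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            ins = table[i][j - 1]
            dele = table[i - 1][j]
            sub = table[i - 1][j - 1]
            candidates = [
                (ins[0] + 1, ins[1], ins[2]),
                (dele[0], dele[1] + 1, dele[2]),
                (sub[0], sub[1], sub[2] + (a[i - 1] != b[j - 1])),
            ]
            # Fewest total edits first; ties broken by tuple order
            # (an arbitrary but deterministic choice).
            table[i][j] = min(candidates, key=lambda t: (sum(t), t))
    return table[len(a)][len(b)]


def dominates(x: EditTuple, y: EditTuple) -> bool:
    """x strictly precedes y if it is no larger anywhere and not equal."""
    return x != y and all(p <= q for p, q in zip(x, y))


def collapse_whitespace(s: str) -> str:
    """Treat any run of spaces and tabs as a single space."""
    return re.sub(r"[ \t]+", " ", s)


if __name__ == "__main__":
    print(edit_tuple("kitten", "sitting"))
    # Two tuples that neither dominates: the order is only partial.
    print(dominates((2, 0, 0), (0, 0, 1)), dominates((0, 0, 1), (2, 0, 0)))
    # Whitespace-only edits count in full unless they are normalized away.
    print(edit_tuple("a  b", "a\tb"))
    print(edit_tuple(collapse_whitespace("a  b"), collapse_whitespace("a\tb")))
```

The last two calls show where the "all edits are equally important" assumption enters: the raw strings differ by two edits, while the whitespace-collapsed strings are identical. Which of the two answers is right depends on the application, and that choice is exactly the kind of assumption the message says is built in structurally.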
(c) 1994, bbs@darkrealms.ca