Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 1,129 of 1,954    |
|    Ted Dunning to mathlover    |
|    Re: Possible to Find the Clusters One by    |
|    22 Jul 06 05:19:04    |
From: ted.dunning@gmail.com

mathlover wrote:
> All the usual clustering methods I am aware of (such as k-means, Fuzzy
> k-means, etc.) find all the clusters together. I mean, as you know,
> they consider some cost function minimizing which (by an iterative
> method for instance) finds all the clusters.

As a trivial answer, you can just run any of the clustering methods
that you are familiar with, with a progressively larger number of
clusters, using the old clusters as approximate seeds for the next
set. This will have some strange properties but will give
approximately the behavior you seek. Since clustering is generally
O(m n), this is reasonably efficient.

More interestingly, there are hierarchical clustering algorithms that
divide the data set into progressively finer divisions. Look for
dendrograms, for instance.

This is not really what you are asking for, however, since neither of
these really finds a single cluster in the sense you are asking about.

If you know something about the distribution of the data that you are
looking at, you might be able to define something better. For
instance, you can define a cost function over a membership rule for a
subset of your data together with a description of the distribution of
instances in that subset. It would be best if the membership function
is based on a distance formulation. You would need to add cost for
poor description of the distribution of the data in the subset and for
the number of data instances outside the subset (or, inversely, use
the average log of the probability of the members of the subset plus
the size of the subset).
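The first suggestion above (rerun clustering with a progressively larger k, seeding each run with the previous clusters) might look like the following sketch; the plain Lloyd-iteration helper, the global mean as the k=1 start, and the choice of one random data point as each new seed are illustrative assumptions, not from the post:

```python
import numpy as np

def kmeans(X, centers, iters=100):
    """Plain Lloyd iterations from the given starting centers."""
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        # Assign every point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points; an empty
        # cluster keeps its old center.
        new = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

def progressive_kmeans(X, max_k, rng):
    """Grow the model one cluster at a time, warm-starting each run
    from the previous centers plus one randomly drawn data point."""
    centers = X.mean(axis=0, keepdims=True)   # k = 1: the global mean
    centers, labels = kmeans(X, centers)
    for k in range(2, max_k + 1):
        extra = X[rng.choice(len(X), size=1)]
        centers, labels = kmeans(X, np.vstack([centers, extra]))
    return centers, labels
```

Because each run starts from the previous, already good centers, only the newest cluster has to be placed from scratch, which is what keeps the repeated runs cheap.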
There is probably an interesting mixture model that could be used for
this, but the key thing is that you can't penalize having a small
subset too heavily, nor should you penalize a model for an inability
to describe the distribution of the excluded instances. You can then
progressively add subsets.

This is much closer to what you are looking for, but I have no idea
whether it will actually work well on your data.

[ comp.ai is moderated ... your article may take a while to appear. ]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
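The one-cluster-at-a-time cost sketched in the message could look like this; the spherical Gaussian model, the ball-shaped distance membership rule, the candidate radii, and the `alpha` size weight are all assumptions of this sketch, not part of the post:

```python
import numpy as np

def gaussian_loglik(pts, mu, var):
    # Log density of a spherical Gaussian with one shared variance.
    d = pts.shape[1]
    sq = ((pts - mu) ** 2).sum(axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi * var) + sq / var)

def score_subset(members, alpha=1.0):
    """Average log-probability of the members under a Gaussian fit
    to the subset, plus a gentle (logarithmic) reward for size, so
    small subsets are not penalized too heavily."""
    mu = members.mean(axis=0)
    var = ((members - mu) ** 2).mean() + 1e-6
    return gaussian_loglik(members, mu, var).mean() + alpha * np.log(len(members))

def extract_one_cluster(X, radii, alpha=1.0):
    """Distance-based membership rule: a ball around a data point.
    Try every point as a center at each candidate radius and return
    the member mask of the best-scoring subset. Points outside the
    subset contribute no cost, matching the advice not to penalize
    the model for the excluded instances."""
    best_score, best_mask = -np.inf, None
    for c in X:
        for r in radii:
            mask = ((X - c) ** 2).sum(axis=1) <= r * r
            if mask.sum() < 2:
                continue
            s = score_subset(X[mask], alpha)
            if s > best_score:
                best_score, best_mask = s, mask
    return best_mask
```

To add subsets progressively, as suggested, one would call `extract_one_cluster`, drop the selected rows, and repeat on the remainder.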
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca