Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 1,133 of 1,954    |
|    MrElbi to mathlover    |
|    Re: Possible to Find the Clusters One by One    |
|    22 Jul 06 12:51:29    |
From: bruno.lemarchand@gmail.com

mathlover wrote:
> Ted Dunning wrote:
> > mathlover wrote:
> > > All the usual clustering methods I am aware of (such as k-means, Fuzzy
> > > k-means, etc.) find all the clusters together. I mean, as you know,
> > > they consider some cost function, minimizing which (by an iterative
> > > method, for instance) finds all the clusters.
> >
> > As a trivial answer, you can just run any of the clustering methods
> > that you are familiar with with a progressively larger number of
> > clusters, using the old clusters as approximate seeds for the next set.
> > This will have some strange properties but will give approximately the
> > behavior you seek. Since clustering is generally O(m n), this is
> > reasonably efficient.
> >
> > More interestingly, there are hierarchical clustering algorithms that
> > divide the data set into progressively finer divisions. Look for
> > dendrograms, for instance.
> >
> > This is not really what you are asking for, however, since neither of
> > these really finds a single cluster in the sense you are asking.
> >
> > If you know something about the distribution of the data that you are
> > looking at, you might be able to define something better. For
> > instance, you can define a cost function such that it must describe a
> > membership rule for a subset of your data and the distribution of
> > instances in that subset. It would be best if the membership function
> > is based on a distance formulation. You would need to add cost for
> > poor description of the distribution of the data in the subset and for
> > the number of data instances outside the subset (or, inversely, use the
> > average log of the probability of the members of the subset plus the
> > size of the subset). There is probably an interesting mixture model
> > that could be used for this, but the key thing is that you can't
> > penalize having a small subset too heavily, nor should you penalize a
> > model for an inability to describe the distribution of the excluded
> > instances. You can then progressively add subsets.
> >
> > This is much closer to what you are looking for, but I have no idea if
> > it will actually work well on your data.
> >
>
> Dear Ted Dunning,
>
> I think I might have badly phrased my question. Indeed, in the problem
> I am working on, simple k-means clustering attains satisfying quality.
> Of course, I know the total number of clusters (so I can use
> k-means clustering).
>
> However, because of the very large size of the problem, it takes a lot
> of time to find all the clusters (I mean using k-means). But the point
> is that I don't need all the clusters that k-means finds in the later
> stages of my problem. Actually, I even know exactly how many of them I
> need.
>
> Thus, I am looking for a method whose output is close to the clusters
> k-means gives, but which can find them one at a time (unlike k-means,
> which finds all of them together), so that I can quit after finding
> the exact number of clusters I need.
>
> Best regards,
> mathlover.

Hi,

You should search for information about the BFR algorithm. It is a
partitional algorithm used to cluster very large datasets: the data
are processed in pages, which seems to be close to what you are
looking for.

Regards,

[ comp.ai is moderated ... your article may take a while to appear. ]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
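[Archive note: Ted Dunning's "progressively larger number of clusters" suggestion can be sketched roughly as follows. This is a minimal illustration, not code from the thread; it assumes scikit-learn's KMeans (which did not exist in this form in 2006) and uses toy data. The idea is to warm-start each run with the previous centroids plus one fresh seed, so earlier clusters are available before later ones are computed.]

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: three well-separated Gaussian blobs.
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
                  for c in ([0, 0], [5, 5], [0, 5])])

centers = None
for k in range(1, 4):
    if centers is None:
        init = "k-means++"          # first run: ordinary initialization
    else:
        # Warm start: old centroids plus one new randomly chosen point.
        extra = data[rng.integers(len(data))]
        init = np.vstack([centers, extra])
    km = KMeans(n_clusters=k, init=init, n_init=1, random_state=0).fit(data)
    centers = km.cluster_centers_   # seeds for the next, larger k
    print(f"k={k}: inertia={km.inertia_:.1f}")
```

Because each run reuses the previous solution, the early clusters tend to stay put as k grows, which is the "one at a time" flavor mathlover asked about, at the cost of the strange properties Dunning warns of.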
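[Archive note: the BFR algorithm itself summarizes each cluster with sufficient statistics (count, sum, sum of squares) while streaming pages from disk; a full implementation is beyond a sketch. As a rough stand-in with the same "process the data in pages" flavor, scikit-learn's MiniBatchKMeans (an assumption here, not part of the original post) consumes one chunk at a time via partial_fit, so the full dataset never has to be in memory.]

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(1)
# Toy "huge" dataset: two blobs, shuffled so pages are mixed.
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(500, 2))
                  for c in ([0, 0], [6, 6])])
rng.shuffle(data)

mbk = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=1)
for page in np.array_split(data, 10):  # simulate reading pages from disk
    mbk.partial_fit(page)              # update centroids from this page only

print(mbk.cluster_centers_)
```

Like BFR, only the current page and a constant-size cluster summary live in memory; unlike BFR, no explicit variance-based "discard set" test is kept, so this is an approximation of the page-by-page idea rather than the algorithm itself.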
(c) 1994, bbs@darkrealms.ca