home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 1,133 of 1,954   
   MrElbi to mathlover   
   Re: Possible to Find the Clusters One by   
   22 Jul 06 12:51:29   
   
   From: bruno.lemarchand@gmail.com   
      
   mathlover wrote:   
   > Ted Dunning wrote:   
   > > mathlover wrote:   
   > > > All the usual clustering methods I am aware of (such as k-means, Fuzzy   
   > > > k-means, etc.) find all the clusters together. I mean, as you know,   
   > > > they consider some cost function minimizing which (by an iterative   
   > > > method for instance) finds all the clusters.   
   > >   
   > > As a trivial answer, you can just run any of the clustering methods   
   > > that you are familiar with with a progressively larger number of   
   > > clusters using the old clusters as approximate seeds for the next set.   
   > > This will have some strange properties but will give approximately the   
   > > behavior you seek.  Since clustering is generally O(m n), this is   
   > > reasonably efficient.   
   > >   
   > > More interestingly, there are heirarchical clustering algorithms that   
   > > divide the data set into progressively finer divisions.  Look for   
   > > dendograms, for instance.   
   > >   
   > > This not really what you are asking for, however, since the neither of   
   > > these really find a single cluster in the sense you are asking.   
   > >   
   > > If you know something about the distribution of the data that you are   
   > > looking at, you might be able to define something better.  For   
   > > instance, you can define a cost function such that it must describe a   
   > > membership rule for a subset of your data and the distribution of   
   > > instances in that subset.  It would be best fi the membership function   
   > > is based on a distance formulation.  You would need to add cost for   
   > > poor description of the distribution of the data in the subset and for   
   > > the number of data instances outside the subset (or, inversely, use the   
   > > average log of the probability of the members of the subset  plus the   
   > > size of the subset).  There is probably an interesting mixture model   
   > > that could be used for this, but the key thing is that can't penalize   
   > > having a small subset too heavily, nor should you penalize a model for   
   > > an inability to describe the distribution of the  excluded instances.   
   > > You can then progressively add subsets.   
   > >   
   > > This is much closer to what you are looking for, but I have no idea if   
   > > it will actually work well on your data.   
   > >   
   >   
   > Dear Ted Dunning,   
   >   
   > I think I might have badly phrased my question. Indeed, in the problem   
   > I am working on simple k-means clustering attains satisfying quality.   
   > Of course, I know the total number of clusters (so I can use the   
   > k-means clustering).   
   >   
   > However, because of the very large size of the problem it takes a lot   
   > of time to find all the clusters (I mean using k-means). But, the point   
   > is that I don't need all the clusters the k-means finds in the later   
   > stages of my problem. Actually, I even know exactly how many of them I   
   > need.   
   >   
   > Thus, I am looking for a method that gives in its output clusters near   
   > those k-means gives, but it can find them one at a time (not like   
   > k-means that finds all of them together) so that I can quit after   
   > finding the exact number of clusters I need.   
   >   
   > Best regards,   
   > mathlover.   
   >   
      
   Hi,   
      
   You should search informations about the BFR algorithm. It is a   
   partitional algorithm which is used to do clustering on very huge   
   datasets: the datas are treated by pages, which seems to be close to   
   what you are looking for.   
      
   Regards,   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca