home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 1,129 of 1,954   
   Ted Dunning to mathlover   
   Re: Possible to Find the Clusters One by   
   22 Jul 06 05:19:04   
   
   From: ted.dunning@gmail.com   
      
   mathlover wrote:   
   > All the usual clustering methods I am aware of (such as k-means, Fuzzy   
   > k-means, etc.) find all the clusters together. I mean, as you know,   
   > they consider some cost function minimizing which (by an iterative   
   > method for instance) finds all the clusters.   
      
   As a trivial answer, you can just run any of the clustering methods   
   that you are familiar with with a progressively larger number of   
   clusters using the old clusters as approximate seeds for the next set.   
   This will have some strange properties but will give approximately the   
   behavior you seek.  Since clustering is generally O(m n), this is   
   reasonably efficient.   
      
   More interestingly, there are heirarchical clustering algorithms that   
   divide the data set into progressively finer divisions.  Look for   
   dendograms, for instance.   
      
   This not really what you are asking for, however, since the neither of   
   these really find a single cluster in the sense you are asking.   
      
   If you know something about the distribution of the data that you are   
   looking at, you might be able to define something better.  For   
   instance, you can define a cost function such that it must describe a   
   membership rule for a subset of your data and the distribution of   
   instances in that subset.  It would be best fi the membership function   
   is based on a distance formulation.  You would need to add cost for   
   poor description of the distribution of the data in the subset and for   
   the number of data instances outside the subset (or, inversely, use the   
   average log of the probability of the members of the subset  plus the   
   size of the subset).  There is probably an interesting mixture model   
   that could be used for this, but the key thing is that can't penalize   
   having a small subset too heavily, nor should you penalize a model for   
   an inability to describe the distribution of the  excluded instances.   
   You can then progressively add subsets.   
      
   This is much closer to what you are looking for, but I have no idea if   
   it will actually work well on your data.   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca