comp.ai

Milind Joshi to Sengly

Re: Data clustering need suggestions

03 Jul 08 12:44:57

   From: milind.a.joshi@gmail.com   
      
   On Jun 12, 7:41 pm, Sengly  wrote:   
   > Dear all,   
   >   
   > I would like you to share your experience on how I should handle
   > my data. I have 1000 objects and a list of pairwise similarities
   > between them. How can I cluster them into different groups
   > according to their similarity?
   >
   > I have browsed through various methods such as hierarchical
   > clustering, k-means, multidimensional scaling, etc. I really like
   > the k-means method, but the problem is that I don't have points
   > (and their coordinates) in space, only their similarities.
   >   
   > Any suggestion is appreciated.   
   >   
   > Kindest regards,   
   >   
   > Sengly   
   >   
      
   Since the number of your objects is relatively small and you seem to
   want to explore, here is a simple technique you might try first. The
   other methods should beat it by reducing algorithmic complexity,
   creating 'better' clusters, automatically finding the threshold, or
   all of the above.
      
   Start with each object as a cluster of one. Then gather other objects
   around it using the similarity values you have: set a threshold for
   the maximum distance from the cluster centre (your initial object)
   that still counts as membership, and allow objects to participate in
   multiple clusters. Finally, choose the y clusters that have the most
   (or the fewest) objects in them. You can experiment with different
   threshold values to see how your clusters grow or shrink in number or
   size.
      
   Suppose you had 3 objects A, B, C, and the inter-object similarity was
   a scalar, treated here as a distance so that a smaller value means
   more similar (the same approach could be used even if it was a vector,
   except that you have to calculate the vector distance beforehand):
      
   AB=1   
   AC=2   
   BC=1   
      
   Threshold set to 1.   
      
   Clusters:   
      
   Around A: B   
   Around B: A, C   
   Around C: B   
      
   If you wanted the cluster with the most objects, you'd choose the   
   cluster formed around B.   
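
For anyone who wants to try this, here is a minimal Python sketch of the idea, not a tuned implementation. It assumes the pair values are distances (smaller means more similar) stored in a dict keyed by object pairs. The function names `threshold_clusters` and `largest` are made up for illustration.

```python
def threshold_clusters(distances, threshold):
    """Build one cluster around every object.

    distances: dict mapping (a, b) pairs to a scalar distance
               (assumption: smaller value = more similar).
    Returns a dict: centre object -> set of objects within `threshold`.
    Objects may appear in several clusters.
    """
    clusters = {}
    for (a, b), d in distances.items():
        # Every object is the centre of its own (possibly empty) cluster.
        clusters.setdefault(a, set())
        clusters.setdefault(b, set())
        if d <= threshold:
            clusters[a].add(b)
            clusters[b].add(a)
    return clusters


def largest(clusters, y=1):
    """Return the centres of the y clusters with the most members."""
    # Sort by size (descending); break ties by centre name for determinism.
    return sorted(clusters, key=lambda c: (-len(clusters[c]), c))[:y]


# The three-object example from the text.
pairs = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 1}
clusters = threshold_clusters(pairs, threshold=1)
best = largest(clusters)  # centre of the biggest cluster

# Trying several thresholds shows how cluster sizes grow or shrink.
for t in (0, 1, 2):
    sizes = {c: len(m) for c, m in threshold_clusters(pairs, t).items()}
    print(t, sizes)
```

With threshold 1 this reproduces the clusters listed above, and `best` is the cluster around B. For 1000 objects the loop visits each listed pair once, so it stays cheap enough to rerun with many thresholds.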
      
   Regards,   
   Milind   
      