
   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 146 of 1,954   
   Eray Ozkural exa to meiyi   
   Re: association, apriori and frequent it   
   30 Oct 03 21:34:15   
   
   XPost: comp.soft-sys.matlab   
   From: erayo@bilkent.edu.tr   
      
   zhong_meiyi@hotmail.com (meiyi) wrote in   
    message news:...   
   > Hi all,   
   >   
   > I need to generate association rules from a data set. This data set is   
   > really huge (~5 million entries), with 40+ attributes, and these   
   > attributes have values that are either nominal or numerical. I've   
   > tried out several software packages, but I still can't find one that   
   > works with my data set.   
   >   
   > I've tried weka, but it does not accept numerical data. Other than   
   > that, it's a pretty good program. I've also tried ARMADA, but it only   
   > accepts data of string type.   
   >   
   > I've also tried Christian Borgelt's apriori program but the attributes   
   > are not taken into account. I could modify my data so that it is   
   > possible to distinguish the attribute [e.g. if attribute duration has   
   > values such as 0, 124, etc., I could change them to d_0, d_124, etc., to   
   > show that these values belong to attribute duration.] However, this   
   > isn't very efficient because I have 5 million entries and I would have   
   > to manually change every one of them.   
   >   
   > I've tried Christian Borgelt's eclat program to generate frequent set   
   > and the attributes are also not taken into account.   
   >   
   > Can anyone recommend any other software? (i need them to be free   
   > because i'm just a poor student doing research). Or if anyone has a   
   > solution using the software mentioned above, please email me and throw   
   > me a lifeline.   
   >   
   > Any help is deeply appreciated. Thanks!   
   >   
   > meiyi   
      
   Hi Meiyi,   
      
   The frequent itemset problem takes as input a set of transactions,   
   which in the ordinary case is not numerical data. Each transaction is   
   a set of items, so all your "attributes" must be boolean: an item is   
   either present in a transaction or absent from it.   
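
   As a rough sketch of what that means for nominal attributes, each   
   attribute/value pair can be written as one item, so a record becomes a   
   set of items. The attribute names below ("protocol", "flag") are made   
   up for illustration and are not from your data set. A script like this   
   also avoids relabelling values by hand.   

```python
def record_to_transaction(record):
    """Turn one attribute -> value record into a set of 'attribute=value' items."""
    return {f"{name}={value}" for name, value in record.items()}


# Hypothetical records; the attribute names are illustrative only.
records = [
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
]

# Each transaction is a set of items; an item is either in the set or not.
transactions = [record_to_transaction(r) for r in records]
```

   Because the attribute name is part of the item, equal values that belong   
   to different attributes stay distinct.   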
      
   There is also a numerical association mining problem. To solve it, you   
   can quantize your numerical attributes: each meaningful range of   
   values is then interpreted as an item in the transaction set.   
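
   A minimal sketch of such a quantization, assuming you choose the cut   
   points yourself. The cut points 1, 60 and 600 for "duration" below are   
   arbitrary placeholders; what counts as a meaningful range depends on   
   your data.   

```python
from bisect import bisect_right


def quantize(name, value, edges):
    """Map a numeric value to an item naming the half-open range it falls in.

    `edges` is a sorted list of cut points; k cut points give k + 1 ranges.
    """
    i = bisect_right(edges, value)
    lo = "-inf" if i == 0 else edges[i - 1]
    hi = "+inf" if i == len(edges) else edges[i]
    return f"{name}=[{lo},{hi})"


# Assumed cut points, for illustration only.
duration_edges = [1, 60, 600]

items = [quantize("duration", v, duration_edges) for v in (0, 124, 600)]
```

   Every value in the same range maps to the same item, so the output can   
   be fed to an ordinary frequent itemset program.   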
      
   Regards,   
      
   --   
   Eray Ozkural   
      



(c) 1994,  bbs@darkrealms.ca