
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 535 of 1,954   
   NickName to All   
   ID3 entropy calculation question with th   
   30 Dec 04 09:20:23   
   
   From: dadada@rock.com   
      
   Happy holidays to you all!   
      
   I have a question regarding entropy calculation for a decision tree   
   using ID3 algorithm.   
      
   To refresh your memory, the sample data (training data) looks like
   this:
   Day   Outlook    Temp.  Humidity  Wind    Play Tennis
   D1    Sunny      Hot    High      Weak    No
   D2    Sunny      Hot    High      Strong  No
   D3    Overcast   Hot    High      Weak    Yes
   D4    Rain       Mild   High      Weak    Yes
   D5    Rain       Cool   Normal    Weak    Yes
   D6    Rain       Cool   Normal    Strong  No
   D7    Overcast   Cool   Normal    Weak    Yes
   D8    Sunny      Mild   High      Weak    No
   D9    Sunny      Cool   Normal    Weak    Yes
   D10   Rain       Mild   Normal    Strong  Yes
   D11   Sunny      Mild   Normal    Strong  Yes
   D12   Overcast   Mild   High      Strong  Yes
   D13   Overcast   Hot    Normal    Weak    Yes
   D14   Rain       Mild   High      Strong  No
      
   I believe I have some rudimentary understanding of the information   
   theory   
   entropy(S) = -(p1*log(p1)+...+pn*log(pn))   
   gain(S, A) = entropy(S) - sum over values v of A of
                (|Sv|/|S|) * entropy(Sv)
   (best I can do for the formula in plain text)
      
   No problem with the first "pass" for entropies (hence gain) such as   
   HUMIDITYhigh = (- (3/7) * log2 (3/7) - (4/7) * log2 (4/7) )=0.985   
   HUMIDITYnormal = (- (6/7) * log2 (6/7) - (1/7) * log2 (1/7) )=0.592   
   WindWeak: 0.811   
   WindStrong: 1   
   ...   
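   For anyone who wants to check these numbers, here is a quick sketch in
   Python (the helper name `entropy` is mine, not from any lecture; the
   counts come straight from the table above):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                    # skip empty classes: 0*log2(0) is taken as 0
            p = count / total
            h -= p * log2(p)
    return h

print(round(entropy(3, 4), 3))  # Humidity=High:   3 Yes, 4 No -> 0.985
print(round(entropy(6, 1), 3))  # Humidity=Normal: 6 Yes, 1 No -> 0.592
print(round(entropy(6, 2), 3))  # Wind=Weak:       6 Yes, 2 No -> 0.811
print(round(entropy(3, 3), 3))  # Wind=Strong:     3 Yes, 3 No -> 1.0
```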
      
   And Gain4Outlook = 0.246 (the highest among the four attributes).   
      
   So, we pick OUTLOOK as the first attribute, OUTLOOK has 3 values of   
   Sunny, Overcast and Rain, let's start with sunny,   
      
   Entropy for the [D1,D2,D8,D9,D11] = entropySet4Sunny = 0.970 (same as   
   lecture material).   
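   To show my working for the gain as well, the same arithmetic in Python
   (again `entropy` is just my own two-class helper; Outlook splits the 14
   examples into Sunny (2+,3-), Overcast (4+,0-) and Rain (3+,2-)):

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                    # 0*log2(0) is taken as 0
            p = count / total
            h -= p * log2(p)
    return h

# Whole set: 9 Yes / 5 No, split on Outlook.
gain_outlook = (entropy(9, 5)
                - (5/14) * entropy(2, 3)    # Sunny
                - (4/14) * entropy(4, 0)    # Overcast
                - (5/14) * entropy(3, 2))   # Rain

print(round(entropy(2, 3), 3))  # Sunny subset [D1,D2,D8,D9,D11] -> 0.971 (0.970 truncated)
print(round(gain_outlook, 3))   # -> 0.247 (0.246 truncated)
```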
      
   However, at the second "pass", the entropy calculation "threw" me off:
   my results differ from the ID3 lectures of several different
   institutions (their second-pass entropies all agree, so I must be the
   one who's wrong). So the question is: what went wrong?
      
   Here's my calculation,   
   HUMIDITYhigh2 = ( - (0/5) * log2 (0/5) - (3/5) * log2 (3/5) )
   = ( - 0 - (3/5) * log2 (3/5) )
   = 0.442
   HUMIDITYnormal2 = ( - (2/5) * log2 (2/5) - (0/5) * log2 (0/5) )
   = ( - (2/5) * log2 (2/5) - 0 )
   = 0.529
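   In Python, my second-pass arithmetic is literally this (Humidity=High
   within Sunny is [D1,D2,D8], 0 Yes / 3 No; Humidity=Normal within Sunny
   is [D9,D11], 2 Yes / 0 No):

```python
from math import log2

# My second "pass" on the Outlook=Sunny subset [D1,D2,D8,D9,D11]
hum_high2   = -0 - (3/5) * log2(3/5)    # the (0/5)*log2(0/5) term taken as 0
hum_normal2 = -(2/5) * log2(2/5) - 0

print(round(hum_high2, 3))    # -> 0.442
print(round(hum_normal2, 3))  # -> 0.529
```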
      
   BUT "lecture material" reads as   
   Gain(Ssunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
   implying that HUMIDITYhigh2 is 0 and HUMIDITYnormal2 is 0 as well.   
   How did they do their entropy calculation here or what did I do wrong?   
      
   Or is there a rule that goes like if one "sub element" is zero then the   
   entropy is zero like   
   At the first "pass"   
   OUTLOOKovercast = ( - (4/4) * log2 (4/4) - 0  )   
   = 0 (the second "sub element" is zero)   
   ?   
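   As far as I can tell (please correct me), the convention is that a
   p * log2(p) term is defined to be 0 when p = 0, since p * log2(p) -> 0
   as p -> 0+; so any "pure" subset (all Yes or all No) has entropy 0:

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    # each p*log2(p) term with p == 0 is taken as 0 (the limit as p -> 0+)
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

print(entropy(4, 0))  # Outlook=Overcast: 4 Yes, 0 No -> 0.0
```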
      
   Another question,   
   the tree looks like this
               OUTLOOK
             /    |    \
        sunny  overcast  rain
           /      |        \
   HUMIDITY   Yes (stop)   WIND
      ...                  ...
      
   for the OUTLOOK --> overcast branch, does it stop there because it is
   (4+, 0-), or because the entropy at that point is 0, or ... ?
   I'm just getting started.   
      
   TIA.   
      
   D   
      
   [ comp.ai is moderated.  To submit, just post and be patient, or if ]   
   [ that fails mail your article to , and ]   
   [ ask your news administrator to fix the problems with your system. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca