Algorithms

The Decision Tree Algorithm: Entropy

By mehran@mldawn.com

Which Attribute to Choose? (Part 1)

In our last post, we introduced the idea of decision trees (DTs) and you got the big picture. Now it is time to get into some of the details. For example, how does a DT choose the best attribute from the dataset? There must be a way for the DT to compare the worth of each attribute and figure out which one can help us get to purer sub-tables (i.e., more certainty). There is indeed a famous quantitative measure for this called Information Gain. But in order to understand it, we first have to learn about the concept of Entropy. As a reminder, here is our training set:

[Figure: training set, https://www.mldawn.com/wp-content/uploads/2019/02/which-attribute-to-choose.png]

What is Entropy?

The entropy of a set of examples tells us how pure that set is! For example, suppose we have 2 sets of fruits: 1) 5 apples and 5 oranges, and 2) 9 apples and 1 orange. Set 2 is much purer (i.e., has much lower entropy) than set 1, as it consists almost entirely of apples. Set 1, however, is a half-half situation and is as impure as a two-class set can be (i.e., has much higher entropy), since neither apples nor oranges dominate! Now, back to the adults' world and enough with fruits :-)
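To put numbers on the two fruit sets, here is a minimal Python sketch. It assumes the standard Shannon entropy with a base-2 logarithm; the function name `entropy` and the idea of passing class counts as a list are illustrative choices, not something taken from the post.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a set described by its class counts.

    `counts` is a hypothetical input format: one integer per class,
    e.g. [5, 5] for 5 apples and 5 oranges.
    """
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # a class with no examples contributes nothing
        p = count / total  # proportion of this class in the set
        result -= p * math.log2(p)
    return result


set_1 = [5, 5]  # 5 apples, 5 oranges
set_2 = [9, 1]  # 9 apples, 1 orange

print(f"Set 1 entropy: {entropy(set_1):.3f}")  # 1.000
print(f"Set 2 entropy: {entropy(set_2):.3f}")  # 0.469
```

Under these assumptions the half-half set comes out at 1 bit, the largest value two classes can produce, while the mostly-apples set is well below that, which matches the intuition that it is the purer of the two.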

In a binary classification problem, such as the dataset above, we have 2 sets of examples, 1) Positive (Yes) and 2) Negative (No), and this is how we compute the entropy of our dataset S:

Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)

Here p(+) and p(-) are the proportions of positive and negative examples in S. Please note that the base of the logarithm is 2. As for the proportions, let’s …
