### Binary Classification from Scratch using Numpy

### The Decision Tree Algorithm: Fighting Over-Fitting Issue – Part(2): Reduced Error Pruning

### The Decision Tree Algorithm: Fighting Over-Fitting Issue – Part(1): What is over-fitting?

### The Decision Tree Algorithm: Information Gain

### The Decision Tree Algorithm: Entropy

In our last post, we introduced the idea of decision trees (DTs) and you got the big picture. Now it is time to get into some of the details. For example, how does a DT choose the best attribute from the dataset? There must be a way for the DT to compare the worth of each attribute and figure out which one leads to purer sub-tables (i.e., more certainty). There is indeed a well-known quantitative measure for this called Information Gain. But in order to understand it, we first have to learn about the concept of Entropy. As a reminder, here is our training set:

## What is Entropy?

The entropy of a set of examples tells us how pure that set is! For example, if we have 2 sets of fruits: 1) 5 apples and 5 oranges, and 2) 9 apples and 1 orange, we say that set 2 is much purer (i.e., has much less entropy) than set 1, as it consists almost entirely of apples. Set 1, however, is a half-half situation and is as impure as a two-class set can be (i.e., has the maximum entropy), as neither apples nor oranges dominate! Now, back to the adults' world and enough with fruits :-)
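Before leaving the fruit baskets behind completely, here is a minimal sketch that puts numbers on the two sets. It assumes the standard Shannon entropy with a base-2 logarithm, H = -Σ p·log2(p), where p is the fraction of each class in the set. The function name `entropy` and the count lists are made up for this illustration.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a set, given the count of each class.

    Hypothetical helper for illustration; assumes base-2 logarithm.
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            continue  # an absent class contributes nothing (0 * log 0 is taken as 0)
        p = c / total
        h -= p * math.log2(p)
    return h

set_1 = [5, 5]  # 5 apples, 5 oranges
set_2 = [9, 1]  # 9 apples, 1 orange

print(round(entropy(set_1), 3))  # 1.0   (half-half: maximum impurity)
print(round(entropy(set_2), 3))  # 0.469 (mostly apples: much purer)
```

The half-half set comes out at 1 bit, the highest value two classes can reach, while the mostly-apples set lands well below it. A set with only apples would score 0.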

In a binary classification problem, such as the dataset above, we have 2 sets

### The Decision Tree Algorithm: A Gentle Introduction

