Train a Perceptron to Learn the AND Gate from Scratch in Python

What will you Learn in this Post?

Neural Networks are function approximators. For example, in a supervised learning setting, given loads of inputs and loads of outputs in our training data, a neural network is capable of finding the hidden mapping between the those inputs and outputs. It can also, hopefully, generalize its acquired knowledge to the unknown and unseen data in the future.
One particular type of such functions is the boolean functions such as, AND, OR, NAND, NOR, XOR, … . In this post you will learn how to code a neural network (A perceptron) to learn a boolean function and you will see where a simple perceptron will fail!

The Representative Power of a Perceptron

So, a perceptron shown below, is capable of learning decision boundaries in a binary classification scenario. For example, if your inputs are 1-Dimensional or 2-Dimensional, we can actually visualize different parts of our perceptron. Below, in the 1-Dimensional case you can see how the input and the bias unit are linearly combined using the weights, resulting in Z. Then you can see that we apply the thresholding function on top of Z to create our binary classifier. This classifier outputs +1 for instances of one class, and -1 for the instances of the other class.

The sgn(Z) is our sign function where if Z>0, outputs +1 and if Z<0, it outputs -1. You notice how when our input data is 1-Dimensional, our Z is a 2-Dimensional figure namely a line: Z = w_1x_1 + w_0 . However, our decision boundary, where Z cuts through our input space (i.e., where Z=0), is a point and it is 1-Dimensional just like our input data. What about a 2-Dimensional input data? What will Z be then?

Our input data is 2-Dimensional but note that our Z is now a plane: Z = w_1x_1 + w_2x_2 + w_0. So, our input space is 2-Dimensional but our Z is 3-Dimensional! See what happens when Z cuts through our input space (x_1, x_2 plane)! The result is a line (again a 2-Dimensional shape) that is our decision boundary and it is where Z=0. Again by applying sgn(Z) on Z we can generate +1/-1 for the instances the 2 classes in our dataset!
Now, let’s refresh our minds on boolean functions and then see how we can learn such function using our perceptron 😉

Boolean Functions (A Reminder)

So, boolean functions, at least the main ones can be summarized, below:

Make no mistake! These are functions as well! We have inputs and we have outputs! So, for example the AND gate’s logic, from a human’s perspective is:
The output of the AND gate is +1 if and only if all of its inputs are positive (greater than 0), and the output is 0, if at least there is one non-positive input to the gate!
But a machine has no idea about this logic! We want to see whether we can train a machine learning algorithm, namely, a simple perceptron, to learn such gates.

Linearly Separable Gates and Perceptrons

If we look at the 2-Dimensional input space of boolean functions, and color positive examples with green and negatives with red, then we can visualise our training data like below. Note that, the yellow line is a decision boundary that can separate the 2 classes beautifully, and hopefully our neural network will be able to find such a boundary!

 

It is important to note that not all gates can be learned by a single perceptron! For instance the XOR gate, represented below, can NEVER EVER be learned by a single perceptron! Why?

In order to see why, I have to remind you that:
A single perceptron can learn any function, as long as the instances in the dataset are linearly separable, like AND, OR, NAND, and NOR!
Now what about the XOR?

In the image above, it is clear that the positive and negative examples are not linearly separable, so a perceptron can never find any line to separate them (The image on the left). However, if only we had a neural network with at least 2 layers of depth, we could separate these examples (The image on the right).

Teaching a Perceptron the AND Gate (CODE)

Of all the gates that are linearly separable, we have chosen the AND gate to code from scratch. First foremost, let’s take a look at our perceptron:

 

Now you notice that I have used the sigmoid() function, instead of a step function. The reason is that during back-propagation, we need all the functions that we have used during the forward pass to be differentiable! Now, the step function is not differentiable as it breaks at point 0 (when its input is 0)! So, I have used the sigmoid() function, which is differentiable all across and is continuous!
Now, we need to import these packages :
I am sure that as a Neural Network enthusiasts, you are familiar with the idea of the sigmoid() function and the binary-cross entropy function. We need to use them during the forward-pass. Before showing you the code, let me refresh your memory on the math: sigmoid(z) = \frac{1}{1 + e^{-z}} and as for the binary-cross entropy: 
E(y ,  y_{hat}}) = -y ln(y_{hat})-(1-y)ln(1-y_{hat})
Note that: The base of the logarithm is e. Meaning that what we have here is the Natural Logarithm!
Remember that this error function is nothing but a measure of difference between the output of the ANN, y_{hat}, and the ground-truth, y. So, the lower it is, the better (of course not to the point of over-fitting to the training data 😉 )!
Now let’s see the code.  We can code both of these 2 functions using 2 separate python functions in Python:
Now, we have to think ahead, right? So, during the back-propagation phase, we will need 2 things!
  1. The derivative of the Error w.r.t the output y_{hat}
  2. The derivative of the output y_{hat} w.r.t  Z (i.e., the derivative of the output of the sigmoid() function w.r.t the input of the sigmoid() function that is, Z)
Now, as a reminder:
\frac{dE}{{dy_{hat}}} = -\frac{y}{y_{hat}}+\frac{(1-y)}{1-y_{hat}}
And as for the sigmoid(z), or for short, sig(z):
\frac{dsig(z)}{dz} = sig(z)(1-sig(z))
This is not a neural network course, so I am not going to derive these here mathematically and step by step. Now, let us code the 2 functions in Python, whose job is to compute these derivatives:
Now, let’s build our dataset. We start wtih the AND gate. Remember: AND returns 1 if and only if all of its inputs are positive.
Let’s also visualise our dataset using the following code:

First, we need to initialise our weights, of which we need 3:
Also, let’s determine the number of epochs, and the learning rate:
Finally, the big monster is here. Let’s start the training phase:
Here is one of the outputs generated after 500 epochs:

And let’s also plot the error values that we have been recording, per epoch:

I Have 2 Challenges for You !!!

1) Can you modify the code in a way that the training data would correspond the XOR gate? 
The network should NEVER converge as there is no way for a line to separate linearly inseparable examples! The error function should look something nasty and non-converging like this:

The error is almost never going down! See if you can generate this!
2) See if by coding a 2-layer neural network you could learn the XOR gate?
Feel free to post your findings and outputs in the comments.
I do hope that you have enjoyed this post and it has been informative for you,
Regards,
MLDawn

Leave a Comment