Tutorial on CNN Through an Example

Published in

The Startup

5 min read · Jan 15, 2021

Convolutional Neural Networks (CNNs) are deep neural models that are typically used to solve computer vision problems. These networks are composed of an input layer, an output layer, and many hidden layers, some of which are convolutional, hence the name.

Some real-world examples

Recognizing stop signs from camera input for self-driving cars
Recognizing animals in hunting cameras
Generating meaningful search results for Google Images
Tagging people in Facebook images

In its most basic form, a CNN is used to classify images. It processes images as arrays of numbers. To be precise, an image is stored as a 2-D array (matrix) of pixels, where each entry represents the color of the corresponding pixel.

For a grayscale (black and white) image, each pixel's value lies between 0 and 255 in the usual 8-bit encoding, where 0 is black and 255 is white.

For a color image, this gets a bit more complex. A color image is represented as a 3-D array whose third dimension has length 3, one entry each for the red, green, and blue (RGB) channels. Here too, each channel value lies between 0 and 255.
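As a small illustration of these two layouts, the sketch below builds a tiny grayscale array and a tiny color array with NumPy. The 2 x 3 size and the pixel values are made up for the example.

```python
import numpy as np

# A tiny 2x3 grayscale "image": one intensity value (0-255) per pixel.
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# A color image of the same size: three values (red, green, blue) per pixel.
color = np.zeros((2, 3, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]   # top-left pixel is pure red
color[1, 2] = [0, 0, 255]   # bottom-right pixel is pure blue

print(gray.shape)   # (2, 3)
print(color.shape)  # (2, 3, 3)
```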

Problems with CNN

Just like any other model, a CNN isn't perfect. Here is an example to demonstrate that.

Consider the following image:

Source: https://images.app.goo.gl/WAxeiyk6jiuuByZz7

Do you see a duck or a rabbit?

Of course, there is no right answer to this question. The image is deliberately ambiguous. It is a well-known early example of an optical illusion, and it shows the fallibility of CNNs for image recognition: a classifier trained to output a single label has to pick one, even though both readings are valid.

Flowchart of the code

Into the code

Dataset used: here, we use the MNIST dataset.

Modified National Institute of Standards and Technology (MNIST) is a well-known dataset used in computer vision. It is composed of images of handwritten digits (0–9), split into a training set of 60,000 images and a test set of 10,000 images, where each image is 28 x 28 pixels.

The first step is to import the necessary libraries and packages. To build the CNN, we use Keras.

Create the training data and test data

Test data: used to check how well the model has been trained.
Train data: used to train our model.

→ From here on, img_rows and img_cols are used as the image dimensions.

→ In the MNIST dataset, both are 28. We also need to check the data format, i.e. ‘channels_first’ or ‘channels_last’, because it decides where the channel dimension goes in the input shape.

→ We can normalize the data beforehand so that the calculations work with small values instead of large ones. For example, we can normalize x_train and x_test by dividing them by 255, which scales every pixel value to the range 0 to 1.

Checking data-format:
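A minimal sketch of this check and of the normalization step, using only NumPy. Two things here are assumptions made so the example runs on its own: the random array stands in for the real MNIST images, and data_format is hard-coded (in Keras the value is typically read from keras.backend.image_data_format()).

```python
import numpy as np

img_rows, img_cols = 28, 28

# Stand-in for the MNIST arrays: 5 random 28x28 images with values 0-255.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(5, img_rows, img_cols), dtype=np.uint8)

# Assumed value; Keras reports it via keras.backend.image_data_format().
data_format = "channels_last"

if data_format == "channels_first":
    # Channel dimension comes before the rows and columns.
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    # Channel dimension comes last.
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Scale pixel values from the 0-255 range down to 0-1.
x_train = x_train.astype("float32") / 255

print(x_train.shape, input_shape)
```

The same reshape and division would be applied to x_test.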

Description of the output classes:

→ Since the model's output can be any digit from 0 to 9, we need 10 output classes.

→ To encode the labels for 10 classes, use the keras.utils.to_categorical function, which produces 10 columns.

→ In each row, exactly one of these 10 columns is 1 and the other 9 are 0. The position of the 1 denotes the class of the digit.
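The encoding described above can be sketched in plain NumPy. This is an illustration of one-hot encoding, not the Keras implementation; the three example labels are made up.

```python
import numpy as np

num_classes = 10
y = np.array([3, 0, 9])  # example digit labels

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the labels encodes all of them at once.
y_one_hot = np.eye(num_classes, dtype="float32")[y]

print(y_one_hot.shape)  # (3, 10)
print(y_one_hot[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```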

Now the dataset is ready, so let’s move on to the CNN model:

How each layer in the CNN model works:

→ layer1 is a Conv2D layer that convolves the image using 32 filters, each of size 3 x 3.
→ layer2 is another Conv2D layer that convolves the output of layer1 using 64 filters, each of size 3 x 3.
→ layer3 is a MaxPooling2D layer that picks the maximum value from each 3 x 3 window.
→ layer4 is a Dropout layer with a rate of 0.5, which randomly sets half of its inputs to zero during training.
→ layer5 flattens the output of layer4, and this flattened output is passed to layer6.
→ layer6 is a fully connected hidden layer containing 250 neurons.
→ layer7 is the output layer, with 10 neurons for the 10 output classes, and uses the softmax function.
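The seven layers above could be wired together roughly as follows with the Keras functional style. This is a sketch under assumptions, not necessarily the exact code behind the reported result: it assumes the TensorFlow-bundled Keras, a channels_last input of 28 x 28 x 1, default strides and padding, and relu activations for the convolutional and hidden layers (the text only names softmax for the output).

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumes channels_last MNIST input: 28x28 pixels, 1 channel.
inpx = keras.Input(shape=(28, 28, 1))

# The 'relu' activations are an assumption; the text does not name them.
layer1 = layers.Conv2D(32, kernel_size=(3, 3), activation="relu")(inpx)    # -> 26x26x32
layer2 = layers.Conv2D(64, kernel_size=(3, 3), activation="relu")(layer1)  # -> 24x24x64
layer3 = layers.MaxPooling2D(pool_size=(3, 3))(layer2)                     # -> 8x8x64
layer4 = layers.Dropout(0.5)(layer3)          # active only during training
layer5 = layers.Flatten()(layer4)             # -> 4096 values
layer6 = layers.Dense(250, activation="relu")(layer5)
layer7 = layers.Dense(10, activation="softmax")(layer6)

# [inpx] is the model input and layer7 is the model output.
model = keras.Model(inputs=[inpx], outputs=layer7)
model.summary()
```

With no padding, each 3 x 3 convolution trims one pixel from every border, which is why the feature maps shrink from 28 to 26 to 24 before the pooling layer divides them by 3.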

Calling the compile and fit functions:

→ First, we created the model object, where [inpx] is the model's input and layer7 is its output.

→ We then compiled the model with an optimizer, a loss function, and accuracy as the reported metric. Finally, model.fit was called with x_train (the image arrays), y_train (the labels), the number of epochs, and the batch size.

→ The fit function feeds the x_train and y_train data to the model one batch at a time, and the model's weights are updated after each batch.
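To make the role of the batch size concrete, here is a minimal NumPy sketch of how a training loop walks through the data in batches. It is not how Keras implements fit (which, among other things, shuffles the data by default). The helper name iterate_batches and the sizes below are hypothetical, chosen only for illustration.

```python
import math
import numpy as np

def iterate_batches(x, y, batch_size):
    """Yield consecutive (x, y) slices of at most batch_size samples."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Hypothetical sizes chosen only for illustration.
x_train = np.zeros((1000, 28, 28, 1), dtype="float32")
y_train = np.zeros((1000, 10), dtype="float32")
batch_size = 128

batches = list(iterate_batches(x_train, y_train, batch_size))
print(len(batches))          # 8 batches per epoch: ceil(1000 / 128)
print(batches[-1][0].shape)  # (104, 28, 28, 1), the last batch is smaller
```

One pass over all batches is one epoch, so the number of epochs controls how many times the model sees the whole training set.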

Evaluate function:
→ model.evaluate is given the test data and returns the model's score on it.

→ The model predicts a class for each test image, and the predicted classes are compared with the y_test labels to give us the accuracy.
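The comparison behind the accuracy figure can be sketched with NumPy. The probabilities and labels below are made up, and only 3 classes are used to keep the arrays readable (MNIST would have 10 columns).

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images over 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])

# One-hot test labels for the same 4 images.
y_test = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

predicted = probs.argmax(axis=1)  # class with the highest probability
actual = y_test.argmax(axis=1)    # position of the 1 in each one-hot row
accuracy = (predicted == actual).mean()

print(predicted)  # [1 0 2 0]
print(accuracy)   # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75.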

Output:

We get 99.06% accuracy on the test data using a CNN (Convolutional Neural Network) built as a functional model.

Thank you for reading. Hope you liked it!

Written by Simran Kaur, Parimala Palle & Ashritha Row