Image Processing using CNN: A Beginner's Guide (2024)

This article was published as a part of the Data Science Blogathon


Deep learning methods use data to train neural networks for a variety of machine learning tasks, such as classifying objects into different classes. Convolutional neural networks (CNNs) are deep learning models that are particularly powerful for analyzing images. This article explains how to construct, train, and evaluate convolutional neural networks.

You will also learn how to improve their ability to learn from data, and how to interpret the results of the training. Deep Learning has various applications like image processing, natural language processing, etc. It is also used in Medical Science, Media & Entertainment, Autonomous Cars, etc.


What is CNN?

A CNN is a powerful algorithm for image processing. CNNs are currently among the best algorithms available for the automated processing of images, and many companies use them for tasks such as identifying the objects in an image.

An image stores its data as a combination of red, green, and blue (RGB) values. Matplotlib can be used to import an image into memory from a file. The computer doesn’t see an image; all it sees is an array of numbers. Color images are stored in 3-dimensional arrays. The first two dimensions correspond to the height and width of the image (the number of pixels). The last dimension corresponds to the red, green, and blue colors present in each pixel.
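
To make the array layout concrete, here is a minimal sketch that builds a tiny color image by hand instead of reading one from disk. The 2×2 image and its pixel values are made up for illustration; an image loaded from a file (for example with matplotlib.image.imread) would have the same (height, width, channels) structure, only larger.

```python
import numpy as np

# Hypothetical 2x2 RGB image: shape is (height, width, channels)
image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(image.shape)   # height, width, number of color channels
print(image[0, 1])   # the three color values of the pixel in row 0, column 1

# Slicing the last dimension gives a 2-D array for a single color
red_channel = image[:, :, 0]
print(red_channel)
```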

Three Layers of CNN

Convolutional Neural Networks are specialized for applications in image and video recognition. CNNs are mainly used in image analysis tasks such as image recognition, object detection, and segmentation.

There are three types of layers in Convolutional Neural Networks:

1) Convolutional Layer: In a typical (fully connected) neural network, each input neuron is connected to every neuron in the next hidden layer. In a CNN, each neuron in the hidden layer connects only to a small region of the input layer neurons.

2) Pooling Layer: The pooling layer is used to reduce the dimensionality of the feature map. A CNN typically contains multiple activation and pooling layers among its hidden layers.

3) Fully-Connected Layer: Fully connected layers form the last few layers in the network. The input to the fully connected layer is the output from the final pooling or convolutional layer, which is flattened and then fed into the fully connected layer.
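
The three layer types can be illustrated on a toy input. The following is a minimal NumPy sketch, not how Keras implements these layers: the function names conv2d_valid and max_pool_2x2, the 5×5 input, and the 2×2 kernel are all made up for illustration. As in most deep learning libraries, the "convolution" here slides the kernel over the image without flipping it.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on a small patch of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[0.0, 1.0], [1.0, 0.0]])        # toy 2x2 filter

feature_map = conv2d_valid(image, kernel)   # convolutional layer output, 4x4
pooled = max_pool_2x2(feature_map)          # pooling layer output, 2x2
flat = pooled.flatten()                     # vector fed to a fully connected layer

print(feature_map.shape, pooled.shape, flat.shape)
```

Pooling halves each spatial dimension here, which is the dimensionality reduction described in the list above.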


MNIST Dataset

In this article, we will be working on object recognition in image data using the MNIST dataset for handwritten digit recognition.

The MNIST dataset consists of images of digits from a variety of scanned documents. Each image is a 28×28 pixel square. In this dataset, 60,000 images are used to train the model and 10,000 images are used to test it. There are 10 digits (0 to 9), so there are 10 classes to predict.


Loading the MNIST Dataset

Install the TensorFlow library and load the dataset, which is already split into training and test sets.

Then plot a sample image from the training set.

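!pip install tensorflow

from keras.datasets import mnist
import matplotlib.pyplot as plt

# Load MNIST, already split into training and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Display one sample digit in grayscale
plt.subplot()
plt.imshow(X_train[9], cmap=plt.get_cmap('gray'))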



Deep Learning Model with Multi-Layer Perceptrons using MNIST

In this section, we will build a simple neural network with a single hidden layer for handwritten digit recognition on the MNIST dataset.

A perceptron is a single-neuron model that is the basic building block of larger neural networks. The multi-layer perceptron consists of three layers: an input layer, a hidden layer, and an output layer. The hidden layer is not visible to the outside world; only the input and output layers are visible. For all deep learning models, the data must be numeric.
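
As a rough illustration of the single-neuron building block, the sketch below computes a weighted sum of numeric inputs and applies a step activation. The function name perceptron and the weights and bias are hypothetical, chosen by hand so that the neuron behaves like a logical AND; a real network learns such values during training.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Single neuron: weighted sum of the inputs followed by a step activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum > 0 else 0

# Hand-picked (not learned) parameters that make the neuron act like AND
weights = np.array([1.0, 1.0])
bias = -1.5

outputs = [perceptron(np.array(x), weights, bias)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```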

Step-1: Import key libraries

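import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# Note: np_utils is not available in newer Keras releases;
# there, use `from keras.utils import to_categorical` instead.
from keras.utils import np_utils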

Step-2: Reshape the data

Each image is 28×28 pixels, so there are 784 pixels per image. Each image is flattened into a vector of 784 values, so the input layer has 784 inputs, the hidden layer has 784 neurons, and the output layer has 10 outputs. The dataset is then converted to a float datatype.

# Flatten each 28x28 image into a 784-element vector of float32 values
number_pix = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], number_pix).astype('float32')
X_test = X_test.reshape(X_test.shape[0], number_pix).astype('float32')
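
The effect of this reshape is easier to see on a small array. The sketch below uses a made-up stand-in for X_train (two 3×3 "images" instead of 60,000 images of 28×28) and applies the same reshape and type conversion.

```python
import numpy as np

# Hypothetical stand-in for X_train: 2 "images" of 3x3 pixels
images = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

n_pixels = images.shape[1] * images.shape[2]   # 3 * 3 = 9
flat = images.reshape(images.shape[0], n_pixels).astype('float32')

print(images.shape)   # one 2-D grid per image
print(flat.shape)     # one row of pixels per image
print(flat[1])        # the rows of the second image laid end to end
```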

Step-3: Normalize the data

NN models usually require scaled data. In this code snippet, the pixel values are normalized from the range 0-255 to 0-1, and the target variable is one-hot encoded. The target variable has a total of 10 classes (0-9).

# Scale pixel values from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

# One-hot encode the digit labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_train.shape[1]
print(num_classes)
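
For readers unfamiliar with these two transformations, here is a small NumPy-only sketch on made-up values. The np.eye indexing is a stand-in used for illustration; it is meant to produce the same kind of one-hot matrix as to_categorical, not to reproduce its implementation.

```python
import numpy as np

# Normalization: divide by the largest possible pixel value
pixels = np.array([0, 51, 255], dtype='float32')
scaled = pixels / 255            # values now lie between 0 and 1

# One-hot encoding: each label becomes a row with a single 1
labels = np.array([3, 0, 9])     # hypothetical digit labels
num_classes = 10
one_hot = np.eye(num_classes)[labels]   # row i has its 1 in column labels[i]

print(scaled)
print(one_hot)
```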



Now we will define an nn_model function that builds and compiles the model.

Step-4: Define the model function

def nn_model():
    model = Sequential()
    # Hidden layer: 784 neurons with ReLU activation
    model.add(Dense(number_pix, input_dim=number_pix, activation='relu'))
    # Output layer: one neuron per class with softmax activation
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

There are two layers: a hidden layer with the ReLU activation function and an output layer with the softmax function.
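
The two activation functions can be written in a few lines of NumPy. This is a sketch for intuition only; the function names and the input values are made up, and Keras provides its own implementations.

```python
import numpy as np

def relu(x):
    """Replace negative values with zero, keep positive values unchanged."""
    return np.maximum(0, x)

def softmax(x):
    """Turn a vector of scores into positive values that sum to 1."""
    shifted = x - np.max(x)       # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

hidden = relu(np.array([-2.0, 0.5, 3.0]))
probs = softmax(np.array([2.0, 1.0, 0.1]))   # hypothetical output-layer scores
predicted_class = int(np.argmax(probs))      # index of the most likely class

print(hidden)
print(probs, predicted_class)
```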

Step-5: Run the model

model = nn_model()
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
score = model.evaluate(X_test, y_test, verbose=0)
print('The error is: %.2f%%' % (100 - score[1] * 100))


Epoch 1/10
300/300 - 11s - loss: 0.2778 - accuracy: 0.9216 - val_loss: 0.1397 - val_accuracy: 0.9604
Epoch 2/10
300/300 - 2s - loss: 0.1121 - accuracy: 0.9675 - val_loss: 0.0977 - val_accuracy: 0.9692
Epoch 3/10
300/300 - 2s - loss: 0.0726 - accuracy: 0.9790 - val_loss: 0.0750 - val_accuracy: 0.9778
Epoch 4/10
300/300 - 2s - loss: 0.0513 - accuracy: 0.9851 - val_loss: 0.0656 - val_accuracy: 0.9796
Epoch 5/10
300/300 - 2s - loss: 0.0376 - accuracy: 0.9892 - val_loss: 0.0717 - val_accuracy: 0.9773
Epoch 6/10
300/300 - 2s - loss: 0.0269 - accuracy: 0.9928 - val_loss: 0.0637 - val_accuracy: 0.9797
Epoch 7/10
300/300 - 2s - loss: 0.0208 - accuracy: 0.9948 - val_loss: 0.0600 - val_accuracy: 0.9824
Epoch 8/10
300/300 - 2s - loss: 0.0153 - accuracy: 0.9962 - val_loss: 0.0581 - val_accuracy: 0.9815
Epoch 9/10
300/300 - 2s - loss: 0.0111 - accuracy: 0.9976 - val_loss: 0.0631 - val_accuracy: 0.9807
Epoch 10/10
300/300 - 2s - loss: 0.0082 - accuracy: 0.9985 - val_loss: 0.0609 - val_accuracy: 0.9828
The error is: 1.72%

The results show that training accuracy improves with every epoch, while validation accuracy levels off at around 98%. The final error is 1.72%; the lower the error, the higher the accuracy of the model.
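
The reported error is simply the complement of the final validation accuracy. The short sketch below redoes that arithmetic with the value from the log above and, as an extra illustration not in the original output, converts it into an approximate count of misclassified test images.

```python
val_accuracy = 0.9828     # final val_accuracy from the training log above
n_test_images = 10000     # size of the MNIST test set

error_percent = 100 - val_accuracy * 100
misclassified = round(n_test_images * (1 - val_accuracy))

print('The error is: %.2f%%' % error_percent)
print('Misclassified test images: about %d' % misclassified)
```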

Convolutional Neural Network Model using MNIST

In this section, we will create simple CNN models for MNIST that demonstrate Convolutional layers, Pooling layers & Dropout layers.

Step-1: Import all the necessary libraries

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
# Note: np_utils is not available in newer Keras releases;
# there, use `from keras.utils import to_categorical` instead.
from keras.utils import np_utils

Step-2: Set the seed for reproducibility and load the MNIST data

seed = 10
np.random.seed(seed)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Step-3: Reshape the data and convert it into float values

# Reshape to (samples, channels, height, width); MNIST is grayscale, so channels = 1
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
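
The reshape above puts the channel dimension before height and width ("channels first"). Keras can also work with the channel dimension last, and which layout a layer expects depends on its configuration, with channels last being the usual default. The sketch below, on a made-up random batch, shows both layouts and checks that they hold the same pixels when there is only one channel.

```python
import numpy as np

# Hypothetical batch: 4 grayscale images of 28x28 pixels
batch = np.random.default_rng(0).integers(0, 256, size=(4, 28, 28))

channels_first = batch.reshape(4, 1, 28, 28)   # (samples, channels, height, width)
channels_last = batch.reshape(4, 28, 28, 1)    # (samples, height, width, channels)

# With a single channel, both layouts contain the same pixel grid
same = np.array_equal(channels_first[:, 0, :, :], channels_last[:, :, :, 0])

print(channels_first.shape, channels_last.shape, same)
```

If the data layout and the layout the convolutional layer expects disagree, the model may still run but will treat image rows or columns as channels, so it is worth checking the two match.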

Step-4: Normalize the data

# Scale pixel values from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

# One-hot encode the digit labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_train.shape[1]
print(num_classes)

A classical CNN architecture looks like the one shown below:


Output Layer (10 outputs)
Hidden Layer (128 neurons)
Flatten Layer
Dropout Layer
Max Pooling Layer
Convolutional Layer (32 maps, 5×5)
Visible Layer

The first hidden layer is a convolutional layer called Conv2D. It has 32 feature maps, each with a 5×5 kernel, and a rectifier (ReLU) activation function. Because it is the first layer in the model, it also receives the input images. Next is a pooling layer called MaxPooling2D, which takes the maximum value over each pooling window. In this model, it is configured with a 2×2 pool size.

The dropout layer provides regularization. It is set to randomly exclude 20% of the neurons in the layer during training to reduce overfitting. Next is the Flatten layer, which converts the 2D feature maps into a vector. This allows the output to be processed by a standard fully connected layer.
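
To show what "randomly exclude 20% of the neurons" means numerically, here is a minimal NumPy sketch of one common formulation, inverted dropout, in which the surviving activations are rescaled so that their expected sum stays the same. The function name dropout and the input of 1,000 ones are made up for illustration; this is not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate`, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=10)
activations = np.ones(1000)                  # hypothetical layer outputs
dropped = dropout(activations, rate=0.2, rng=rng)

# Roughly 80% of the values survive; the rest are set to zero
kept_fraction = np.count_nonzero(dropped) / dropped.size
print(kept_fraction)
```

Dropout of this kind is applied only while training; at prediction time all neurons are used.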

Next, the fully connected layer with 128 neurons and rectifier activation function is used. Finally, the output layer has 10 neurons for the 10 classes and a softmax activation function to output probability-like predictions for each class.
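
It can help to trace how the data shape changes through this architecture. The sketch below does the arithmetic by hand, assuming what the text describes: a single-channel 28×28 input, a 5×5 kernel with stride 1 and 'same' padding, and a 2×2 pooling window whose stride equals its size. The helper name same_padding_out is made up for illustration.

```python
def same_padding_out(size, stride=1):
    """Output length of a 'same'-padded convolution or pooling step."""
    return -(-size // stride)   # ceiling division

side = same_padding_out(28)               # Conv2D, stride 1: 28 -> 28
side = same_padding_out(side, stride=2)   # MaxPooling2D 2x2: 28 -> 14
flat_units = 32 * side * side             # Flatten: 32 maps of 14x14 values

# Trainable parameters per layer (weights plus one bias per unit or map)
conv_params = 32 * (5 * 5 * 1) + 32       # 5x5 kernel, 1 input channel
dense_params = flat_units * 128 + 128
output_params = 128 * 10 + 10
total_params = conv_params + dense_params + output_params

print(side, flat_units)
print(conv_params, dense_params, output_params, total_params)
```

Under these assumptions almost all of the parameters sit in the first fully connected layer, not in the convolutional layer. Pooling and dropout have no trainable parameters.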

Step-5: Run the model

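from keras.layers import Reshape

def cnn_model():
    model = Sequential()
    # Layout adapter only (no weights): the data from Step-3 is (1, 28, 28),
    # while Conv2D expects channels last by default, i.e. (28, 28, 1).
    model.add(Reshape((28, 28, 1), input_shape=(1, 28, 28)))
    # 32 feature maps with a 5x5 kernel. The kernel size is passed as a tuple:
    # Conv2D(32, 5, 5) is read by Keras 2 and later as kernel_size=5, strides=5.
    model.add(Conv2D(32, (5, 5), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = cnn_model()
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
score = model.evaluate(X_test, y_test, verbose=0)
print('The error is: %.2f%%' % (100 - score[1] * 100))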


Epoch 1/10
300/300 - 2s - loss: 0.7825 - accuracy: 0.7637 - val_loss: 0.3071 - val_accuracy: 0.9069
Epoch 2/10
300/300 - 1s - loss: 0.3505 - accuracy: 0.8908 - val_loss: 0.2192 - val_accuracy: 0.9336
Epoch 3/10
300/300 - 1s - loss: 0.2768 - accuracy: 0.9126 - val_loss: 0.1771 - val_accuracy: 0.9426
Epoch 4/10
300/300 - 1s - loss: 0.2392 - accuracy: 0.9251 - val_loss: 0.1508 - val_accuracy: 0.9537
Epoch 5/10
300/300 - 1s - loss: 0.2164 - accuracy: 0.9325 - val_loss: 0.1423 - val_accuracy: 0.9546
Epoch 6/10
300/300 - 1s - loss: 0.1997 - accuracy: 0.9380 - val_loss: 0.1279 - val_accuracy: 0.9607
Epoch 7/10
300/300 - 1s - loss: 0.1856 - accuracy: 0.9415 - val_loss: 0.1179 - val_accuracy: 0.9632
Epoch 8/10
300/300 - 1s - loss: 0.1777 - accuracy: 0.9433 - val_loss: 0.1119 - val_accuracy: 0.9642
Epoch 9/10
300/300 - 1s - loss: 0.1689 - accuracy: 0.9469 - val_loss: 0.1093 - val_accuracy: 0.9667
Epoch 10/10
300/300 - 1s - loss: 0.1605 - accuracy: 0.9493 - val_loss: 0.1053 - val_accuracy: 0.9659
The error is: 3.41%

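The results show that accuracy improves as the number of epochs increases. The error in this logged run is 3.41%; the lower the error, the higher the accuracy of the model. Note that this is higher than the 1.72% error of the multi-layer perceptron above, which is unusual for a CNN on MNIST. A likely cause is how the logged run configured its convolution: Keras 2 and later read Conv2D(32, 5, 5) as a kernel size of 5 with a stride of 5 rather than a 5×5 kernel, and by default an input shape of (1, 28, 28) is interpreted with the channel dimension last.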

Frequently Asked Questions

Q1. What is CNN in image processing?

A. CNN stands for Convolutional Neural Network and is a type of deep learning algorithm used for analyzing and processing images. It performs a series of mathematical operations such as convolutions and pooling on an image to extract relevant features. CNNs are widely used in image processing tasks such as object detection, image segmentation, and classification, and have shown impressive results in various real-world applications.

Q2. How does a CNN work?

A. CNNs apply convolutional layers to input images, use activation functions to introduce non-linearity, reduce dimensionality with pooling layers, and produce the final output with fully connected layers. During training, the weights are adjusted to minimize the difference between the predicted and actual outputs. Once trained, CNNs can classify new images and are widely used in image processing tasks.

Q3. Why do we use CNN?

A. We use CNNs (Convolutional Neural Networks) in image processing because they can effectively extract features from images and learn to recognize patterns, making them well-suited for tasks such as object detection, image segmentation, and classification. They have shown impressive results in various real-world applications and have revolutionized the field of computer vision.

I hope you enjoyed reading, and feel free to use my code to try it out for your purposes. Also, if there is any feedback on code or just the blog post, feel free to reach out to me at [emailprotected]

The media shown in this article on Image Processing using CNN are not owned by Analytics Vidhya and are used at the Author’s discretion.



Mohit Tripathi, 01 May 2023

