Using a Convolutional Neural Network to Create a Program That Detects Face Masks

Today, everybody is wearing face masks. This article will explain how to create a convolutional neural network that detects whether one or more people are wearing face masks. The program will read live video footage straight from your mobile phone.

Creating a Neural Network

In this section, we will look at how to feed in training images and create a neural network that is capable of detecting face masks.

We will begin by importing the libraries that will be used.

The essential libraries for this project are NumPy, OpenCV & TensorFlow 2.0. Each module will be explained when we get to the part that uses it.

The face mask detection algorithm will be built on a Convolutional Neural Network. Convolutional Neural Networks, or CNNs, take in training data and apply ‘filters’ that detect patterns in images, and those patterns are then used to make predictions. For our model, we will use a public dataset from Kaggle.

The dataset can be found at:

Before we start writing more code, let us analyze the dataset. There are 10,000 training images, 5,000 with masks and 5,000 without masks. It is imperative that there is an abundance of training data when working with CNNs, and 10,000 is more than adequate. There are also 992 testing images and 800 validation images.

Something to note is that the image size varies from image to image, so choosing optimum dimensions to resize every image to requires some trial & error.

Once you have downloaded the dataset, copy and paste the directory to the dataset and assign that to the variable DATADIR.

Our algorithm will have binary predictions, either 0, indicating no face mask, or 1, indicating the presence of a face mask. Hence, the category list, CAT, will have two categories: ‘Without Mask’ & ‘With Mask’.
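A sketch of these constants is shown below. The dataset path and folder names are assumptions you should adjust to match where you extracted the Kaggle archive; the 80x80 image size comes from later in the article, where the live footage is resized to match the training images:

```python
# Path to the extracted Kaggle training data on your machine (adjust as needed).
DATADIR = "path/to/face-mask-dataset/Train"

# Index in CAT is the class label: 0 -> no mask, 1 -> mask.
CAT = ["Without Mask", "With Mask"]

# All images will be resized to this square size before training.
IMG_SIZE = 80
```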

Now, we need to actually load in our data and to do that, we will create a function that navigates through the folder containing the training data and reads in the images along with their label (‘Without Mask’ or ‘With Mask’).

The images that will be read in will also be converted into grayscale, as the colour of the face mask isn’t really important.

We have also used error handling to ignore any images that cause an issue while being read in. The OpenCV function .imread() takes an image file and returns its pixel values as an array. All of our images will be saved in a list called data. We will also use the .shuffle() method from the random library to mix up the order, so that the list isn’t strictly ‘Without Mask’ images first and then ‘With Mask’ images, which could cause issues during training. After shuffling, we’ll split the data list into two independent lists, X & y. X will contain the feature variables, in this case the pixel arrays, while y will contain the target variable.

Finally, we will convert our X list into an array and make it 4-dimensional, which is the convention when using CNNs. -1 as the first argument of the .reshape() method signifies all images (so instead of -1, you could also pass in len(X)). It is good practice to normalize data. Since we know that the maximum value for pixel data is 255 and the minimum value is 0, we can divide all our data by 255 in order to normalize it on a scale of 0 to 1. The last two lines of the code save our X data and y data so that we can import them whenever we like, without having to preprocess the data all over again.
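A sketch of this final preprocessing step, wrapped in a helper function for clarity (the function name and the X.pickle / y.pickle filenames are illustrative assumptions):

```python
import pickle

import numpy as np

IMG_SIZE = 80  # must match the size the images were resized to

def finalize(X, y, img_size=IMG_SIZE):
    """Reshape to a 4-D array, normalize to [0, 1], and save to disk."""
    # (-1, h, w, 1): -1 means "all images"; 1 channel because of grayscale.
    X = np.array(X).reshape(-1, img_size, img_size, 1)
    X = X / 255.0  # pixel values run 0..255, so this normalizes to 0..1
    with open("X.pickle", "wb") as f:
        pickle.dump(X, f)
    with open("y.pickle", "wb") as f:
        pickle.dump(np.array(y), f)
    return X, np.array(y)
```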

Now, it is time to build the model.

Our convolutional network will use 5 types of layers: Dense, Dropout, Conv2D, MaxPool2D & Flatten.

The Conv2D layer creates filter matrices based on the argument provided (in this case, 3x3 filters will be used in each Conv2D layer). Essentially, it will analyze the images in blocks of 3x3 and output the sum of the element-wise (Hadamard) product. The following image shows how the Conv2D layer works. The starting values of the filters are generated randomly but are updated after backpropagation.

Following that, the MaxPool2D layer outputs the largest value from each region of the convolved image data. After the image analysis is done, the Flatten layer collapses the data into one dimension, so that it can be operated on further. The Dense & Dropout layers are used for further operations, and the final output is a single neuron that outputs either a 0, signifying ‘Without Mask’, or a 1, signifying ‘With Mask’.

The loss function used to evaluate the model is Binary Crossentropy. The EarlyStopping callback is used to prevent overfitting of the model. Now that the model is compiled and fit, it is time to evaluate the model on new data.
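A sketch of the full model is shown below. The 3x3 filters, the binary crossentropy loss, the EarlyStopping callback, and the single sigmoid output come from the article; the filter counts (64), dense width (128), dropout rate, epoch count, and patience are assumptions you would tune:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPool2D, Flatten
from tensorflow.keras.callbacks import EarlyStopping

IMG_SIZE = 80  # input size used during preprocessing

def build_model():
    model = Sequential([
        Conv2D(64, (3, 3), activation="relu",
               input_shape=(IMG_SIZE, IMG_SIZE, 1)),
        MaxPool2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPool2D((2, 2)),
        Flatten(),                       # collapse feature maps to 1-D
        Dense(128, activation="relu"),
        Dropout(0.5),                    # randomly drop units to curb overfitting
        Dense(1, activation="sigmoid"),  # 0 = 'Without Mask', 1 = 'With Mask'
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model

model = build_model()
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
# With X and y loaded as above:
# model.fit(X, y, epochs=15, validation_split=0.1, callbacks=[early_stop])
```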

992/992 [==============================] - 1s 1ms/sample - loss: 0.0278 - accuracy: 0.9869

As we can see, the accuracy of our model is 98.7% and it takes 1 millisecond to analyze each image. Now let’s evaluate the model based on validation data.

800/800 [==============================] - 1s 1ms/sample - loss: 0.0282 - accuracy: 0.9875

As we can see, the accuracy is a resounding 98.8%.

Use‘model’) to save the CNN model.

Now that we have built, validated, and saved our CNN model, it is time to write a script to feed the model live footage from a mobile phone.

Feeding Footage

The script can be run in a separate .py file. Be sure to save the file in the same directory as the saved model.

Like before, we will begin by importing all the libraries.

The load_model() function will load the saved model. Since we trained our neural network on 80x80 images, we will resize the images extracted from the live footage to the same dimensions.

We will use a Frontal Facing Haar Cascade file to detect the face. A Haar Cascade file provides the program with features that will help identify the face. We will use this to identify the face, take the image of strictly the face, and send that to the CNN to detect whether there is a face mask on the face or not. The Haar Cascade file for Frontal Face detection can be downloaded from:

The program will require the IP address of the mobile phone to be entered; we will look at how to get the IP address, and which application to use on the phone, later. The argument for the CascadeClassifier() method is the path to the Haar Cascade file on your computer.

The following script essentially detects a face, saves an image of that face, converts it to grayscale, normalizes it, makes it into a 4D array, and sends it to the Convolutional Neural Network. A blue box is generated around the face if there is no mask, and a green box is generated if there is a mask.

The argument for the waitKey() method is the number of milliseconds before the screen is updated. Lowering this number makes the program smoother but takes more computational power.

After running the program, press the Escape key to exit.

To output footage from your phone, download the IP Webcam application. Once downloaded, open the app and scroll down to the Service Control section. Just below that is the Start server option. Tapping it will start your phone’s camera. The footage seen on your phone’s screen is the footage that will be sent to the CNN. On the screen, you should see an IPv4 address. Copy that address.
