A Guide for building your own Face Detection & Face Recognition system

Intellica.AI
10 min read · Aug 27, 2019


Computer vision is one of the most interesting domains in artificial intelligence. It was largely motivated by the desire to automate tasks that rely on human vision. Advances in deep learning, together with cheaper compute and storage, gave computer vision techniques a big leap forward, and the last decade saw a surge of new methods. These techniques provide state-of-the-art solutions for tasks such as face detection, face recognition, object detection, image classification, and image-based recommendations.

We humans recognize a person almost instantly with a glance at their face. To perform the same task, computer vision splits person identification into face detection followed by face recognition. Common applications include identification and authentication, enabling automated attendance, access control systems, suspicious-person identification, and missing-person identification. Social media has not been left untouched by these advances: post a photo on Facebook and it suggests tags for your friends, while Google Photos groups and labels the pictures in your gallery and even compiles short videos from a series of vacation shots. Face detection is also widely used in mobile applications, such as Snapchat, for virtual augmentation of human faces with cat faces, dog faces, ornaments, and more.

Let’s look at the topics we will cover in this blog post.

1. Face Detection Techniques

  • Haar Cascades Classifier: The first machine learning-based cascading classifier, built for fast execution on low-power CPUs such as those in cameras and phones.
  • Histogram of Oriented Gradients (HOG): HOG features are used as the image descriptor, followed by a linear Support Vector Machine classifier.
  • Multi-task Cascaded Convolutional Networks (MTCNN): A deep learning-based face detection technique.

2. Face Alignment: Also called face normalization, this step helps improve face recognition accuracy.

3. Face Recognition: Identify a person by matching a face in an image against a repository of known people’s photos.

Face Detection Techniques

Haar Cascades Classifier

This revolutionary method for face detection was presented by Paul Viola and Michael Jones in 2001. It was the first machine learning-based cascading classifier fast enough to run on low-power CPUs, such as those in cameras and phones. The classifier is trained with many ‘positive’ images (containing a face) and ‘negative’ images (containing no face) of the same size. The figure below shows the Haar features used to extract information from a window, both during training and at prediction time. They are analogous to CNN kernels. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.

Rectangle features: A and B are two-rectangle features, C is a three-rectangle feature, and D is a four-rectangle feature. [ Source: From paper, Rapid Object Detection using a Boosted Cascade of Simple Features, Viola and Jones, 2001 ]
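To see why these features are cheap to evaluate, note that any rectangle sum can be computed in constant time from an integral image, the key trick in the Viola-Jones detector. Here is a toy sketch of the idea (our own illustration, not code from the paper):

import numpy as np

# Toy 24x24 grayscale window (random stand-in data).
img = np.random.randint(0, 256, (24, 24)).astype(np.int64)

# Integral image, padded with a zero row and column so the corner
# arithmetic below needs no boundary checks.
ii = np.zeros((25, 25), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y, x, h, w):
    # Sum of pixels in the h-by-w rectangle with top-left corner (y, x),
    # obtained from just four lookups in the integral image.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# A two-rectangle feature (like A and B in the figure above): the sum
# under the white region subtracted from the sum under the black region.
white = rect_sum(0, 0, 4, 8)
black = rect_sum(4, 0, 4, 8)
feature = black - white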

The detector chains several weak classifiers into one strong cascade. Even with a small window size, feature extraction yields a massive feature list, and most of those features are irrelevant; AdaBoost is used to select the best ones and update their weights during training. At each stage of the cascade, windows rejected by the current classifier (reject sub-window) are discarded immediately. Stages are added until the required accuracy or error rate is achieved.

Graphical representation of the detection cascade. [ Source: From paper, Rapid Object Detection using a Boosted Cascade of Simple Features, Viola and Jones, 2001 ]
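The speed of the cascade comes from early rejection: most windows are discarded by the first few stages, and only promising ones reach the expensive later stages. A schematic sketch of this logic (an illustration of the idea only, not OpenCV’s actual internals):

def cascade_predict(window, stages):
    # Each stage is a boosted classifier returning True (maybe a face)
    # or False (definitely not). A window must pass every stage.
    for stage in stages:
        if not stage(window):
            return False   # rejected sub-window exits immediately
    return True            # survived all stages: report a face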

Implementation in code using OpenCV

OpenCV is an open-source image and video processing library. Install the opencv-python package in your virtual environment using the pip command (pip install opencv-python).

To try it out, copy and paste the code below into a Jupyter notebook and change the image file name ‘sample.jpg’ to your input file name.

import cv2

'''
Load the Haar cascade classifier. Give an absolute path if the xml file
is not found (opencv-python ships the file under cv2.data.haarcascades).
'''
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# Read the image and store it in an array.
# Give an absolute path if the image file is not found.
img = cv2.imread('sample.jpg')

# Convert image to gray scale for prediction.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

'''
detectMultiScale takes three inputs:
scaleFactor : How much the image size is reduced at each image scale.
minNeighbors : How many ‘neighbors’ each candidate rectangle should have.
minSize : The minimum object size. Default is (30, 30).

The function returns a list of bounding boxes, one per face found in the image.

For this example, scaleFactor = 1.3 and minNeighbors = 5.
Tweak the parameters and see how the results change.
'''

faces = face_cascade.detectMultiScale(gray, 1.3, 5)

# Draw a bounding box on the input image for each detected face.
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Remarks: The user has to tune three parameters, scaleFactor, minNeighbors, and minSize, for each image to reduce false negatives, so it is not a fully automatic system.
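For instance, here is the same call with all three parameters named explicitly (the values are just starting points to tweak per image):

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5, minSize=(30, 30))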

Histogram of Oriented Gradients (HOG)

This method was introduced by Dalal and Triggs in their 2005 paper, Histograms of Oriented Gradients for Human Detection. They used HOG features as the image descriptor and trained a Support Vector Machine (SVM) classifier to create a highly accurate human detector. The same recipe works for any object: compute HOG descriptors, then train an object classifier on top of them. Here we will use it for face detection. The implementation pipeline is shown in the figure below.

HOG implementation pipeline. [ Source: From paper, Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, 2005 ]
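Purely for intuition, here is what the descriptor-plus-classifier recipe looks like with scikit-image and scikit-learn (our own sketch using random stand-in patches; these libraries and data are assumptions, and the post itself relies on dlib’s pre-trained detector below):

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Dalal-Triggs style descriptor: 9 orientation bins, 8x8-pixel cells,
# 2x2-cell blocks with L2-Hys normalization.
def hog_descriptor(gray_patch):
    return hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

# Train a linear SVM on descriptors of face / non-face patches.
X_pos = np.random.rand(10, 64, 64)   # stand-in 64x64 'face' patches
X_neg = np.random.rand(10, 64, 64)   # stand-in 'non-face' patches
X = np.array([hog_descriptor(p) for p in np.concatenate([X_pos, X_neg])])
y = np.array([1] * len(X_pos) + [0] * len(X_neg))
clf = LinearSVC().fit(X, y)          # clf.predict now classifies new patches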

Implementation in code

To implement this method we will use the open-source library face_recognition. It is a Python wrapper around the dlib library, which is written in C++, and comes packed with pre-trained models for face detection and face recognition. You can install it in your virtual environment using the pip command (pip install face_recognition).

To try it out, copy and paste the code below into a Jupyter notebook and change the image file name ‘sample_2.jpg’ to your input file name.

import face_recognition
import cv2

# load an image as an array
image = face_recognition.load_image_file("sample_2.jpg")

# detect faces from input image.
face_locations = face_recognition.face_locations(image, model="hog")

# draw a bounding box for each detected face. Boxes are returned
# in (top, right, bottom, left) order.
for (top, right, bottom, left) in face_locations:
    cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)

# load_image_file returns RGB; convert to BGR for OpenCV display.
cv2.imshow('image', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)
cv2.destroyAllWindows()

Remarks: Choose this detector based on your application. It is a good choice for face recognition pipelines because it tends to skip half-visible and occluded faces, for which landmark prediction is unreliable and correct alignment is therefore not possible.

The face_recognition library also provides a deep learning-based pre-trained face detector referred to as the CNN model. To use it, replace the ‘hog’ keyword in the detection call with ‘cnn’. This detector is fast and accurate if you run prediction on a GPU; you can try it yourself if you have access to one.
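The call is otherwise identical:

# Same call, switching the detector to the CNN model.
face_locations = face_recognition.face_locations(image, model="cnn")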

MTCNN

This method was proposed by Kaipeng Zhang et al. in their paper, Joint Face Detection and Alignment Using Multi-task Cascaded Convolutional Networks. Their deep learning approach cascades three convolutional neural networks to predict both the face and five landmark locations, which help with face alignment. In the first stage, the image is passed to the Proposal Network (P-Net), which predicts candidate bounding boxes through regression; non-maximum suppression (NMS) then merges highly overlapping candidates. In the second stage, the surviving candidates are fed to another CNN, the Refine Network (R-Net), which rejects a large number of false candidates, predicts more accurate bounding boxes, and again applies NMS to remove overlapping boxes. The third stage, the Output Network (O-Net), is similar to the second, but this network also predicts the positions of five facial landmarks.

MTCNN pipeline. [ Source: Joint Face Detection and Alignment Using Multi-task Cascaded Convolutional Networks, Zhang et al., IEEE, 2016 ]
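The NMS step mentioned above keeps the highest-scoring box and drops any candidate that overlaps it beyond a threshold, repeating until no candidates remain. A minimal sketch of the idea (our own illustration, not the paper’s exact implementation):

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection-over-union of the top box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavy overlaps
    return keep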

Deep learning models generally take a fixed-size input, but MTCNN accepts a color image of any size (internally it builds an image pyramid over scaled copies of the input).

Implementation in code

For this, we will use another open-source library, MTCNN. You can install the mtcnn package in your virtual environment using the pip command (pip install mtcnn). To try it out, copy and paste the code below into a Jupyter notebook and change the image file name ‘sample_2.jpg’ to your input file name.

from mtcnn.mtcnn import MTCNN
import face_recognition
import cv2

# initialise the detector class.
detector = MTCNN()

# load an image as an array
image = face_recognition.load_image_file("sample_2.jpg")

# detect faces from input image.
face_locations = detector.detect_faces(image)

# draw the bounding box and five facial landmarks of each detected face.
# detect_faces returns a list of dicts with 'box', 'confidence', and 'keypoints'.
for face in face_locations:
    (x, y, w, h) = face['box']
    landmarks = face['keypoints']
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
    for key, point in landmarks.items():
        cv2.circle(image, point, 2, (255, 0, 0), 6)

# load_image_file returns RGB; convert to BGR for OpenCV display.
cv2.imshow('image', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)
cv2.destroyAllWindows()

Remarks: This model also detects half-visible, occluded, and side-on faces.

Face alignment

Face alignment is used to improve the accuracy of face recognition. It is a normalization technique that outputs the face centered in the image, rotated so that the line joining the centers of the two eyes is horizontal, and resized to a fixed scale.

Implementation in code

Before trying this sample code, make sure numpy is installed in your virtual environment; if it is not, install it using the pip command (pip install numpy). To try it out, copy and paste the code below into a Jupyter notebook. It will output an aligned and centered face for a given sample input face image.

import face_recognition
import cv2
import numpy as np

# load image and find face locations.
image = face_recognition.load_image_file("sample.jpg")
face_locations = face_recognition.face_locations(image, model="hog")

# detect the 68 facial landmarks from the image, including the eyes,
# lips, eyebrows, nose, and chin
face_landmarks = face_recognition.face_landmarks(image)

'''
Let's find the angle of the face. First calculate the centers
of the left and right eye from the eye landmarks.
'''
leftEyePts = face_landmarks[0]['left_eye']
rightEyePts = face_landmarks[0]['right_eye']

leftEyeCenter = np.array(leftEyePts).mean(axis=0).astype("int")
rightEyeCenter = np.array(rightEyePts).mean(axis=0).astype("int")

leftEyeCenter = (leftEyeCenter[0],leftEyeCenter[1])
rightEyeCenter = (rightEyeCenter[0],rightEyeCenter[1])

# draw circles at the eye centers and a line connecting them
cv2.circle(image, leftEyeCenter, 2, (255, 0, 0), 10)
cv2.circle(image, rightEyeCenter, 2, (255, 0, 0), 10)
cv2.line(image, leftEyeCenter, rightEyeCenter, (255,0,0), 10)

# find the angle of the line from its slope.
dY = rightEyeCenter[1] - leftEyeCenter[1]
dX = rightEyeCenter[0] - leftEyeCenter[0]
angle = np.degrees(np.arctan2(dY, dX))

# to center the face in the output image, set the desired
# left eye location as a fraction of the output size; the
# right eye location is derived from it.
desiredLeftEye=(0.35, 0.35)
# set the cropped face size after rotation.
desiredFaceWidth = 128
desiredFaceHeight = 128

desiredRightEyeX = 1.0 - desiredLeftEye[0]

# determine the scale of the resulting image by taking the
# ratio of the distance between the eyes in the *desired*
# image to the distance between the eyes in the *current* image
dist = np.sqrt((dX ** 2) + (dY ** 2))
desiredDist = (desiredRightEyeX - desiredLeftEye[0])
desiredDist *= desiredFaceWidth
scale = desiredDist / dist

# compute the (x, y)-coordinates of the midpoint between
# the two eyes in the input image
eyesCenter = ((leftEyeCenter[0] + rightEyeCenter[0]) // 2,
              (leftEyeCenter[1] + rightEyeCenter[1]) // 2)

# grab the rotation matrix for rotating and scaling the face
M = cv2.getRotationMatrix2D(eyesCenter, angle, scale)

# update the translation component of the matrix
tX = desiredFaceWidth * 0.5
tY = desiredFaceHeight * desiredLeftEye[1]
M[0, 2] += (tX - eyesCenter[0])
M[1, 2] += (tY - eyesCenter[1])

# apply the affine transformation
(w, h) = (desiredFaceWidth, desiredFaceHeight)
output = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC)

# load_image_file returns RGB; convert to BGR for OpenCV display.
cv2.imshow('image', cv2.cvtColor(output, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)
cv2.destroyAllWindows()
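The next section reuses these steps, so it helps to wrap them into a function. A minimal sketch (our own packaging of the code above, reusing its imports, with the output size passed in as arguments):

def alignFace(image, face_locations, face_landmarks,
              desiredFaceWidth, desiredFaceHeight, desiredLeftEye=(0.35, 0.35)):
    # face_locations is accepted to match the call in the next
    # section, but the alignment itself only needs the landmarks.

    # Eye centers from the 68-point landmarks.
    leftEyeCenter = np.array(face_landmarks[0]['left_eye']).mean(axis=0).astype("int")
    rightEyeCenter = np.array(face_landmarks[0]['right_eye']).mean(axis=0).astype("int")

    # Rotation angle from the slope of the line joining the eyes.
    dY = rightEyeCenter[1] - leftEyeCenter[1]
    dX = rightEyeCenter[0] - leftEyeCenter[0]
    angle = np.degrees(np.arctan2(dY, dX))

    # Scale so the eye distance matches the desired output geometry.
    dist = np.sqrt((dX ** 2) + (dY ** 2))
    desiredDist = (1.0 - 2 * desiredLeftEye[0]) * desiredFaceWidth
    scale = desiredDist / dist

    # Rotate and scale about the midpoint between the eyes, then
    # translate that midpoint to its desired output position.
    eyesCenter = (float(leftEyeCenter[0] + rightEyeCenter[0]) / 2.0,
                  float(leftEyeCenter[1] + rightEyeCenter[1]) / 2.0)
    M = cv2.getRotationMatrix2D(eyesCenter, angle, scale)
    M[0, 2] += desiredFaceWidth * 0.5 - eyesCenter[0]
    M[1, 2] += desiredFaceHeight * desiredLeftEye[1] - eyesCenter[1]
    return cv2.warpAffine(image, M, (desiredFaceWidth, desiredFaceHeight),
                          flags=cv2.INTER_CUBIC)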

Face recognition

In this step, we use the cropped and aligned face image instead of the whole image. We use the face_recognition library, which provides a pre-trained model based on the ResNet architecture. The model takes a face image and its landmarks as input and outputs a 128-dimensional feature vector (encoding). To find the same person in a new image, compute the Euclidean distance between each face in the new image and the known person’s encoding. The face with the minimum distance is the matching face.
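With a whole repository of known people, matching reduces to a nearest-neighbor search over the encodings. A minimal sketch with stand-in data (known_encodings and query are hypothetical placeholders for real encodings):

import numpy as np

# Stand-in repository: one 128-d encoding per known person.
known_encodings = np.random.rand(5, 128)
query = np.random.rand(128)              # encoding of the face to identify

# Euclidean distance to every known face; the smallest wins.
distances = np.linalg.norm(known_encodings - query, axis=1)
best_match = int(np.argmin(distances))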

To try it out, copy and paste the code below into a Jupyter notebook. It will print the distance between a given sample input face and a reference face.

import face_recognition
import cv2
import numpy as np

knownFace = "known.jpg"
image = face_recognition.load_image_file(knownFace)
face_locations = face_recognition.face_locations(image, model="hog")
face_landmarks = face_recognition.face_landmarks(image)

# the aligned face is resized on output, so supply the desired
# width and height, taken here from the detected bounding box.
(top,right,bottom,left) = face_locations[0]
desiredWidth = (right-left)
desiredHeight = (bottom-top)

# alignFace is the helper that wraps the alignment code from the
# previous section (see the sketch at the end of that section).
align_f = alignFace(image, face_locations, face_landmarks, desiredWidth, desiredHeight)

# calculate the face encoding of the aligned face: an array of length 128.
known_face_encoding = face_recognition.face_encodings(align_f, num_jitters=10)[0]

unknownFace = "unknown.jpg"
image = face_recognition.load_image_file(unknownFace)
face_locations = face_recognition.face_locations(image, model="hog")
face_landmarks = face_recognition.face_landmarks(image)

# the aligned face is resized on output, so supply the desired
# width and height, taken here from the detected bounding box.
(top,right,bottom,left) = face_locations[0]
desiredWidth = (right-left)
desiredHeight = (bottom-top)

# alignFace is the helper that wraps the alignment code from the
# previous section (see the sketch at the end of that section).
align_f = alignFace(image, face_locations, face_landmarks, desiredWidth, desiredHeight)

# calculate the face encoding of the aligned face: an array of length 128.
unknown_face_encoding = face_recognition.face_encodings(align_f, num_jitters=10)[0]

# calculate the distance between the known and unknown face.
# the distance ranges from 0 to 1: if the two faces match it is
# near zero, otherwise it is closer to one.
distance = face_recognition.face_distance([known_face_encoding], unknown_face_encoding)[0]
print("Distance : {}".format(distance))

One can try matching faces with different expressions, plain glasses, or goggles, and see how the distance varies.

Summary

In this blog, Team Intellica has walked through several implementation techniques for face detection and face recognition. Combined, these methods can detect and recognize faces with high accuracy in real time.

If you’re looking for similar tech competence or want to integrate a face detection and recognition solution with your existing system, feel free to reach out at info@intellica.ai
