Harnessing Artificial Intelligence for Handwriting & Object Recognition in Digital Business

Jul 31, 2019

Artificial Intelligence (AI) technology is disrupting every aspect of the modern digital lifestyle. From AI-powered chatbots to intelligent home and office automation solutions, this technology is reducing human workload by making devices smarter. Automation has become synonymous with efficiency and scalability in business enterprises and with convenience in everyday life.

One of the most promising sub-domains of AI is object detection. Facial recognition is already used for security purposes around the globe, and new use cases that speed up manual activities are cropping up every day. Object detection is an advanced deep learning application that can also be used for text and handwriting recognition.

Real-life AI Use Case for Handwriting Recognition

Handwriting recognition via deep learning has plenty of uses in modern life. Artificial intelligence can be used to train a machine to recognize letters, numerals and other handwritten text within emails, bank cheques, official documents or images.

At Intellica.AI, we understood the growing need for an AI-powered handwritten text recognition tool and developed a Gujarati Numeral Recognition system. The system can be used for identifying the final bill amount, processing bank cheques and financial transactions, evaluating numeric entries in forms and sorting letters inside post offices.

Why Gujarati?

Gujarati is one of the most popular regional languages in India, with 46 million speakers around the globe, and it is the preferred language of many industrialists and businessmen. The Gujarati script evolved from the Devanagari script, and the language itself descends from ancient Prakrit.

Developing a Multi-faceted Solution That Promises Accuracy & Versatility

A multi-layered feed-forward system known as a convolutional neural network (CNN) was used to create a solution that can recognize 2D shapes with higher accuracy than standard artificial neural networks.

With the Gujarati handwritten numeral recognition model, we devised a system that recognizes the Gujarati numerals from 0 to 9.

The model was trained to recognize handwritten Gujarati numerals by applying a CNN to images that contain them. It can be extended and retrained to recognize handwritten letters as well.

Handwritten Numeral Recognition Model Development Process

Before development began, our team collected 5,700 images of handwritten Gujarati numerals. Each image was 128x128 pixels, which gave us a consistent repository for training the AI model. There were 10 classes to predict (the digits 0 to 9). Once the data scope was defined, our developers followed the steps below:

Data Preparation

We generated a custom dataset for training. It contains 570 images for each class from 0 to 9. We used OpenCV for the image operations and Keras as the deep learning library to implement the neural network architecture.

Preprocessing

Several preprocessing operations were performed to standardize the collected images: resizing, grayscale conversion and normalization.

import cv2

# read an image into the required format (read as grayscale, resize, normalize)
def preprocess(image_name):
    # read the image as a single-channel grayscale array
    image = cv2.imread(image_name, cv2.IMREAD_GRAYSCALE)

    # resize to the 128x128 input size used by the model
    image = cv2.resize(image, (128, 128))

    # normalize 8-bit pixel values from 0-255 to 0-1
    # (call .flatten() on the result if a 1D array is needed)
    image = image / 255.0

    return image
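The normalization step can be illustrated without OpenCV. The following is a minimal sketch, assuming 8-bit grayscale pixels in the range 0 to 255; the helper `normalize_pixels` and the sample `patch` are hypothetical and only serve to show why the divisor should match the maximum pixel value.

```python
def normalize_pixels(image, max_value=255):
    """Scale every pixel of a 2D grayscale image (a list of rows) into [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Hypothetical 2x3 patch of 8-bit grayscale values.
patch = [[0, 128, 255],
         [64, 192, 32]]

scaled = normalize_pixels(patch)
largest = max(max(row) for row in scaled)
smallest = min(min(row) for row in scaled)
print(smallest, largest)  # 0.0 1.0

# Dividing by 128 instead would leave bright pixels above 1.
print(255 / 128)  # 1.9921875
```

Keeping inputs in a fixed 0 to 1 range is a common convention for neural network training, although other scalings are also used.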

Data Splitting for Training, Testing & Validation

The dataset of 5,700 images was split into three parts so that different models could be evaluated and compared: 3,648 images for training, 912 for validation and 1,140 for testing.

from sklearn.model_selection import train_test_split

# hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

# hold out 20% of the remaining data as the validation set used during fitting
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, stratify=y_train, random_state=1)
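The split sizes quoted in this section follow from applying a 20% hold-out twice. The sketch below reproduces that arithmetic with integers only; `two_stage_split` is a hypothetical helper and does not replicate the exact rounding rules of scikit-learn.

```python
def two_stage_split(total, holdout_percent=20):
    """Return (train, validation, test) sizes for two successive hold-out splits."""
    test = total * holdout_percent // 100             # first split: test set
    remaining = total - test
    validation = remaining * holdout_percent // 100   # second split: validation set
    train = remaining - validation
    return train, validation, test


train_size, val_size, test_size = two_stage_split(5700)
print(train_size, val_size, test_size)  # 3648 912 1140
```

In other words, the data ends up divided 64% / 16% / 20%. Because 912 validation images do not divide evenly across 10 classes, a stratified split can only balance the validation set approximately (roughly 91 or 92 images per digit).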

Creating a CNN Network

Convolutional neural networks are better than standard artificial neural networks at extracting and using features from image data. A CNN improves the model's ability to identify 2D shapes accurately with a degree of tolerance to translation, scaling and other distortions. We used a CNN built from convolution layers, pooling layers, dropout layers and dense layers.
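To make the convolution and pooling steps concrete, here is a minimal pure-Python sketch, not the Keras implementation. It assumes square images and kernels, stride 1 and no padding; the helpers `conv2d_valid`, `max_pool` and `stroke_image` and the 3x3 kernel are hypothetical illustrations.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_size)]
            for r in range(out_size)]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    n = len(feature_map) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(n)]
            for r in range(n)]


def stroke_image(column, size=6):
    """Build a size x size image containing one vertical stroke."""
    return [[1 if c == column else 0 for c in range(size)] for _ in range(size)]


# A kernel that responds strongly to vertical strokes.
vertical_kernel = [[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]

pooled_a = max_pool(conv2d_valid(stroke_image(1), vertical_kernel))
pooled_b = max_pool(conv2d_valid(stroke_image(2), vertical_kernel))
print(pooled_a)  # [[3, 0], [3, 0]]
print(pooled_b)  # [[3, 0], [3, 0]]
```

In this toy case, shifting the stroke by one pixel leaves the pooled output unchanged because both positions fall inside the same pooling window. A larger shift does change the output, so pooling provides tolerance to small translations rather than full invariance.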

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

num_classes = 10  # Gujarati digits 0 to 9

# create model
def HandwrittenDigitRecognitionModel():
    # start the neural network
    model = Sequential()

    # first convolutional block
    model.add(Conv2D(64, (5, 5), input_shape=(128, 128, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # second convolutional block, with dropout for regularization
    model.add(Conv2D(16, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    # flatten the feature maps into a vector
    model.add(Flatten())

    # first fully connected layer
    model.add(Dense(128, activation='relu'))

    # dropout followed by the second fully connected layer
    model.add(Dropout(0.5))
    model.add(Dense(50, activation='relu'))

    # output layer: one softmax probability per class
    model.add(Dense(num_classes, activation='softmax'))

    return model
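The layer sizes in the model above can be traced by hand. The sketch below works through the output shapes and trainable parameter counts, assuming the Keras defaults for these layers (stride 1 and no padding for the convolutions, non-overlapping pooling windows) and 10 output classes; the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output side length of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output side length of non-overlapping pooling."""
    return size // pool


side = 128
side = pool_out(conv_out(side, 5), 2)  # 128 -> 124 -> 62
side = pool_out(conv_out(side, 3), 2)  # 62 -> 60 -> 30
flat = side * side * 16                # 30 * 30 * 16 feature values after Flatten

# weights + biases for each trainable layer
params = {
    "conv1": 5 * 5 * 1 * 64 + 64,
    "conv2": 3 * 3 * 64 * 16 + 16,
    "dense1": flat * 128 + 128,
    "dense2": 128 * 50 + 50,
    "output": 50 * 10 + 10,
}
total = sum(params.values())

print(flat)   # 14400
print(total)  # 1861184
```

Under these assumptions, about 99% of the parameters sit in the first dense layer, which is one reason dropout is commonly placed around the dense part of such a network.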

Compiling the Model

The model was compiled with categorical cross-entropy as the loss function, since this is a multiclass classification problem. Compilation also involved the following:

1. Adadelta optimizer: A robust extension of Adagrad, Adadelta adapts learning rates based on a moving window of gradient updates instead of accumulating all past gradients.

2. Accuracy metric: Accuracy is used to monitor the training and testing steps. It reports the fraction of images that are classified correctly.
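The Adadelta update named in item 1 can be sketched for a single parameter as follows. This is a simplified illustration of the rule, not the Keras implementation: library versions typically also expose a learning-rate multiplier, which is omitted here, and the `rho` and `eps` values as well as the toy objective are assumptions chosen for the example.

```python
import math


def adadelta_minimize(grad, x, steps, rho=0.95, eps=1e-6):
    """Run a number of Adadelta updates on a single parameter x."""
    avg_sq_grad = 0.0   # decaying average of squared gradients
    avg_sq_delta = 0.0  # decaying average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # step size is the ratio of the two running RMS values
        delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
        x += delta
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
start = 0.0
end = adadelta_minimize(lambda x: 2 * (x - 3), start, steps=200)
print(abs(end - 3) < abs(start - 3))  # True
```

Because both running averages decay over time, old gradients gradually stop influencing the step size, which is the "moving window" behavior that distinguishes Adadelta from Adagrad.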

fit_generator is used to fit the model to the data. Its other arguments include steps_per_epoch, which sets the number of batches drawn from the generator in each training epoch.
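A common way to choose the steps-per-epoch value is to divide the number of training samples by the batch size, so that one epoch covers the training set once. The sketch below assumes a batch size of 32, which the article does not state, and uses a hypothetical `batch_generator` in place of a real image generator.

```python
import math


def batch_generator(samples, batch_size):
    """Yield batches forever, wrapping around at the end of the data."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


train_size = 3648  # training images after the two splits
batch_size = 32    # assumed value, not taken from the article
steps_per_epoch = math.ceil(train_size / batch_size)
print(steps_per_epoch)  # 114

# Drawing that many batches touches every training sample exactly once.
gen = batch_generator(list(range(train_size)), batch_size)
seen = sum(len(next(gen)) for _ in range(steps_per_epoch))
print(seen)  # 3648
```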

Model Evaluation

To evaluate the developed model, we used a cross-validation technique to compute the accuracy and the validation loss. The results of the evaluation were as follows:

Accuracy: 92.89% (higher is better)
Loss: 0.24 (lower is better)
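Both reported numbers are computed from the class probabilities that the softmax layer outputs. The following sketch shows the calculation on a hypothetical three-class example with made-up predictions. Labels are given as class indexes here, which is equivalent to picking the true-class entry of a one-hot label.

```python
import math


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(max(probs[label], eps))
              for label, probs in zip(y_true, y_prob)]
    return sum(losses) / len(losses)


def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    correct = sum(1 for label, probs in zip(y_true, y_prob)
                  if probs.index(max(probs)) == label)
    return correct / len(y_true)


# Hypothetical labels and predicted probabilities for four samples.
labels = [0, 2, 1, 1]
predictions = [[0.8, 0.1, 0.1],
               [0.1, 0.2, 0.7],
               [0.6, 0.3, 0.1],   # wrong: class 0 is predicted, class 1 is true
               [0.2, 0.5, 0.3]]

acc = accuracy(labels, predictions)
loss = categorical_crossentropy(labels, predictions)
print(acc)             # 0.75
print(round(loss, 4))  # 0.6192
```

Accuracy only counts whether the top prediction is right, while the loss also penalizes correct predictions that are made with low confidence, so the two numbers can move somewhat independently.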

Conclusion

Creating a handwriting recognition model with a convolutional neural network that can work offline to recognize handwritten digits was a challenging but innovative project. We succeeded in training the model to 92.89% accuracy. The CNN works on the 2D structure of each 128x128 image rather than on the pixels flattened into a single vector of 16,384 values, which helped us achieve the desired results.