<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Projects | Dat Nguyen</title><link>https://dat-nguyen.netlify.app/project/</link><atom:link href="https://dat-nguyen.netlify.app/project/index.xml" rel="self" type="application/rss+xml"/><description>Projects</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Fri, 11 Dec 2020 00:00:00 +0000</lastBuildDate><image><url>https://dat-nguyen.netlify.app/images/icon_hu02d4214244b58f4d9dfd2bf839db7a24_243966_512x512_fill_lanczos_center_2.png</url><title>Projects</title><link>https://dat-nguyen.netlify.app/project/</link></image><item><title>Facial Expression Recognition with Deep Learning</title><link>https://dat-nguyen.netlify.app/project/facial_expression/</link><pubDate>Fri, 11 Dec 2020 00:00:00 +0000</pubDate><guid>https://dat-nguyen.netlify.app/project/facial_expression/</guid><description>&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;p>Human emotion detection plays a critical part in interpersonal relationships. Emotions can generally be read from speech, hand and body gestures, and facial expressions. Being able to understand human emotions could improve the quality of communication between humans and machines. Furthermore, a wide range of industries could benefit from emotion recognition, including healthcare, automotive, gaming, and many more.&lt;/p>
&lt;p>While there are about 27 distinct human emotions, in this project we work with labeled data sets covering seven of them: happiness, sadness, fear, anger, surprise, disgust, and neutral. We aim to implement a two-step model: first, localize human faces in an image, and second, recognize the emotion expressed in each face.&lt;/p>
&lt;h2 id="part-1">Part 1&lt;/h2>
&lt;p>Our first data set (&amp;lsquo;Dataset 1&amp;rsquo;), provided and maintained by Dataturks, has about 400 images containing a bit more than 1,000 faces. The data is available as a JSON file with two main components per image: a URL for the image, and the face labels with their bounding boxes. We extracted this information from the JSON file, wrote an annotation function, and displayed the images with the correct bounding boxes around the faces.&lt;/p>
&lt;figure id="figure-images-from-dataset-1">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/face1.png" data-caption="Images from Dataset 1">
&lt;img src="https://dat-nguyen.netlify.app/media/face1.png" alt="" >
&lt;/a>
&lt;figcaption>
Images from Dataset 1
&lt;/figcaption>
&lt;/figure>
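&lt;p>The extraction step can be sketched as follows. This is a minimal illustration assuming a simplified record schema (a &lt;code>content&lt;/code> field for the image URL and &lt;code>annotation&lt;/code> entries holding two corner points with coordinates normalized to [0, 1]); the actual Dataturks export may differ in detail.&lt;/p>

```python
import json

def parse_faces(json_line, img_w, img_h):
    """Parse one Dataturks-style record into (url, list of bounding boxes).

    Boxes come back as (x1, y1, x2, y2) in pixels; the record is assumed
    to store two corner points with coordinates normalized to [0, 1].
    """
    record = json.loads(json_line)
    url = record["content"]
    boxes = []
    for ann in record.get("annotation", []):
        p1, p2 = ann["points"]  # assumed: top-left and bottom-right corners
        boxes.append((p1["x"] * img_w, p1["y"] * img_h,
                      p2["x"] * img_w, p2["y"] * img_h))
    return url, boxes

# A made-up record with one face covering the central quarter of a 100x100 image
line = ('{"content": "http://example.com/img.jpg", '
        '"annotation": [{"label": ["Face"], '
        '"points": [{"x": 0.25, "y": 0.25}, {"x": 0.75, "y": 0.75}]}]}')
url, boxes = parse_faces(line, 100, 100)
# boxes == [(25.0, 25.0, 75.0, 75.0)]
```

The annotation function then only needs to draw each returned box onto the downloaded image.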
&lt;p>We selected a Faster R-CNN based model, X101-FPN, available from the Model Zoo of Facebook’s Detectron2 package, for the face detection task. Because this pre-trained model was developed for object detection tasks with more than 30 different labels, we kept its CNN layers for feature extraction and retrained the last few layers of the model on Dataset 1. The result is as follows:&lt;/p>
&lt;figure id="figure-face-detection-with-pre-trained-model">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/face2.png" data-caption="Face Detection with Pre-trained Model">
&lt;img src="https://dat-nguyen.netlify.app/media/face2.png" alt="" >
&lt;/a>
&lt;figcaption>
Face Detection with Pre-trained Model
&lt;/figcaption>
&lt;/figure>
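&lt;p>The fine-tuning setup can be sketched roughly as a Detectron2 configuration. This is a config sketch rather than our exact script: the dataset name &lt;code>faces_train&lt;/code> and the solver values are illustrative, while the config path is the X101-FPN entry from the Detectron2 Model Zoo.&lt;/p>

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Start from the X101-FPN Faster R-CNN config and its COCO-pretrained weights
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("faces_train",)  # hypothetical registered dataset name
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1    # a single "face" class
cfg.SOLVER.BASE_LR = 0.00025           # illustrative solver settings
cfg.SOLVER.MAX_ITER = 1000
```

Passing this config to Detectron2&amp;rsquo;s trainer then retrains only the task-specific heads on top of the frozen pre-trained features.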
&lt;h2 id="part-2">Part 2&lt;/h2>
&lt;p>For the facial expression recognition task, we used the FER2013 dataset (&amp;lsquo;Dataset 2&amp;rsquo;). The data consists of 48x48 pixel grayscale images of human faces. The faces have been processed so that they are centered and occupy about the same amount of space in each image. Each face is labeled as one of the seven emotions.&lt;/p>
&lt;figure id="figure-images-from-dataset-2">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/fer1.png" data-caption="Images from Dataset 2">
&lt;img src="https://dat-nguyen.netlify.app/media/fer1.png" alt="" >
&lt;/a>
&lt;figcaption>
Images from Dataset 2
&lt;/figcaption>
&lt;/figure>
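&lt;p>FER2013 ships as a CSV in which each row holds an integer emotion label and a &lt;code>pixels&lt;/code> column of 2,304 space-separated grayscale values (48 x 48). Decoding a row into an image array can be sketched as follows, shown here with a synthetic pixel string rather than a real row:&lt;/p>

```python
import numpy as np

# FER2013's standard label order, written with this project's emotion names
EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "sadness", "surprise", "neutral"]

def decode_fer_row(emotion, pixels):
    """Turn one FER2013 row into (label name, 48x48 float image in [0, 1])."""
    img = np.array(pixels.split(), dtype=np.uint8).reshape(48, 48)
    return EMOTIONS[emotion], img.astype(np.float32) / 255.0

# Synthetic row: all mid-gray pixels, labeled "happiness" (index 3)
label, img = decode_fer_row(3, " ".join(["128"] * 48 * 48))
# label == "happiness", img.shape == (48, 48)
```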
&lt;p>We started with KNN and MLP as our baseline models, which reached accuracies of 31.5% and 22.8% respectively. A random classifier would achieve 14.3% accuracy (one in seven labels), so the baselines, while better than chance, are still not good enough. This is where a CNN-based model shines. We used two different CNN architectures: a simple CNN with only three convolutional layers, and a ResNet-based CNN. Both models were trained on the FER data set with the Adam optimizer at a learning rate of 0.001, using cross-entropy loss. The architectures of the two models are presented below:&lt;/p>
&lt;figure id="figure-cnn-architecture">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/cnn.png" data-caption="CNN Architecture">
&lt;img src="https://dat-nguyen.netlify.app/media/cnn.png" alt="" >
&lt;/a>
&lt;figcaption>
CNN Architecture
&lt;/figcaption>
&lt;/figure>
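&lt;p>The cross-entropy loss both models minimize can be written out in a few lines of NumPy. A useful sanity check during training: an uninformative (uniform) prediction over the 7 emotion classes yields a loss of ln 7 &amp;asymp; 1.946, so losses well below that indicate the model is learning something.&lt;/p>

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax
    return -log_probs[label]

# Uniform logits over the 7 emotion classes -> loss = ln(7) ~ 1.946
loss = softmax_cross_entropy(np.zeros(7), label=2)
```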
&lt;p>The simple CNN model achieved 54.8% accuracy, already beating the two baseline models. The best accuracy we obtained was 62.8%, from the ResNet-based CNN. The pre-trained ResNet model we use as our initialization was trained on more than a million images from the ImageNet database. It has therefore already learned very good kernels in its convolutional layers and can extract useful features from the facial expression images without further training, which might be why it outperforms our simple CNN model after the first training epoch.&lt;/p>
&lt;h2 id="final-thoughts">Final thoughts&lt;/h2>
&lt;p>While the Faster R-CNN based model and the ResNet-based CNN model achieved very good results, there are instances where the models overdetect bounding boxes or fail to capture the true emotion.&lt;/p>
&lt;figure id="figure-an-example-of-overdetecting-faces-with-multiple-emotions">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/confusion.png" data-caption="An example of overdetecting faces with multiple emotions">
&lt;img src="https://dat-nguyen.netlify.app/media/confusion.png" alt="" >
&lt;/a>
&lt;figcaption>
An example of overdetecting faces with multiple emotions
&lt;/figcaption>
&lt;/figure>
&lt;p>Facial expression recognition is a very challenging task, even for a human being. A trained person can be excellent at hiding their true emotions, which may then be impossible to read from facial expressions alone. Furthermore, a human face can show multiple expressions at the same time. With that said, machine learning is getting closer to matching human-level performance, which is exciting and terrifying at the same time.&lt;/p></description></item><item><title>Jeopardy! Web Application</title><link>https://dat-nguyen.netlify.app/project/jeopardy/</link><pubDate>Tue, 01 Dec 2020 00:00:00 +0000</pubDate><guid>https://dat-nguyen.netlify.app/project/jeopardy/</guid><description>&lt;p>Our application is modeled after the show Jeopardy!, a quiz competition in which contestants are presented with general knowledge clues in the form of answers and must phrase their responses in the form of questions. The questions can come from any category, and to win, contestants must rack up the most money. Having aired for decades, this immensely popular show has gained worldwide recognition and continues to be entertaining for everyone.&lt;/p>
&lt;p>Our application serves two main purposes. Primarily, our application is meant to be a fun trivia site for fans who want an interactive Jeopardy! experience. Additionally, our application can also be used by prospective contestants who want to sharpen their trivia skills and gain insights into the patterns of questions and answers before appearing on the show.&lt;/p>
&lt;p>The motivation behind this idea comes from the fact that we are Jeopardy! fans and there is currently no comprehensive site where fans can play the games as well as view statistics about the show. In recent history, some Jeopardy! winners have successfully used computer models in order to effectively and efficiently train for their appearances on the show, but for contestants without programming backgrounds, this type of training is not feasible. With our application, any prospective contestant is able to leverage analytical insights in order to best prepare for the show.&lt;/p>
&lt;p>The underlying technologies we used to create our application are as follows:&lt;/p>
&lt;ul>
&lt;li>Python - Web Scraping: We scraped two of our datasets relating to contestants from J! Archive using Python. The first dataset contained data about the winners and contestants of each episode. The second contained data about the contestants themselves, unrelated to the show.&lt;/li>
&lt;li>Amazon Web Services (AWS) Relational Database Service (RDS): Our database is maintained and hosted on AWS.&lt;/li>
&lt;li>Oracle SQL Developer: After establishing our database, we connected it to Oracle SQL Developer, which we used to load and query our data. Our application is also directly connected to this.&lt;/li>
&lt;li>Tableau: We integrated Tableau in order to create data visualizations that better communicated certain statistics on our Contestants and Questions pages.&lt;/li>
&lt;li>Node.js: We used Node.js as the JavaScript runtime environment that executes JavaScript code outside a web browser. It allows us to run scripts server-side to produce dynamic web page content before the page is sent to the user’s web browser.&lt;/li>
&lt;li>Vue.js: Vue is a front-end JavaScript framework based on the model-view-viewmodel (MVVM) pattern. We used Vue to build the user interfaces for our trivia site.&lt;/li>
&lt;/ul>
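&lt;p>The scraping step can be sketched with Python&amp;rsquo;s standard library alone. The HTML below is a made-up fragment in the spirit of an archive page, not J! Archive&amp;rsquo;s real markup; a production scraper would fetch pages over HTTP and adapt the selectors to the site&amp;rsquo;s actual structure.&lt;/p>

```python
from html.parser import HTMLParser

class ContestantParser(HTMLParser):
    """Collect the text of every element whose class is 'contestant'."""
    def __init__(self):
        super().__init__()
        self.in_contestant = False
        self.contestants = []

    def handle_starttag(self, tag, attrs):
        if ("class", "contestant") in attrs:
            self.in_contestant = True

    def handle_endtag(self, tag):
        self.in_contestant = False

    def handle_data(self, data):
        if self.in_contestant and data.strip():
            self.contestants.append(data.strip())

# Illustrative fragment only -- the real page markup differs
html = '<p class="contestant">Ken Jennings</p><p class="contestant">Brad Rutter</p>'
parser = ContestantParser()
parser.feed(html)
# parser.contestants == ["Ken Jennings", "Brad Rutter"]
```

The parsed rows are then bulk-loaded into the RDS-hosted database via Oracle SQL Developer.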
&lt;h2 id="installation-instruction">Installation Instruction&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Server:&lt;/p>
&lt;ul>
&lt;li>Our server connects to Oracle. In order to run the application, please make sure you have Oracle Instant Client and SDK installed.&lt;/li>
&lt;li>Once it&amp;rsquo;s installed, update the path to the client at the very top of the &lt;code>routes.js&lt;/code> file, within &lt;code>oracledb.initOracleClient({ libDir: '' })&lt;/code>&lt;/li>
&lt;li>Use &lt;code>npm start&lt;/code> to run the server from within the server folder (after &lt;code>npm install&lt;/code>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Client:&lt;/p>
&lt;ul>
&lt;li>Use &lt;code>npm run serve -- --port 3000&lt;/code> to run the client from within the client folder (after &lt;code>npm install&lt;/code>)&lt;/li>
&lt;li>If this fails, first delete the &lt;code>node_modules&lt;/code> folder in the client folder and run &lt;code>npm i&lt;/code>. Then try again.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="preview">Preview&lt;/h2>
&lt;figure id="figure-a-preview-of-the-play-page">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/play.png" data-caption="A Preview of the Play Page">
&lt;img src="https://dat-nguyen.netlify.app/media/play.png" alt="" >
&lt;/a>
&lt;figcaption>
A Preview of the Play Page
&lt;/figcaption>
&lt;/figure>
&lt;p>We hope you enjoy it!&lt;/p></description></item><item><title>Study of Car Accidents in the United States</title><link>https://dat-nguyen.netlify.app/project/car_severity/</link><pubDate>Thu, 05 Nov 2020 00:00:00 +0000</pubDate><guid>https://dat-nguyen.netlify.app/project/car_severity/</guid><description>&lt;p>We chose to work with a set of United States car accident data from 2016 - 2020. The data set consists of ~3.5 million samples, each of which has 49 features.&lt;/p>
&lt;h2 id="exploratory-data-analysis">Exploratory Data Analysis&lt;/h2>
&lt;p>The goal of this analysis is to assess the cleanliness and consistency of the data in order to engineer features that can be used to &lt;strong>train a predictor of accident severity&lt;/strong>. To this end, we systematically explore the features and their relationships with other features using both summary statistics and visualizations. Please see the Google Colab notebook for further EDA.&lt;/p>
&lt;h2 id="machine-learning">Machine Learning&lt;/h2>
&lt;p>Under the scope of this project, we chose to perform our severity prediction for the state of Pennsylvania. Given that this is a multi-class classification problem, we have several algorithms to consider. Let&amp;rsquo;s look at each of them and see how they perform on the PA subset.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>ML Method&lt;/th>
&lt;th>Accuracy&lt;/th>
&lt;th>Training Time&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>Logistic Regression&lt;/code>&lt;/td>
&lt;td>80.6%&lt;/td>
&lt;td>3min 41s&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>K-Nearest Neighbor&lt;/code>&lt;/td>
&lt;td>75.9%&lt;/td>
&lt;td>0min 11s&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>Decision Tree&lt;/code>&lt;/td>
&lt;td>86.6%&lt;/td>
&lt;td>0min 5s&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>Random Forest&lt;/code>&lt;/td>
&lt;td>89.7%&lt;/td>
&lt;td>27min 37s&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>AdaBoost&lt;/code>&lt;/td>
&lt;td>79.7%&lt;/td>
&lt;td>1min 48s&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>XGBoost&lt;/code>&lt;/td>
&lt;td>92.0%&lt;/td>
&lt;td>57min 22s&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>LightGBM&lt;/code>&lt;/td>
&lt;td>91.3%&lt;/td>
&lt;td>0min 55s&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>XGBoost is the winner here. XGBoost is a gradient boosting method, one of the best algorithm families for classification problems. It handles large datasets very well with good execution speed, deals with missing values automatically, and applies regularized boosting, making it less prone to overfitting. It also performs pruning for you. As a matter of fact, XGBoost is a frequent winner of Kaggle competitions.&lt;/p>
&lt;p>Coming very close is LightGBM, a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. It performs comparably well on large datasets, with a significant reduction in training time compared to XGBoost.&lt;/p>
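&lt;p>The comparison above can be reproduced in outline with scikit-learn. This is a schematic version using a synthetic stand-in for the PA subset; the real pipeline also needs the feature engineering from the EDA step, and XGBoost and LightGBM come from their own packages rather than scikit-learn.&lt;/p>

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered PA accident features (4 severity classes)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each candidate, recording held-out accuracy and wall-clock training time
results = {}
for name, model in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                    ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                    ("Random Forest", RandomForestClassifier(random_state=0))]:
    start = time.time()
    model.fit(X_train, y_train)
    results[name] = (model.score(X_test, y_test), time.time() - start)
```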
&lt;h2 id="deep-learning">Deep Learning&lt;/h2>
&lt;p>We attempt to predict the level of severity from the accident descriptions, using a Long Short-Term Memory (LSTM) recurrent neural network (RNN).&lt;/p>
&lt;h4 id="lstm-introduction">LSTM Introduction&lt;/h4>
&lt;p>The accident description is a form of sequential information. Intuitively, RNNs have memory that &amp;ldquo;captures&amp;rdquo; what has been calculated so far, i.e. what happened in the last sentence will influence what happens in the next one. We then use this information to predict the &amp;ldquo;label&amp;rdquo; of the accident severity, i.e. a category level from 1 through 4. The architecture of recurrent neural networks is shown below:&lt;/p>
&lt;figure id="figure-rnn-architecture">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/rnn1.png" data-caption="RNN Architecture">
&lt;img src="https://dat-nguyen.netlify.app/media/rnn1.png" alt="" >
&lt;/a>
&lt;figcaption>
RNN Architecture
&lt;/figcaption>
&lt;/figure>
&lt;p>A chunk of neural network, &amp;ldquo;A&amp;rdquo;, looks at some input x(t) and outputs a value h(t); x(t) could be a word or a sentence, and h(t) is the probability of the next word or sentence.&lt;/p>
&lt;p>So how does that relate to our task of predicting accident severity given the description?&lt;/p>
&lt;ul>
&lt;li>We feed the words of the description into the model&lt;/li>
&lt;li>At the end of each description, we give the model the label, i.e. the severity category value&lt;/li>
&lt;li>By passing each output back in as input, RNNs are able to retain information along the sequence and leverage all of it at the end to make a severity prediction for a new accident description.&lt;/li>
&lt;/ul>
&lt;p>This works well for short descriptions, but with long descriptions there is a long-term dependency problem. Therefore, we generally do not use vanilla RNNs; we use Long Short-Term Memory instead.&lt;/p>
&lt;p>LSTM is a type of RNN that can solve the long-term dependency problem.&lt;/p>
&lt;figure id="figure-different-lstm">
&lt;a data-fancybox="" href="https://dat-nguyen.netlify.app/media/rnn2.png" data-caption="Different LSTM">
&lt;img src="https://dat-nguyen.netlify.app/media/rnn2.png" alt="" >
&lt;/a>
&lt;figcaption>
Different LSTM
&lt;/figcaption>
&lt;/figure>
&lt;p>For this classification problem, we use the many-to-one model: the input is a sequence of words, and the output is a single label, the accident severity.&lt;/p>
&lt;h4 id="model-training">Model Training&lt;/h4>
&lt;ul>
&lt;li>The first layer is an embedding layer that represents each word with a length-100 vector.&lt;/li>
&lt;li>The next layer is the LSTM layer with 100 memory units.&lt;/li>
&lt;li>The output layer creates 4 output values, one for each category.&lt;/li>
&lt;li>The activation function is softmax, for multi-class classification.&lt;/li>
&lt;li>Because this is a multi-class classification problem, categorical_crossentropy is used as the loss function.&lt;/li>
&lt;li>We use Adam, an adaptive learning rate optimization algorithm designed specifically for training deep neural networks.&lt;/li>
&lt;li>We also implement two regularization methods to prevent overfitting: dropout (at 20%) and early stopping.&lt;/li>
&lt;/ul>
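&lt;p>Putting the bullet points above together, the model definition looks roughly like this in Keras. The vocabulary size and batch details are illustrative placeholders, not the exact values from our notebook:&lt;/p>

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_severity_model(vocab_size=20000):
    model = models.Sequential([
        layers.Embedding(vocab_size, 100),      # length-100 vector per word
        layers.LSTM(100, dropout=0.2),          # 100 memory units, 20% dropout
        layers.Dense(4, activation="softmax"),  # one output per severity class
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # multi-class loss
                  metrics=["accuracy"])
    return model

# Early stopping: the second regularization method, halting when val_loss stalls
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2)
model = build_severity_model()
```

With the description texts tokenized into padded integer sequences, training proceeds via &lt;code>model.fit(..., callbacks=[early_stop])&lt;/code>, producing logs like those below.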
&lt;pre>&lt;code>Epoch 1/10
633/633 [==============================] - 1076s 2s/step - loss: 0.3712 - accuracy: 0.8764 - val_loss: 0.2969 - val_accuracy: 0.9033
Epoch 2/10
633/633 [==============================] - 1091s 2s/step - loss: 0.2867 - accuracy: 0.9077 - val_loss: 0.2827 - val_accuracy: 0.9062
Epoch 3/10
633/633 [==============================] - 1076s 2s/step - loss: 0.2769 - accuracy: 0.9095 - val_loss: 0.2809 - val_accuracy: 0.9062
Epoch 4/10
633/633 [==============================] - 1073s 2s/step - loss: 0.2726 - accuracy: 0.9101 - val_loss: 0.2785 - val_accuracy: 0.9062
Epoch 5/10
633/633 [==============================] - 1073s 2s/step - loss: 0.2688 - accuracy: 0.9105 - val_loss: 0.2765 - val_accuracy: 0.9073
Epoch 6/10
633/633 [==============================] - 1071s 2s/step - loss: 0.2655 - accuracy: 0.9105 - val_loss: 0.2757 - val_accuracy: 0.9076
Epoch 7/10
633/633 [==============================] - 1071s 2s/step - loss: 0.2635 - accuracy: 0.9112 - val_loss: 0.2767 - val_accuracy: 0.9071
Epoch 8/10
633/633 [==============================] - 1073s 2s/step - loss: 0.2626 - accuracy: 0.9118 - val_loss: 0.2693 - val_accuracy: 0.9087
Epoch 9/10
633/633 [==============================] - 1074s 2s/step - loss: 0.2597 - accuracy: 0.9115 - val_loss: 0.2698 - val_accuracy: 0.9089
Epoch 10/10
633/633 [==============================] - 1073s 2s/step - loss: 0.2584 - accuracy: 0.9125 - val_loss: 0.2716 - val_accuracy: 0.9076
&lt;/code>&lt;/pre>
&lt;p>Using just 50,000 sample records (out of over 3.5 million available), we achieve an accuracy of 91.1% with the recurrent neural net, which already beats almost all of our classical ML models. The downside is that the LSTM model takes a long time to train.&lt;/p></description></item></channel></rss>