Deep Learning for Natural Language Processing

Due Nov 6, 2020 by 11:59pm
Points 1
Submitting a file upload
Available after Sep 24, 2020 at 9:55am

This is one of two assignments on deep learning; you only have to complete one of them. The other assignment is the Food Recognition Task.

This assignment should be handed in individually, but you are allowed to discuss with others.

In this assignment you will investigate the use of neural networks for natural language processing (NLP). You will work on the problem of sentiment analysis, using the dataset prepared in Learning Word Vectors for Sentiment Analysis by Maas et al. (you don't need to read this reference). The data is mined from the IMDB movie database, and the task is to determine whether a review is positive or negative based on the review text.

Read these movie reviews and determine how you would classify them:

"Uhhh ... so, did they even have writers for this? Maybe I'm picky, but I like a little dialog with my movies. And, as far as slasher films go, just a sliver of character development will suffice."
"Liked Stanley & Iris very much. Acting was very good. Story had a unique and interesting arrangement. The absence of violence and sex was refreshing. Characters were very convincing and felt like you could understand their feelings. Very enjoyable movie."
"Everything that made the original so much fun seems to absent here. This is simply a "run of the mill demons on the loose wrecking havoc" slasher, but without the passion that graced the original"
"OK, so the musical pieces were poorly written and generally poorly sung (though Walken and Marner, particularly Walken, sounded pretty good). And so they shattered the fourth wall at the end by having the king and his nobles sing about the "battle" with the ogre, and praise the efforts of Puss in Boots when they by rights shouldn't have even known about it. Who cares? It's Christopher Freakin' Walken, doing a movie based on a fairy tale, and he sings and dances. His acting style fits the role very well as the devious, mischievous Puss who seems to get his master into deeper and deeper trouble but in fact has a plan he's thought about seven or eight moves in advance. And if you've ever seen Walken in any of his villainous roles, you *know* the ogre bit the dust HARD at the end when Walken got him into his trap. A fun film, and a must-see for anyone who enjoys the unique style of Christopher Walken."

According to the IMDB movie review database, these are classified as negative, positive, negative and positive, respectively. Some of them you will probably find easy to classify, since they are filled with positively or negatively valued words; others require a deeper understanding of the English language.

The task is to use deep learning to build such a classifier. The IMDB movie database is nowadays considered "too small" to reach the full potential of deep networks (there are only 50,000 reviews in the dataset). The advantage is that training can be performed using Google Colab or on your laptop.

1 Improving the baseline

You should first investigate the provided baseline code, given in the notebook IMDBbaseline_convnet.ipynb, and make sure that you understand what is going on. Ensure that you can discuss and reason about

The role of training and validation data
Overfitting and underfitting, and how you can see this during training
The different layers and the number of parameters for these (make sure you understand the model summary table)
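
To make the parameter counts in a model summary table less mysterious, the arithmetic can be reproduced by hand. The sketch below is only an illustration: the layer sizes (5000 words, 64 embedding dimensions, 256 filters of width 3) and the helper function names are assumptions chosen for the example, not values taken from the baseline notebook, and it assumes a global pooling step between the convolution and the dense layer.

```python
# Parameter-count arithmetic for three common layer types.
# All sizes below are illustrative assumptions, not the baseline's values.

def embedding_params(n_unique_words: int, n_dim: int) -> int:
    """One n_dim-sized vector per vocabulary index, no bias."""
    return n_unique_words * n_dim


def conv1d_params(kernel_size: int, in_channels: int, n_filters: int) -> int:
    """Each filter has kernel_size * in_channels weights plus one bias."""
    return (kernel_size * in_channels + 1) * n_filters


def dense_params(n_inputs: int, n_units: int) -> int:
    """Each unit has one weight per input plus one bias."""
    return (n_inputs + 1) * n_units


layers = [
    ("embedding", embedding_params(n_unique_words=5000, n_dim=64)),
    ("conv1d", conv1d_params(kernel_size=3, in_channels=64, n_filters=256)),
    # Assumes global pooling reduces the conv output to 256 features.
    ("dense", dense_params(n_inputs=256, n_units=1)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:<10} {count:>8,}")
print(f"{'total':<10} {total:>8,}")
```

With these assumed sizes the embedding layer dominates the total, which is one reason n_unique_words has a large effect on model size.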

Perform experiments to investigate the performance of the network when varying

max_review_length and n_unique_words
batch size
other hyperparameters
changed network setup, i.e. adding layers or changing sizes
any other things that you find interesting
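
As a rough illustration of what max_review_length and n_unique_words control, the sketch below applies both to a single review encoded as word indices. It is a pure-Python approximation under stated assumptions: the helper names, the padding index 0, the out-of-vocabulary index 2, and the choice to truncate and pad at the start of the review are all assumptions for the example; the baseline notebook may handle these details differently.

```python
from typing import List

PAD = 0  # assumed padding index
OOV = 2  # assumed index used for out-of-vocabulary words


def cap_vocabulary(review: List[int], n_unique_words: int) -> List[int]:
    """Replace every word index >= n_unique_words with the OOV index."""
    return [w if w < n_unique_words else OOV for w in review]


def fix_length(review: List[int], max_review_length: int) -> List[int]:
    """Keep the last max_review_length tokens; left-pad shorter reviews."""
    if max_review_length < 1:
        raise ValueError("max_review_length must be at least 1")
    truncated = review[-max_review_length:]
    return [PAD] * (max_review_length - len(truncated)) + truncated


# A made-up review of six word indices (lower index = more frequent word).
review = [1, 14, 22, 7300, 43, 530]

capped = cap_vocabulary(review, n_unique_words=5000)
print(capped)                 # the rare word 7300 becomes OOV
print(fix_length(capped, 4))  # too long: the first tokens are dropped
print(fix_length(capped, 8))  # too short: padded on the left
```

A smaller n_unique_words maps more rare words to the same index, and a smaller max_review_length discards more of each long review, so both trade information for speed and model size.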

Report the best performance (validation accuracy) you reached in this Google sheet.

FYI: The best groups in 2018 reached around 90%.
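
One way to read off the best validation accuracy, and to see overfitting in the training curves, is sketched below. The numbers are made up for illustration, and the dictionary keys "accuracy" and "val_accuracy" are an assumption about how the training history is stored (some setups use other key names); the helper functions are hypothetical.

```python
# Made-up per-epoch accuracies, for illustration only.
history = {
    "accuracy": [0.71, 0.85, 0.91, 0.95, 0.98],
    "val_accuracy": [0.78, 0.86, 0.88, 0.87, 0.86],
}


def best_epoch(history):
    """Return (epoch index, value) of the highest validation accuracy."""
    val = history["val_accuracy"]
    best = max(range(len(val)), key=val.__getitem__)
    return best, val[best]


def generalization_gaps(history):
    """Training accuracy minus validation accuracy for each epoch."""
    pairs = zip(history["accuracy"], history["val_accuracy"])
    return [round(train - val, 4) for train, val in pairs]


epoch, value = best_epoch(history)
gaps = generalization_gaps(history)
print(f"best validation accuracy {value:.2f} at epoch index {epoch}")
print("train - validation gap per epoch:", gaps)
```

In this invented run, validation accuracy peaks at the third epoch while training accuracy keeps rising, so the growing gap afterwards is the typical sign of overfitting.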

2 Try some other architecture

Choose a completely different network architecture and investigate whether you can improve the performance even further. Alternatives are e.g.

RNN
LSTM
Transformers, see e.g. https://kgptalkie.com/sentiment-classification-using-bert/

A remark: Since we are using the validation set repeatedly to evaluate performance over many trained networks, there is a severe risk of overfitting to this set and thereby obtaining an optimistic estimate of the performance of the finally chosen network. One should really set aside a third data set that is only used once, in a final evaluation of performance. But we will not do this here.
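
For reference, a three-way split of the kind described in the remark could look like the sketch below. It is not required for this assignment; the function name, the 10% fractions and the fixed seed are assumptions made for the example.

```python
import random


def three_way_split(n_examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle example indices and split them into train/validation/test."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_test = int(n_examples * test_fraction)
    n_val = int(n_examples * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 50,000 matches the number of reviews mentioned in the assignment text.
train_idx, val_idx, test_idx = three_way_split(50_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The validation indices would be used for all model selection, and the test indices touched only once, for the final performance estimate.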

The hand-in:

Hand in your code and a short presentation (say 5-15 slides) describing your results and what performance you estimate you have achieved. Hand in using Canvas before the deadline.
