Supervised Learning
Machine Learning Basics
Suggestions for suitable material are given below (you will also easily find other good material via Google/YouTube/...).
- What is supervised, unsupervised and reinforcement learning?
- The bias-variance trade-off.
- Overfitting and what influences it; how regularization can help.
- Why one splits the data into three sets: training data (to fit the model), validation data (to check the model during development), and test data (used sparingly, so that it does not influence the training).
- Validation by leave-one-out, K-fold cross validation
- One-hot encoding
- Cost functions, mean squared error, cross-entropy
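Several of the concepts above can be tried out in a few lines of scikit-learn (the toolbox used later in this module). A minimal sketch, using the iris toy dataset purely as a stand-in:

```python
# Sketch: data splits, K-fold cross-validation, one-hot encoding, and two
# common cost functions (MSE and cross-entropy), using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import mean_squared_error, log_loss

X, y = load_iris(return_X_y=True)

# Split off a test set first, then carve a validation set out of the rest.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# K-fold cross-validation: five accuracy scores, one per held-out fold.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000),
                            X_trainval, y_trainval, cv=5)

# One-hot encoding of the class labels (0, 1, 2 -> three binary columns).
onehot = OneHotEncoder().fit_transform(y.reshape(-1, 1)).toarray()

# Two common cost functions, evaluated on the validation set:
# mean squared error (regression) and cross-entropy (classification).
mse = mean_squared_error(y_val, model.predict(X_val))
xent = log_loss(y_val, model.predict_proba(X_val))
```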
Methods
Also make sure you have some knowledge of the following supervised learning methods. You will not have time to study all in detail, but you should understand basic assumptions and ideas:
- Linear regression
- Logistic regression
- K-nearest neighbors (KNN)
- Linear and Quadratic Discriminant Analysis (LDA, QDA)
- Decision trees and random forests
- Support Vector Machines
- Boosting
- Neural Networks
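In scikit-learn all of these methods share the same fit/score interface, so you can compare them with the same few lines. A minimal sketch (default-ish settings, iris as a stand-in dataset):

```python
# Sketch: several of the listed methods, applied with one shared
# train/test protocol via their scikit-learn implementations.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
}

# Fit each classifier and record its test-set accuracy.
accuracies = {name: clf.fit(X_train, y_train).score(X_test, y_test)
              for name, clf in classifiers.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```

The hyperparameters here are illustrative, not tuned; the point is that the same protocol works for every method.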
Tools
We will use Python and the scikit-learn (sklearn) toolbox. We suggest you spend some time (1-2 h) browsing the nice collection of programming examples on the scikit-learn home page.
If you understand most of what happens in the following two Jupyter notebooks, you are ready to move on to the examination on music classification. Download these notebooks and data to your VM and run them. (Don't forget to activate the environment where the tools were installed, e.g. with the command source ~/tensorflowenv/bin/activate.)
- Leaf Classification
- Jupyter notebook illustrating 10 classification methods
- Leaf data download (you don't need the images, only train.csv and test.csv)
- Extra challenge: Can you improve accuracy by using any of the comments given to the notebook on Kaggle?
- Titanic Survival Prediction
Machine Learning Basics
A basic and freely available book that provides a gentle introduction to some of the concepts is https://mml-book.github.io/.
Choose the style of material and ambition level that suits you.
A friendly introduction to Machine Learning [31 min]
Machine Learning Recipes (#1 in a playlist of 10)
Methods
Linear regression using scikit-learn [9 min]
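As a warm-up before the video, here is a minimal linear-regression fit in scikit-learn, on synthetic data generated for illustration (assumed ground truth: y = 2x + 1 plus Gaussian noise):

```python
# Sketch: ordinary least squares with scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))          # one input feature
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)
# The fitted coefficients should lie close to the true slope 2 and intercept 1.
print(model.coef_[0], model.intercept_)
```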
Logistic regression [11 min] (Don't spend time downloading and running the code; you would have to work somewhat to get it going.)
For more detail, Chapter 4 in the book Introduction to Statistical Learning covers classification using logistic regression, linear discriminant analysis, quadratic discriminant analysis and K-nearest neighbors (code is provided in the R language).
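The same four Chapter 4 classifiers are available in scikit-learn, so you can reproduce that comparison in Python instead of R. A minimal sketch, using the breast-cancer toy dataset as a stand-in:

```python
# Sketch: logistic regression, LDA, QDA and KNN compared with one
# train/test split, via their scikit-learn implementations.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

scores = {}
for name, clf in [("logistic", LogisticRegression(max_iter=5000)),
                  ("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    scores[name] = clf.fit(X_train, y_train).score(X_test, y_test)
```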
Decision Trees, Bagging and Random Forests
Chapter 8.1-8.2 in the book Introduction to Statistical Learning covers Decision Trees, Bagging and Random Forests in more detail.
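The ISL progression from a single tree to bagging to a random forest can be sketched directly in scikit-learn (breast-cancer data used as a stand-in):

```python
# Sketch: single decision tree vs. bagging vs. random forest.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# One deep tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Bagging: many trees on bootstrap samples, all features at each split.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0).fit(X_train, y_train)
# Random forest: bagging plus a random feature subset at each split,
# which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)

results = {"tree": tree.score(X_test, y_test),
           "bagging": bag.score(X_test, y_test),
           "forest": forest.score(X_test, y_test)}
```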
Boosting tutorial
Chapter 8.2.3 in the book Introduction to Statistical Learning describes Boosting somewhat further. Also see the Wikipedia page on Boosting.
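One concrete boosting algorithm, AdaBoost, is a one-liner in scikit-learn: a sequence of weak learners where each new one puts more weight on the examples the previous ones got wrong. A minimal sketch (breast-cancer data as a stand-in):

```python
# Sketch: AdaBoost, one example of a boosting algorithm.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 100 boosting rounds of shallow trees (the default weak learner).
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
acc = boost.score(X_test, y_test)
```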
Support Vector Machines [7 min]
Chapter 9 in the book Introduction to Statistical Learning covers Support Vector Machines in more detail. Some more about the kernel trick can be found here.
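The effect of the kernel trick is easy to see on data that is not linearly separable. A minimal sketch: two concentric circles, fitted with a linear and an RBF-kernel SVM in scikit-learn:

```python
# Sketch: linear vs. RBF-kernel SVM on non-linearly-separable data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes.
X, y = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
# The RBF kernel implicitly maps the data into a space where the
# circles become separable; the linear SVM stays near chance level.
```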
What are Neural Networks, part 1 (you might want to watch parts 2-4 as well)
To get some intuition for NN architectures, learning algorithms and training parameters, spend some time (but not too much...) trying out the Neural Network Playground.
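The knobs in the playground (number of layers, units per layer, activation, learning rate) map onto constructor parameters of scikit-learn's MLPClassifier. A minimal sketch on the two-moons dataset, which resembles the playground's toy problems:

```python
# Sketch: a small fully connected network, roughly the kind of model
# the Neural Network Playground lets you experiment with interactively.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two hidden layers of 8 units each, ReLU activation, fixed learning rate.
net = MLPClassifier(hidden_layer_sizes=(8, 8), activation="relu",
                    learning_rate_init=0.01, max_iter=2000, random_state=0)
net.fit(X_train, y_train)
test_acc = net.score(X_test, y_test)
```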