Supervised Learning
Machine Learning Basics
Before moving on to the examination, you should know the following basics of ML. Suggestions for suitable material are given below (but you will also easily find other good material via Google, YouTube, etc.).
- What is supervised, unsupervised and reinforcement learning?
- The bias-variance trade off.
- Overfitting and what influences it; how regularization can help.
- Why one splits data into three sets: training data (used to build the model), validation data (used to check the model after training), and test data (used only once in a while, so that it does not influence the training too much).
- Validation by leave-one-out and K-fold cross validation
- One-hot encoding
- Cost functions, mean squared error, cross-entropy
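Several of the basics above can be tried out in a few lines of scikit-learn. The following is only a minimal sketch under stated assumptions: the iris data set, the 60/20/20 split, the logistic regression model and the toy genre labels are illustrative choices, not part of the course material.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import OneHotEncoder

# Stand-in data set (assumption): 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Three sets: first hold out 20% as test data, then split the
# remainder into training (60% of all) and validation (20% of all).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

# Build the model from the training data, check it on the validation data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)

# K-fold cross validation (K=5) on everything except the test data.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_rest, y_rest, cv=5)

# One-hot encoding of a categorical column (toy labels, hypothetical).
labels = np.array([["rock"], ["jazz"], ["rock"], ["pop"]])
one_hot = OneHotEncoder().fit_transform(labels).toarray()
# Columns follow the sorted categories: jazz, pop, rock.

# Cost functions: mean squared error for regression targets,
# cross-entropy (log loss) for predicted class probabilities.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
cross_entropy = log_loss(y_val, model.predict_proba(X_val))

print("validation accuracy:", val_accuracy)
print("5-fold CV accuracy: ", cv_scores.mean())
print("one-hot rows:\n", one_hot)
print("MSE:", mse, " cross-entropy:", cross_entropy)
```

Passing cv=LeaveOneOut() (from sklearn.model_selection) instead of cv=5 gives leave-one-out validation. The test set is deliberately left untouched here; it would be scored only once, after all model choices have been made.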
Methods
Also make sure you have some knowledge of the following supervised learning methods. You will not have time to study all in detail, but you should understand basic assumptions and ideas:
- Linear regression
- Logistic regression
- K-nearest neighbors (KNN)
- Linear and Quadratic Discriminant Analysis (LDA, QDA)
- Decision trees and random forests
- Support Vector Machines
- Boosting
- Neural Networks (shallow; we will cover deep networks later)
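One way to get a first feel for these methods is to run them side by side on the same data. The sketch below assumes the iris data set and mostly default hyperparameters (both are illustrative choices, not course settings) and compares 5-fold cross-validated accuracy. Linear regression is left out because it predicts continuous values rather than classes, and gradient boosting stands in for boosting in general.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data set (assumption), not the course data.
X, y = load_iris(return_X_y=True)

# Distance- and gradient-based methods get a scaling step in front.
classifiers = {
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "K-nearest neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Shallow neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
}

# Mean accuracy over 5 cross-validation folds for each method.
results = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = scores.mean()

# Print the methods from best to worst mean accuracy.
for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {acc:.3f}")
```

On such a small and easy data set the methods will score similarly; the point of the sketch is the shared fit/score interface, which makes it cheap to swap one method for another.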
Tools
We will use Python and the scikit-learn (sklearn) toolbox. We suggest you spend some time (1-2h) viewing the nice collection of programming examples on the scikit-learn home page.
If you understand most of what happens in the following two Jupyter notebooks, you are ready to move on to the examination on music classification. Download these notebooks and data to your VM and run them. (Don't forget to activate the environment where the tools were installed, e.g. with the command source ~/tensorflowenv/bin/activate)
- Leaf Classification
- Jupyter notebook illustrating 10 classification methods
- Leaf data download (you don't need the images, only train.csv and test.csv)
- Extra challenge: Can you improve accuracy by using any of the comments given to the notebook on Kaggle?
- Titanic Survival Prediction
Machine Learning Basics
Choose the style of material and ambition level that suits you.
A friendly introduction to Machine Learning [31 min]
Machine Learning Recipes (#1 in playlist of 10)
Methods
Linear regression using scikit-learn [9 min]
Logistic regression [11 min] (Don't spend time downloading and running the code; getting it to run takes some work.)
For more detail, Chapter 4 in the book Introduction to Statistical Learning covers classification using logistic regression, linear discriminant analysis, quadratic discriminant analysis and K-nearest neighbors (code is provided in the R language).
Decision Trees, Bagging and Random Forests
Chapter 8.1-8.2 in the book Introduction to Statistical Learning covers Decision Trees, Bagging and Random Forests in more detail.
Boosting tutorial
Chapter 8.2.3 in the book Introduction to Statistical Learning describes Boosting somewhat further. Also see the wiki page on Boosting.
Support Vector Machines [7 min]
Chapter 9 in the book Introduction to Statistical Learning covers Support Vector Machines in more detail. Some more about the kernel trick can be found here.
What are Neural Networks, part 1 (you might want to watch parts 2-4 also)
To get some intuition for NN architectures, learning algorithms, and training parameters, spend some time (but not too much...) trying out the Neural Network Playground.
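The same kind of experiment can also be run offline. The following is a rough sketch, assuming scikit-learn's MLPClassifier and a synthetic two-circles data set similar to one of the playground's toy problems; the architectures, learning rate and noise level are arbitrary illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two concentric noisy circles: not separable by a straight line.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Architectures to compare: hidden layer sizes and activation function.
settings = [
    {"hidden_layer_sizes": (1,), "activation": "tanh"},
    {"hidden_layer_sizes": (4,), "activation": "tanh"},
    {"hidden_layer_sizes": (8, 8), "activation": "relu"},
]

test_accuracy = {}
for params in settings:
    net = MLPClassifier(
        solver="adam",            # learning algorithm
        learning_rate_init=0.01,  # training parameter: step size
        alpha=1e-4,               # training parameter: L2 regularization
        max_iter=3000,
        random_state=0,
        **params,
    )
    net.fit(X_train, y_train)
    test_accuracy[params["hidden_layer_sizes"]] = net.score(X_test, y_test)

for layers, acc in test_accuracy.items():
    print(f"hidden layers {layers}: test accuracy {acc:.3f}")
```

A single hidden unit can only draw a straight decision boundary, so it should do poorly on this data, while the larger networks have enough capacity to separate the circles. Changing learning_rate_init or alpha shows how the training parameters affect the result.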
Additional material for the ambitious
These books contain detailed material on all topics above.
- Hands-on Machine Learning with Scikit-Learn and TensorFlow
- Introduction to Statistical Learning (free pdf at book home page)
More links are given on this page with Further Resources