Supervised Learning
Machine Learning Basics
Suggestions for suitable material are given below (you will also easily find other good material via Google/YouTube/...).
- What is supervised, unsupervised and reinforcement learning?
- The bias-variance trade-off.
- Overfitting and what influences it; how regularization can help.
- Why one splits the data into three sets: training data (to fit the model), validation data (to check the model during development), and test data (used sparingly, so that it does not influence the training).
- Validation by leave-one-out, K-fold cross validation
- One-hot encoding
- Cost functions, mean squared error, cross-entropy
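Several of the concepts above can be tried out in a few lines of scikit-learn (the toolbox used later in this module). A minimal sketch, using the iris toy dataset purely as a stand-in:

```python
# Sketch: data splits, K-fold cross-validation, one-hot encoding, and two
# common cost functions (MSE and cross-entropy), using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import mean_squared_error, log_loss

X, y = load_iris(return_X_y=True)

# Split off a test set first, then carve a validation set out of the rest.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# K-fold cross-validation: five accuracy scores, one per held-out fold.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000),
                            X_trainval, y_trainval, cv=5)

# One-hot encoding of the class labels (0, 1, 2 -> three binary columns).
onehot = OneHotEncoder().fit_transform(y.reshape(-1, 1)).toarray()

# Two common cost functions, evaluated on the validation set:
# mean squared error (regression) and cross-entropy (classification).
mse = mean_squared_error(y_val, model.predict(X_val))
xent = log_loss(y_val, model.predict_proba(X_val))
```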
Methods
Also make sure you have some knowledge of the following supervised learning methods. You will not have time to study all in detail, but you should understand basic assumptions and ideas:
- Linear regression
- Logistic regression
- K-nearest neighbors (KNN)
- Linear and Quadratic Discriminant Analysis (LDA, QDA)
- Decision trees and random forests
- Support Vector Machines
- Boosting
- Neural Networks
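In scikit-learn all of these methods share the same fit/score interface, so you can compare them with the same few lines. A minimal sketch (default-ish settings, iris as a stand-in dataset):

```python
# Sketch: several of the listed methods, applied with one shared
# train/test protocol via their scikit-learn implementations.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
}

# Fit each classifier and record its test-set accuracy.
accuracies = {name: clf.fit(X_train, y_train).score(X_test, y_test)
              for name, clf in classifiers.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```

The hyperparameters here are illustrative, not tuned; the point is that the same protocol works for every method.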
Tools
We will use Python and the scikit-learn (sklearn) toolbox. We suggest you spend some time (1-2 h) browsing the nice collection of programming examples on the scikit-learn home page.
If you understand most of what happens in the following two Jupyter notebooks, you are ready to move on to the examination on music classification. Download these notebooks and data to your VM and run them. (Don't forget to activate the environment where the tools were installed, e.g. with the command source ~/tensorflowenv/bin/activate.)
- Leaf Classification
- Jupyter notebook illustrating 10 classification methods
- Leaf data download (you don't need the images, only train.csv and test.csv)
- Extra challenge: Can you improve accuracy by using any of the comments given to the notebook on Kaggle?
- Titanic Survival Prediction
Machine Learning Basics
A basic and freely available book that provides a gentle introduction to some of the concepts is https://mml-book.github.io/.
Choose the style of material and ambition level that suits you.
A friendly introduction to Machine Learning [31 min]
Machine Learning Recipes (#1 in a playlist of 10)
Methods
Linear regression using scikit-learn [9 min]
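As a warm-up before the video, here is a minimal linear-regression fit in scikit-learn, on synthetic data generated for illustration (assumed ground truth: y = 2x + 1 plus Gaussian noise):

```python
# Sketch: ordinary least squares with scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))          # one input feature
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)
# The fitted coefficients should lie close to the true slope 2 and intercept 1.
print(model.coef_[0], model.intercept_)
```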
Logistic regression [11 min] (Don't spend time downloading and running the code; you would have to work somewhat to get it going.)
For more detail, Chapter 4 in the book Introduction to Statistical Learning covers classification using logistic regression, linear discriminant analysis, quadratic discriminant analysis and K-nearest neighbors (code is provided in the R language).
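The same four Chapter 4 classifiers are available in scikit-learn, so you can reproduce that comparison in Python instead of R. A minimal sketch, using the breast-cancer toy dataset as a stand-in:

```python
# Sketch: logistic regression, LDA, QDA and KNN compared with one
# train/test split, via their scikit-learn implementations.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

scores = {}
for name, clf in [("logistic", LogisticRegression(max_iter=5000)),
                  ("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    scores[name] = clf.fit(X_train, y_train).score(X_test, y_test)
```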
Decision Trees, Bagging and Random Forests
Chapter 8.1-8.2 in the book Introduction to Statistical Learning covers Decision Trees, Bagging and Random Forests in more detail.
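The ISL progression from a single tree to bagging to a random forest can be sketched directly in scikit-learn (breast-cancer data used as a stand-in):

```python
# Sketch: single decision tree vs. bagging vs. random forest.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# One deep tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Bagging: many trees on bootstrap samples, all features at each split.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0).fit(X_train, y_train)
# Random forest: bagging plus a random feature subset at each split,
# which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)

results = {"tree": tree.score(X_test, y_test),
           "bagging": bag.score(X_test, y_test),
           "forest": forest.score(X_test, y_test)}
```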
Boosting tutorial
Chapter 8.2.3 in the book Introduction to Statistical Learning describes Boosting somewhat further. Also see the Wikipedia page on Boosting.
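One concrete boosting algorithm, AdaBoost, is a one-liner in scikit-learn: a sequence of weak learners where each new one puts more weight on the examples the previous ones got wrong. A minimal sketch (breast-cancer data as a stand-in):

```python
# Sketch: AdaBoost, one example of a boosting algorithm.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 100 boosting rounds of shallow trees (the default weak learner).
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
acc = boost.score(X_test, y_test)
```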
Support Vector Machines [7 min]
Chapter 9 in the book Introduction to Statistical Learning covers Support Vector Machines in more detail. Some more about the kernel trick can be found here.
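The effect of the kernel trick is easy to see on data that is not linearly separable. A minimal sketch: two concentric circles, fitted with a linear and an RBF-kernel SVM in scikit-learn:

```python
# Sketch: linear vs. RBF-kernel SVM on non-linearly-separable data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes.
X, y = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
# The RBF kernel implicitly maps the data into a space where the
# circles become separable; the linear SVM stays near chance level.
```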
What are Neural Networks, part 1 (you might want to watch parts 2-4 as well)
To get some intuition for NN architectures, learning algorithms and training parameters, spend some time (but not too much...) trying out the Neural Network Playground.
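The knobs in the playground (number of layers, units per layer, activation, learning rate) map onto constructor parameters of scikit-learn's MLPClassifier. A minimal sketch on the two-moons dataset, which resembles the playground's toy problems:

```python
# Sketch: a small fully connected network, roughly the kind of model
# the Neural Network Playground lets you experiment with interactively.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two hidden layers of 8 units each, ReLU activation, fixed learning rate.
net = MLPClassifier(hidden_layer_sizes=(8, 8), activation="relu",
                    learning_rate_init=0.01, max_iter=2000, random_state=0)
net.fit(X_train, y_train)
test_acc = net.score(X_test, y_test)
```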