Module 1, L1: supervised learning and logistic regression
Module 1 is primarily about supervised learning, and we will generally assume that we have access to training data $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$ and that we would like to train a neural network to output $f_\theta(x)$ to either approximate $y$ (common in regression) or the distribution $p(y \mid x)$ (common in classification).
To select the parameters $\theta$ we generally try to minimize the average (empirical) loss $\frac{1}{m}\sum_{i=1}^{m} L(f_\theta(x^{(i)}), y^{(i)})$. The details of the network and the loss vary, for instance, depending on whether we are doing regression ($y$ is a vector in $\mathbb{R}^{n_y}$) or classification ($y$ is a class label, e.g., an integer).
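To make this concrete, here is a minimal sketch of minimizing the empirical loss (not part of the notes; names like `f` and `empirical_loss` are hypothetical), using a linear model for $f_\theta$, a squared-error loss $L$, and plain gradient descent on toy regression data:

```python
import numpy as np

def f(theta, X):
    # Hypothetical model: a simple linear map f_theta(x) = theta^T x.
    return X @ theta

def empirical_loss(theta, X, Y):
    # (1/m) sum_i L(f_theta(x^(i)), y^(i)) with squared-error loss L.
    return np.mean((f(theta, X) - Y) ** 2)

# Toy data: m = 100 examples with 3 input features and scalar targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

theta = np.zeros(3)
for _ in range(500):
    grad = 2 * X.T @ (f(theta, X) - Y) / len(Y)  # gradient of the empirical loss
    theta -= 0.1 * grad                          # gradient descent step

print(empirical_loss(theta, X, Y))  # ends up near the noise floor (~0.01)
```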
y is class label, e.g., an integer). We will discuss this in more detail later, but in the next two videos we describe a first example: binary classification using logistic regression and the negative log likelihood loss (also known as the cross entropy loss).
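As a preview of those videos, the following is a minimal sketch (again with hypothetical names, and assuming a numpy setup) of binary logistic regression: the model outputs $p(y = 1 \mid x) = \sigma(\theta^T x)$ with the sigmoid $\sigma$, and we minimize the average negative log likelihood by gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_loss(theta, X, y):
    # Average negative log likelihood / cross entropy:
    # -(1/m) sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)],
    # where p_i = sigmoid(theta^T x^(i)) models p(y = 1 | x^(i)).
    p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # clip to avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data: m = 200 examples with 2 features and labels in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)

theta = np.zeros(2)
for _ in range(1000):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(y)  # gradient of the NLL
    theta -= 0.5 * grad

print(nll_loss(theta, X, y))
```

The gradient step uses the standard identity that the gradient of the negative log likelihood for logistic regression is $\frac{1}{m} \sum_i (\sigma(\theta^T x^{(i)}) - y^{(i)}) \, x^{(i)}$; the next videos derive this loss in detail.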