Module 1, L1: supervised learning and logistic regression

Module 1 is primarily about supervised learning, and we will generally assume that we have access to training data $(x^{(1)},y^{(1)}), (x^{(2)},y^{(2)}), \dots, (x^{(m)},y^{(m)})$ and that we would like to train a neural network to output $f_{\theta}(x)$ to either approximate $y$ (common in regression) or the distribution $p(y|x)$ (common in classification).
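
To make this notation concrete, here is a minimal sketch (assuming a toy one-dimensional regression problem and a simple linear model standing in for the neural network; the data and names are illustrative, not from the course) of what the training pairs and the parametric map $f_{\theta}$ look like in code:

```python
import numpy as np

# Hypothetical toy regression data: m pairs (x^{(i)}, y^{(i)}) with scalar x and y.
rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-1, 1, size=m)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=m)   # noisy targets

# A simple parametric model f_theta(x) = w * x + b with parameters theta = (w, b).
theta = {"w": 0.0, "b": 0.0}

def f(theta, x):
    return theta["w"] * x + theta["b"]

predictions = f(theta, x)   # outputs f_theta(x^{(i)}) to be compared against y^{(i)}
```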

To select the parameters $\theta$, we generally try to minimize the average (empirical) loss $\frac{1}{m} \sum_{i=1}^m L(f_{\theta}(x^{(i)}), y^{(i)})$. The details of the network and the loss vary, for instance, depending on whether we are doing regression ($y$ is a vector in $\mathbb{R}^{n_y}$) or classification ($y$ is a class label, e.g., an integer). We will discuss this in more detail later, but in the next two videos we describe a first example: binary classification using logistic regression and the negative log likelihood loss (also known as the cross-entropy loss).
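
As a preview of that example, the sketch below (assuming a small synthetic binary-classification dataset and plain gradient descent; the hyperparameters and variable names are illustrative, not the course's implementation) minimizes the average negative log likelihood loss for logistic regression:

```python
import numpy as np

# Hypothetical toy data: m examples with n_x features and binary labels in {0, 1}.
rng = np.random.default_rng(0)
m, n_x = 200, 2
X = rng.normal(size=(m, n_x))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # labels generated from a simple rule

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters theta = (w, b); f_theta(x) = sigmoid(w.x + b) models p(y = 1 | x).
w = np.zeros(n_x)
b = 0.0
lr = 0.1

for step in range(1000):
    p = sigmoid(X @ w + b)                     # predicted p(y = 1 | x^{(i)}) for each example
    # Average negative log likelihood (cross-entropy) loss over the m examples.
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # Gradients of the average loss with respect to w and b.
    grad_w = X.T @ (p - y) / m
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"final average loss: {loss:.4f}")
```

Each gradient step lowers the empirical loss on the training set, which is exactly the minimization described above specialized to the logistic regression model and the negative log likelihood loss.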