Bias/variance: remarks (part 1)

In these videos, the terms bias and variance are used in a relaxed sense:

  • bias $\approx$ performance on the training data compared to optimal performance,
  • variance $\approx$ difference between the loss on the training data and the loss on the validation data,

and for general problems, not only regression problems. The purpose of introducing these concepts is to help you reason about how to adjust your neural network architectures, as the sketch below illustrates.
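As a concrete illustration of this relaxed usage, here is a minimal sketch (not from the videos; the function name, the example numbers, and the assumption that an estimate of the optimal error is available are all hypothetical) of how these two gaps can serve as rule-of-thumb diagnostics:

```python
def diagnose(train_error, val_error, optimal_error=0.0):
    """Rough bias/variance indicators in the relaxed sense used here.

    bias     ~ gap between training error and (estimated) optimal error
    variance ~ gap between validation error and training error
    """
    bias = train_error - optimal_error
    variance = val_error - train_error
    return bias, variance

# Hypothetical numbers: 1% training error, 11% validation error,
# and an assumed ~0% optimal (Bayes) error.
bias, variance = diagnose(train_error=0.01, val_error=0.11, optimal_error=0.0)
print(f"bias ~ {bias:.2f}, variance ~ {variance:.2f}")
```

With these numbers, the small train-to-optimal gap but large train-to-validation gap would point to a variance problem (overfitting); the reverse pattern would point to a bias problem (underfitting).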


People with a background in statistics may recall that bias and variance have specific technical definitions. Those definitions are not used in these videos but are repeated below for completeness:

In regression, if $\mu_y = \mathbb{E} \left[ \hat{y}(X) \,\Big|\, Y=y \right]$, then

  • $\text{Bias}(y) = \mu_y - y$
  • $\text{Variance}(y) = \mathbb{E} \left[ ( \hat{y}(X) - \mu_y )^2 \,\Big|\, Y=y \right]$

Roughly speaking, bias represents consistent errors (for a given label), whereas variance represents errors due to variations in $\hat{y}(X)$, which is at least vaguely related to how the terms are used in the upcoming videos.
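For completeness, the two quantities combine in the standard decomposition of the conditional mean squared error (a well-known identity, stated here rather than taken from the videos):

\[
\mathbb{E} \left[ ( \hat{y}(X) - y )^2 \,\Big|\, Y=y \right]
= ( \mu_y - y )^2 + \mathbb{E} \left[ ( \hat{y}(X) - \mu_y )^2 \,\Big|\, Y=y \right]
= \text{Bias}(y)^2 + \text{Variance}(y).
\]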