Module 1, L5: remarks

Lectures 5 and 6 introduce convolutional neural networks (CNNs) and describe how they can be used to solve a range of different tasks related to computer vision. More specifically, in lecture 5 we 

  • describe the main components in CNNs (convolutions with stride, pooling, etc.),
  • focus on image classification,
  • cover several of the main CNN architectures (LeNet5, AlexNet, VGG, GoogLeNet and ResNet).

One reason for going through all these architectures is that it may provide an understanding of how network architectures can be adjusted to improve performance, which we believe is useful in other applications of deep learning as well.
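To make the components listed above a little more concrete, here is a minimal sketch of how convolutions with stride and pooling change the spatial size of a feature map. It uses the standard formula floor((n + 2p - k) / s) + 1 for input size n, kernel size k, stride s and padding p. The function name `output_size` and the layer list are illustrative choices, not taken from the lecture material; the layer sizes follow the usual description of LeNet5 (32x32 input, 5x5 convolutions without padding, 2x2 pooling with stride 2).

```python
def output_size(n, kernel, stride=1, padding=0):
    """Spatial size along one dimension after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# Convolution and pooling layers of LeNet5 as (name, kernel, stride),
# all without padding (illustrative listing, see the lead-in above).
layers = [
    ("conv 5x5", 5, 1),
    ("pool 2x2", 2, 2),
    ("conv 5x5", 5, 1),
    ("pool 2x2", 2, 2),
]

size = 32  # LeNet5 takes 32x32 input images
sizes = [size]
for name, kernel, stride in layers:
    size = output_size(size, kernel, stride)
    sizes.append(size)

print(sizes)  # [32, 28, 14, 10, 5]
```

The same function covers padded, strided convolutions: for example, a 7x7 convolution with stride 2 and padding 3 on a 224x224 image gives `output_size(224, 7, stride=2, padding=3) == 112`.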

Most of the videos for L5 and L6 are taken from a course at Stanford: http://cs231n.stanford.edu. Most of the slides from lecture 5 and lecture 6 are available here: L5.pdf and L6.pdf. For some of the videos that are not from Stanford (or by Lennart), you can find the slides here: https://e2eml.school/how_convolutional_neural_networks_work.html. See "Reading directions and slides from videos" for more advice on literature; on that page, we also discuss transformers, which have recently (taking off in 2020) become a useful alternative to CNNs for computer vision.
