Convolutions over volumes (part 1)
In the following video, Andrew Ng explains an aspect that is easy to miss when you first learn about CNNs:
- Question: When the input to a convolution has multiple channels, how does a filter handle them? For example, the input to the first layer often has three channels, corresponding to red, green and blue.
- Answer (spoiler): Each filter operates on all (!) input channels. Its depth equals the number of input channels, and it produces a single output channel.
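To make the answer concrete, here is a minimal NumPy sketch of a multi-channel 2D convolution. It assumes no padding, stride 1, and cross-correlation (no kernel flip, as in most deep learning libraries); the function name `conv2d_multichannel` is hypothetical. Each filter has shape `(C_in, kh, kw)`, so at every position it multiplies a patch covering all input channels and sums everything into one number. The last lines check that this equals convolving each channel separately with its own slice of the filter and summing the results.

```python
import numpy as np

def conv2d_multichannel(x, w, b):
    """Valid (no padding), stride-1 2D convolution (cross-correlation).

    x: input of shape (C_in, H, W)
    w: filters of shape (K, C_in, kh, kw); each filter spans all C_in channels
    b: biases of shape (K,); one bias per filter
    returns: output of shape (K, H - kh + 1, W - kw + 1), one channel per filter
    """
    c_in, h, wd = x.shape
    k_out, c_w, kh, kw = w.shape
    assert c_w == c_in, "each filter must have as many channels as the input"
    out = np.zeros((k_out, h - kh + 1, wd - kw + 1))
    for k in range(k_out):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                patch = x[:, i:i + kh, j:j + kw]  # (C_in, kh, kw): all channels at once
                out[k, i, j] = np.sum(patch * w[k]) + b[k]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 6))     # e.g. a 6 x 6 RGB image (3 channels)
w = rng.standard_normal((2, 3, 3, 3))  # 2 filters, each of size 3 x 3 x 3
b = np.zeros(2)
y = conv2d_multichannel(x, w, b)
print(y.shape)  # (2, 4, 4)

# Same result, channel by channel: filter k convolves each input channel with
# its own 3 x 3 slice, and the per-channel results are summed (plus the bias).
y_sum = sum(conv2d_multichannel(x[c:c + 1], w[:, c:c + 1], np.zeros(2))
            for c in range(3)) + b[:, None, None]
print(np.allclose(y, y_sum))  # True
```

Note that the number of output channels (2) is set by the number of filters, not by the number of input channels.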
Two details in Andrew's video can be misleading:
- He refers to 2D convolutions as convolutions over volumes simply because the input has multiple channels (which makes the input three-dimensional). Note that we sometimes also perform 3D convolutions (e.g., on point clouds or 3D voxel grids), and the term "convolution over a volume" more commonly refers to those. A 3D convolution whose input has multiple channels actually operates on a 4D tensor.
- Apart from the weights, each filter also has a bias. The number of parameters in a filter of size 3 x 3 x 3 is therefore 27 + 1 = 28: 27 weights and one bias.
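The parameter count above generalizes to whole layers and to 3D convolutions. The sketch below assumes a standard (non-grouped) convolution; the helper name `conv_param_count` is hypothetical. Each filter has `prod(kernel_size) * in_channels` weights, because it spans all input channels, plus one bias, and the layer has one filter per output channel.

```python
from math import prod

def conv_param_count(kernel_size, in_channels, out_channels, bias=True):
    """Learnable parameters of a standard (non-grouped) convolution layer.

    kernel_size: spatial extent, e.g. (3, 3) for 2D or (3, 3, 3) for 3D
    Each of the out_channels filters has prod(kernel_size) * in_channels
    weights (it spans all input channels) plus, optionally, one bias.
    """
    weights_per_filter = prod(kernel_size) * in_channels
    return out_channels * (weights_per_filter + (1 if bias else 0))

# 2D convolution on an RGB input: one 3 x 3 x 3 filter = 27 weights + 1 bias
print(conv_param_count((3, 3), in_channels=3, out_channels=1))     # 28
# A layer with 64 such filters
print(conv_param_count((3, 3), in_channels=3, out_channels=64))    # 1792
# 3D convolution on a 3-channel volume: each filter is 3 x 3 x 3 x 3 (4D)
print(conv_param_count((3, 3, 3), in_channels=3, out_channels=1))  # 82
```

The last case illustrates the first point above: in a true 3D convolution with a multi-channel input, each filter is itself four-dimensional.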