Yes, the number of filters is the number of output channels. The point to keep clear is that there are two sets of channels: input and output. The depth (third dimension) of each filter matches the number of input channels, and the total number of filters determines the number of output channels of the convolution layer.
Yes, that sounds right. The number of input channels is typically 3 only for the very first conv layer, when the input is an RGB image with 3 color channels. Most of the layers after that will have more input channels.
Notice that the notation in that section of the lecture uses n_c^{[l-1]} as the number of input channels, that is to say the number of channels output by the previous layer l - 1. Then the number of output channels is n_c^{[l]} for layer l.
So the number of filters in layer l is n_c^{[l]}, which then becomes the number of output channels. Each filter has n_c^{[l-1]}, the number of input channels, as its third dimension.
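If it helps to see the shapes concretely, here is a minimal NumPy sketch. It is not the course's implementation: the function name `conv_forward_naive`, the 8 x 8 x 3 input, and the choice of 16 filters of size 3 x 3 are all made up for illustration, and it assumes stride 1, no padding, and no bias.

```python
import numpy as np

def conv_forward_naive(a_prev, W):
    """Naive 'valid' convolution: stride 1, no padding, no bias (illustrative only).

    a_prev: (n_h, n_w, n_c_prev)   activations from layer l-1
    W:      (f, f, n_c_prev, n_c)  n_c filters, each of shape (f, f, n_c_prev)
    """
    n_h, n_w, n_c_prev = a_prev.shape
    f, f2, depth, n_c = W.shape
    assert f == f2, "this sketch assumes square filters"
    # The third dimension of every filter must equal the number of input channels.
    assert depth == n_c_prev, "filter depth must match input channels"

    out_h, out_w = n_h - f + 1, n_w - f + 1
    z = np.zeros((out_h, out_w, n_c))
    for k in range(n_c):              # one output channel per filter
        for i in range(out_h):
            for j in range(out_w):
                patch = a_prev[i:i + f, j:j + f, :]        # spans ALL input channels
                z[i, j, k] = np.sum(patch * W[:, :, :, k])
    return z

rng = np.random.default_rng(0)
a_prev = rng.standard_normal((8, 8, 3))    # n_c^{[l-1]} = 3 input channels
W = rng.standard_normal((3, 3, 3, 16))     # 16 filters, each 3 x 3 x 3
z = conv_forward_naive(a_prev, W)
print(z.shape)  # (6, 6, 16): n_c^{[l]} = 16 output channels
```

Each filter spans all input channels and produces a single 2D slice of the output, so stacking the results of 16 filters gives 16 output channels. A next layer taking `z` as input would then need filters whose third dimension is 16.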