Shouldn’t all filter dimensions be the same as the channel numbers?
Why would you think that?
I think there is a fundamental confusion here. The f values have literally nothing to do with the nC values. The f values are the “filter size” in the h and w dimensions. So if f is 3, that means the filter you are stepping over the image with has a height and width of 3 x 3. The number of channels on the input side must be matched by each filter. So in the first layer, if you are processing RGB images, the input images will have 3 color channels, and each filter in the first layer will have shape 3 x 3 x 3: the first two 3s are f, and the last 3 is the number of input channels. The numbers happen to be the same in that case, but that is a coincidence.
Then what Prof Ng is showing there is that there are a total of 10 such filters in the first layer, so you get 10 output channels. So the shape of the W for the first layer is 3 x 3 x 3 x 10.
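If it helps to see the first layer's shapes concretely, here is a minimal NumPy sketch. The 39 x 39 image size, the stride of 1 with no padding, the missing bias term, and the function name `conv_forward_naive` are all my assumptions for illustration, not something taken from the slide. The point is only that the filter depth has to equal the input channels, and the number of filters sets the output channels.

```python
import numpy as np

def conv_forward_naive(x, W):
    """Stride-1, no-padding convolution without bias (illustrative only).

    x: input of shape (height, width, n_C_prev)
    W: filters of shape (f, f, n_C_prev, n_C)
    """
    f, f_w, n_c_prev, n_c = W.shape
    assert f == f_w, "this sketch assumes square filters"
    # Each filter must span all of the input channels.
    assert x.shape[2] == n_c_prev, "filter depth must match input channels"

    h_out = x.shape[0] - f + 1
    w_out = x.shape[1] - f + 1
    out = np.zeros((h_out, w_out, n_c))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + f, j:j + f, :]  # shape (f, f, n_C_prev)
            # Sum over height, width and channels, once per filter.
            out[i, j, :] = np.tensordot(patch, W, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((39, 39, 3))   # assumed RGB input size
W1 = rng.standard_normal((3, 3, 3, 10))    # f=3, 3 input channels, 10 filters
a1 = conv_forward_naive(image, W1)
print(a1.shape)  # (37, 37, 10)
```

Changing f to 5 here would change the first two dimensions of `W1` and the output height and width, but the 3 in the third position would stay, because it comes from the input channels and not from f.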
Now consider the second layer. The inputs are no longer images, but the 10-channel outputs of the first layer. He chooses a filter size of 5 x 5 for the second layer in terms of height and width. Each filter needs to match the channels on the input, so each filter will be 5 x 5 x 10. And he specifies that there are 20 of them, so the shape of the W for the second layer is 5 x 5 x 10 x 20 and the output of layer 2 will have 20 channels.
And so forth …
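The same bookkeeping works for any number of layers. Here is a small sketch that walks the chain; the helper name `filter_shapes` and the list-of-pairs format are hypothetical, chosen just for this example. Each layer is given as (f, number of filters), and the channel count of each W comes from the previous layer's number of filters, never from f.

```python
def filter_shapes(n_c_input, layers):
    """Return the W shape (f, f, n_C_prev, n_C) for each conv layer.

    n_c_input: number of channels in the network input (3 for RGB)
    layers: list of (f, n_filters) pairs, one per conv layer
    """
    shapes = []
    n_c_prev = n_c_input
    for f, n_filters in layers:
        shapes.append((f, f, n_c_prev, n_filters))
        # This layer's output channels become the next layer's input channels.
        n_c_prev = n_filters
    return shapes

shapes = filter_shapes(3, [(3, 10), (5, 20)])
for layer_index, shape in enumerate(shapes, start=1):
    print(f"layer {layer_index}: W shape = {shape}")
# layer 1: W shape = (3, 3, 3, 10)
# layer 2: W shape = (5, 5, 10, 20)
```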
If the above is still not clear to you, my recommendation would be to watch the lectures again. Prof Ng explains all this quite clearly.