C4_W2_Video_Classic_Networks: Confusion about last 2 FC layers in LeNet-5?

For the LeNet 5 architecture in the video Classic Networks, I was a bit confused by the last 2 FC layers:

After consulting some other sources, including Wikipedia, I first thought that the layer shown as the first FC layer in the video should be a convolution layer instead.

[Update]
But after looking at the Wikipedia page again, I think the explanation is this: a convolution layer with a 5x5 filter (the same size as the previous feature map’s height x width) produces a 1x1 output per filter, which has pretty much the same effect as a fully connected layer. So is that why Prof. Andrew just wrote FC instead of a Conv layer?

This also assumes we don’t count the flatten step as a layer, following Prof. Andrew’s suggestion.
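A quick way to check this numerically is sketched below. It is only a sketch: the shapes (a 5x5x16 feature map feeding 120 filters of size 5x5x16, i.e. 400 inputs to 120 units) are my assumption based on the usual description of LeNet-5, the weights are random, and `conv_valid` is a hypothetical helper written for this post, not anything from the course code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes (usual LeNet-5 description, not taken from the video):
# a 5x5x16 feature map and 120 filters of size 5x5x16.
feature_map = rng.standard_normal((5, 5, 16))
filters = rng.standard_normal((120, 5, 5, 16))
bias = rng.standard_normal(120)


def conv_valid(x, w, b):
    """Plain 'valid' convolution (cross-correlation), stride 1, no padding."""
    f = w.shape[1]
    out_h = x.shape[0] - f + 1
    out_w = x.shape[1] - f + 1
    out = np.empty((out_h, out_w, w.shape[0]))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + f, j:j + f, :]
            # Sum over height, width and channels for every filter at once.
            out[i, j, :] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


# Convolution view: the filter covers the whole feature map,
# so there is exactly one position and the output is 1x1x120.
conv_out = conv_valid(feature_map, filters, bias)

# Fully connected view: flatten the feature map to 400 values and
# reuse the same weights as a (120, 400) matrix.
fc_out = filters.reshape(120, -1) @ feature_map.reshape(-1) + bias

same = np.allclose(conv_out.reshape(-1), fc_out)
print(conv_out.shape, fc_out.shape, same)
```

With the same weights, both views compute the same 120 numbers; the only difference is whether the result is kept as a 1x1x120 volume or as a flat vector of length 120.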

Could someone help me verify this?

What about this post (and there are many similar if you search even in the TF-AT part of the forum):

Yes, that is correct.