I’ve found what appear to be four omissions, inconsistencies or errors on slide 2 of the C4W4 “What are deep ConvNets learning?” lecture video (in the Neural Style Transfer section). However, it’s rare to have four such issues on one slide, so I would very much welcome a sanity check. Can anyone help? Thanks!
“Units” of the ConvNet layer 1 are referred to a few times here on slide 2 and it is essential to understand what they are to comprehend this slide. However, I do not recall them being defined in any of the DLS courses. A large Stack Exchange thread quoting this course has emerged (neural networks - definition of "hidden unit" in a ConvNet - Cross Validated). I believe the most concise definition is that a “unit” of layer 1 of this ConvNet is a single filter of dimension 115x115x3, of which there are 96 in layer 1 here. Is this correct?
Number of patches visualized. Ng chooses to plot exactly nine image patches, but it is not explained where nine comes from, and numbers matter a lot in this course, especially when they are visualized. Is nine an arbitrary choice? If not, why not?
“Seeing” the network. Ng says that “a hidden unit in layer 1, will see only a relatively small portion of the neural network.” But a hidden unit in layer 1 doesn’t see any of the network. It sees input images on which it operates, one position at a time. Correct?
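To make the point about what a unit "sees" concrete, here is a minimal sketch, assuming a unit is a single filter slid over the input with no padding. The toy 5x5 single-channel image, the 3x3 edge filter, and the function names are all hypothetical and are not taken from the lecture.

```python
def patch_at(image, row, col, f, stride=1):
    """Return the f x f input patch that feeds output position (row, col)."""
    top, left = row * stride, col * stride
    return [r[left:left + f] for r in image[top:top + f]]


def filter_activations(image, filt, stride=1):
    """Slide one filter over the image (no padding); return its activation map."""
    f = len(filt)
    out_h = (len(image) - f) // stride + 1
    out_w = (len(image[0]) - f) // stride + 1
    acts = []
    for r in range(out_h):
        row_acts = []
        for c in range(out_w):
            # Each activation depends only on one small patch of the input image.
            patch = patch_at(image, r, c, f, stride)
            row_acts.append(sum(patch[i][j] * filt[i][j]
                                for i in range(f) for j in range(f)))
        acts.append(row_acts)
    return acts


# Hypothetical toy data: a 5x5 image with a vertical edge and a 3x3 edge filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
filt = [[-1, 0, 1] for _ in range(3)]

acts = filter_activations(image, filt)
print(acts)                      # one activation per filter position
print(patch_at(image, 0, 0, 3))  # the image patch behind activation (0, 0)
```

In this sketch every activation is computed from an f x f patch of the input image, which is the sense in which the unit sees part of the image rather than part of the network.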
Image patch size. Ng also says “And so if you visualize, if you plot what activated unit’s activation, it makes makes sense to plot just a small image patches, because all of the image that that particular unit sees.” This makes intuitive sense. However, in this particular example, back-solving for the size of the layer 1 filters suggests that each of the 96 filters is 115x115x3, which is over half the input image size in each planar dimension. This is not “small.” Further, each of the nine example patches shown seems to be of significantly lower resolution than 115x115. My guess is that there is an error here: the layer 1 filters were intended to have small planar dimensions (say 17x17) but were accidentally made 115x115. Is that right?
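The back-solving can be sketched as follows, using the standard convolution output-size formula. The 224x224 input and 110x110 layer 1 output are assumed sizes that are not stated in this post (they are chosen because stride 1 with no padding then reproduces the 115 figure), and the function names are hypothetical.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Standard conv output size: floor((n + 2*pad - f) / stride) + 1."""
    return (n + 2 * pad - f) // stride + 1


def backsolve_filter_sizes(n, n_out, stride=1, pad=0):
    """Return every filter size f that maps input size n to output size n_out."""
    return [f for f in range(1, n + 2 * pad + 1)
            if conv_output_size(n, f, stride, pad) == n_out]


# Assumed sizes, not stated in the post: 224x224 input, 110x110 layer 1 output.
N_IN, N_OUT = 224, 110

print(backsolve_filter_sizes(N_IN, N_OUT, stride=1, pad=0))  # [115]
print(backsolve_filter_sizes(N_IN, N_OUT, stride=2, pad=0))  # [5, 6]
```

Under these assumed sizes, stride 1 with no padding gives 115, while stride 2 with no padding gives 5 or 6, so the back-solved filter size depends heavily on the stride that is assumed.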
Number of units visualized. After showing 9 image patches for a single hidden unit, Ng repeats this exercise for an additional 8 units, for a total of 9 units. However, there are 96 units, not 9, right? In this case, is the choice to visualize exactly 9 units also arbitrary? If not, why not? Further, is the choice of the number 9 for both image patches and units (which are different things) a coincidence? (I assume so, but want to sanity check.)