The picture on the left shows one vertical edge in the middle. How come, after the convolution, the picture on the right-hand side shows two vertical edges? I mean, I understand the calculation, but shouldn’t the convolution reflect the same number and position of edges?
Interesting point! I’m just a fellow student and don’t claim to know the definitive answer, but here are some thoughts your question triggers for me:
1) That’s just a question of how you interpret the output, right? The background zeros mean “nothing to see here” and the 30 values mean “there’s an edge here”. And the actual edge in the original image could be viewed as taking place across two adjacent vertical columns of pixels: the last bright column and the first dark one. It takes both columns to define the edge, because the edge is the change between them. Viewing either one in isolation tells you nothing.
2) This is just one step in the process: in a real ConvNet, there are multiple layers of filters which detect increasingly complex objects. And the normal pattern in a ConvNet is that the height and width dimensions shrink as you go through the layers while the number of output channels grows. So you can think of it as trading spatial resolution for increasingly sophisticated object recognition. But I guess this makes your point even stronger: if the spatial resolution is reduced, then two columns take up even more of the available space. Hmmmm.
3) Try some experiments with the contents of the filter and see if you can generate one column of non-zero values instead of the two that Prof Ng’s filter produces. Maybe you’re on to something here and the filter could be improved. But then there’s the real overall point in item 4):
4) Note that what Prof Ng is doing here is just giving you an artificial example to motivate how convolutional filters can detect things. Maybe people used hand-coded filters like this in the olden days (i.e. in the 1990s and earlier), but that’s not how things work anymore. Now we just initialize the filters randomly, run backpropagation, and let the network learn whatever it learns. Or to put it another way, this either works or it doesn’t. The evidence is that it does, provided that you choose your network architecture wisely. Stay tuned to see how convolutional filters are really applied and what the results are.
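To make the two-column result concrete, here is a minimal NumPy sketch of the lecture example as I understand it: a 6x6 image with 10s in the left three columns and 0s in the right three, and the 3x3 filter whose columns are 1, 0, -1. It assumes “valid” convolution (no padding, stride 1) computed as cross-correlation, i.e. without flipping the kernel, which is what most deep learning libraries do; `conv2d_valid` is a hypothetical helper, not a library function. Each output pixel sees a 3-column window, and two window positions (output columns 1 and 2, counting from 0) straddle the boundary between input columns 2 and 3, so both report 30. The second filter is only one possible answer to the experiment suggested above, not something from the lecture.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation (no padding, stride 1): the operation
    most deep learning libraries call convolution."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel with the window under it and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 6x6 image: bright (10) in the left three columns, dark (0) in the right three.
image = np.zeros((6, 6))
image[:, :3] = 10

# Lecture-style 3x3 vertical-edge filter: columns +1, 0, -1.
edge_filter = np.array([[1, 0, -1]] * 3)
out = conv2d_valid(image, edge_filter)
print(out)  # 4x4; every row has the values 0, 30, 30, 0

# One possible variant (an assumption, not from the lecture): columns 0, +1, -1.
# Only one window position now "sees" the bright/dark change.
alt_filter = np.array([[0, 1, -1]] * 3)
alt_out = conv2d_valid(image, alt_filter)
print(alt_out)  # 4x4; every row has the values 0, 30, 0, 0
```

With the variant filter a single column lights up, but it lands in output column 1 rather than in the middle: with an even output width of 4, no single column can sit at the center.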
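The shrinking-resolution remark can be put into numbers with the standard output-size formula for an n x n input, an f x f filter, padding p and stride s: floor((n + 2p - f) / s) + 1. A minimal sketch, with `conv_output_size` as a hypothetical helper name:

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Height/width of the output when an f x f filter is applied to an
    n x n input: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

# The lecture example: 6x6 input, 3x3 filter, no padding -> 4x4 output,
# so two edge columns already cover half of the output width.
print(conv_output_size(6, 3))             # 4
# "Same" padding (p = 1 for a 3x3 filter) keeps the 6x6 size.
print(conv_output_size(6, 3, padding=1))  # 6

# Stacking unpadded 3x3 layers keeps shrinking the map until it is too small.
sizes = [6]
while sizes[-1] >= 3:
    sizes.append(conv_output_size(sizes[-1], 3))
print(sizes)  # [6, 4, 2]
```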
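As a rough illustration of the “initialize the filters randomly and run backpropagation” point, the sketch below starts from a random 3x3 filter and runs plain gradient descent on mean squared error, with the gradient written out by hand. It is a hypothetical toy task: the training targets are simply the outputs of the hand-coded vertical-edge filter on random images, so the learned filter should converge back to that filter. In a real ConvNet the loss comes from the overall task, the framework computes the gradients automatically, and the learned filters need not resemble any hand-designed one.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # 'Valid' cross-correlation (no padding, stride 1).
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(ow)] for i in range(oh)])

rng = np.random.default_rng(0)

# Toy training set (hypothetical): random 6x6 images, with targets produced
# by the hand-coded vertical-edge filter.
true_filter = np.array([[1.0, 0.0, -1.0]] * 3)
images = rng.standard_normal((20, 6, 6))
targets = [conv2d_valid(x, true_filter) for x in images]

w = rng.standard_normal((3, 3))  # random initialization
lr = 0.1
for _ in range(300):
    grad = np.zeros_like(w)
    for x, y in zip(images, targets):
        err = conv2d_valid(x, w) - y  # 4x4 output error
        # d(mean squared error)/dw = (2 / #outputs) * correlation of x with err
        grad += 2 * conv2d_valid(x, err) / err.size
    w -= lr * grad / len(images)  # gradient descent step on the average loss

print(np.round(w, 3))  # close to the hand-coded filter
```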
Thank you so much Paul for your detailed and informative answer. I will read it a few more times. Sure, I will stay tuned.
The input and the filter are both symmetric.
Since the filter output is 4x4 (an even width), it has created a vertical structure centered in the middle, which means it spans the two middle columns.
To do otherwise would create an illogical asymmetric output.
@TMosh I guess it’s just my poor understanding. But why center it in the middle? After all, the picture has its bright contents on the left, not in the middle. Why would it be illogical to reflect this in the output of the convolution?
I think Tom’s point is that the goal of the filter is not to highlight the “bright content”: the intent is to highlight the “feature” it’s looking for, which is the edge. And the edge sits symmetrically in the middle of the image.
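This point can be checked numerically. The sketch below assumes “valid” cross-correlation with the lecture’s 3x3 filter (columns 1, 0, -1); `conv2d_valid` and `step_image` are hypothetical helpers. It moves the boundary between the bright and dark columns and prints the first row of each output: the bright region always starts on the left, yet the response follows the edge, and when the edge sits exactly in the middle of the image the output row is its own mirror image. Reversing the image so that it goes from dark to bright keeps the position and flips the sign to -30.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # 'Valid' cross-correlation (no padding, stride 1).
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(ow)] for i in range(oh)])

def step_image(bright_cols, size=6, value=10):
    # Hypothetical helper: the first `bright_cols` columns are bright, the rest dark.
    img = np.zeros((size, size))
    img[:, :bright_cols] = value
    return img

vertical = np.array([[1, 0, -1]] * 3)

# Move the edge: the response moves with it.
rows = {k: conv2d_valid(step_image(k), vertical)[0] for k in (2, 3, 4)}
for k, row in rows.items():
    # values per k: 2 -> 30, 30, 0, 0; 3 -> 0, 30, 30, 0; 4 -> 0, 0, 30, 30
    print(k, row)

# Edge exactly in the middle: the output row equals its reverse.
print(np.array_equal(rows[3], rows[3][::-1]))  # True

# Dark-to-bright instead of bright-to-dark: same position, opposite sign.
flipped = conv2d_valid(step_image(3)[:, ::-1], vertical)[0]
print(flipped)  # values 0, -30, -30, 0
```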
ahaaa! now I get it!