So I get that there is learning going on in Conv2D layers. For example,
tf.keras.layers.Conv2D(64, (3,3), activation='relu') has an activation function, so it obviously changes the hyperparameters each iteration. But how exactly? Since this uses ReLU, I would guess that any convolution that improves the accuracy keeps the same image filter matrix values, and convolutions that do not improve the accuracy get their image filter matrix set to all zeros, which I'd guess would set all pixels to black. I assume when this happens it would introduce sparsity into the convolutions generated, correct? So only non-all-black convolutions get used in deeper layers?
So how do convolutions work with other activation functions, which don't just multiply the image filter matrix by a 1 or a 0?
Good one… hope I can give a good answer
A convolution is the simple application of a filter to an input. If you apply that filter to each pixel of your image you get an activation value. Systematic application of the same filter across the image produces what's called a feature map - basically a transformation of your input into a reduced activation matrix. When you use them in a CNN you are applying filters in parallel to a training dataset; in your particular case you are using 64 filters in your layer.
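To make that concrete, here is a minimal NumPy sketch of one filter sliding over a toy image (valid padding, stride 1). The 5x5 image, the edge-detector kernel values, and the `conv2d` helper are all invented for illustration - a real Conv2D layer does this for 64 filters at once, on batches:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each 3x3 patch collapses to a single activation value
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy 5x5 "image" with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# a classic vertical-edge filter
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

feature_map = conv2d(image, kernel)
print(feature_map)
# [[-3. -3.  0.]
#  [-3. -3.  0.]
#  [-3. -3.  0.]]
```

Note the feature map is smaller than the input (5x5 in, 3x3 out with a 3x3 filter), and it responds strongly wherever the window straddles the edge. It also contains negative values, which is where the activation function comes in next.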
After applying those filters, you apply an activation function to introduce non-linearity into the mix. In your case you use ReLU, so this function goes through your feature map - the matrix - sets negative values to 0, and leaves positive values untouched. It happens that Keras provides a parameter to do that for convenience, but it's in fact a separate step.
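As a sketch of that separate step, applied to a feature map with made-up values:

```python
import numpy as np

def relu(x):
    # negative activations become 0; positive ones pass through unchanged
    return np.maximum(x, 0)

feature_map = np.array([[-3.0, 2.0],
                        [0.5, -1.0]])
print(relu(feature_map))
# [[0.  2. ]
#  [0.5 0. ]]
```

In Keras terms, `Conv2D(64, (3,3), activation='relu')` behaves like `Conv2D(64, (3,3))` followed by an `Activation('relu')` layer - the convenience parameter just fuses the two.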
Anyway, how does learning take place? The filters' values are fitted during training. Some of them start to specialise in detecting specific types of features in the input - maybe vertical or horizontal lines, or wheels, depending on the training dataset. A combination of those features is what identifies an image's category: finding them in an image makes it a good candidate for that category - if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. Filters that fail to specialise tend to give bad results, and their values keep being adjusted in the next iterations.
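A rough illustration of "filter values being fitted": the toy loop below uses plain gradient descent to adjust a randomly initialised 3x3 filter until its feature map matches the output of a known edge detector. Everything here - the random image, the target, the hand-derived gradient, the learning rate - is invented for the sketch; a real CNN does the same kind of update via backpropagation, across many filters and layers, driven by the classification loss:

```python
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))

# pretend the "right" filter is a vertical-edge detector: the target
# feature map is whatever that filter produces on the image
true_kernel = np.array([[1.0, 0.0, -1.0]] * 3)
target = conv2d(image, true_kernel)

kernel = rng.standard_normal((3, 3)) * 0.1  # random initial filter values

for step in range(2000):
    err = conv2d(image, kernel) - target     # how wrong the feature map is
    grad = np.zeros_like(kernel)             # dLoss/dKernel for MSE loss
    for a in range(3):
        for b in range(3):
            grad[a, b] = np.mean(2 * err * image[a:a + err.shape[0],
                                                 b:b + err.shape[1]])
    kernel -= 0.05 * grad                    # gradient-descent update

print(np.round(kernel, 2))
```

If the loop converges, the learned kernel ends up close to the edge-detector values it was trained against - that is the sense in which a filter "specialises" during training.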
Hope it helps
Thanks for the in depth overview