I have a conceptual question: why do we add biases to the filters when performing a convolution?
I understand that the optimal filter values will be learned by the NN, but I'm having a hard time figuring out why we add the bias b. Thanks!
NNs (and other types of supervised learning models) always add a "bias" value, so the system can (if needed) learn a constant offset that is added to all examples.
Right! It's a lot more obvious why this is done if you start by considering the case of Logistic Regression or a feed-forward network. There you are doing a general "affine transformation". The simplest case of that is a line in the plane, as in:
y = mx + b
In the multidimensional case:
Z = W \cdot A + b
If you omit the bias term, then you can only represent lines (or hyperplanes in the multidimensional case) that contain the origin. That is what mathematicians call "a significant loss of generality".
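To see the "contains the origin" point concretely, here is a minimal NumPy sketch (the shapes and seed are just illustrative, not from the course). Without b, an input at the origin can only ever map to the origin; with b, the layer is free to shift its output anywhere:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # weight matrix: 4 inputs -> 3 outputs
b = rng.standard_normal((3, 1))   # one bias per output unit

A = np.zeros((4, 1))              # an input sitting at the origin

# Without the bias, the origin always maps to the origin,
# so the learned hyperplane is forced through zero:
print(W @ A)        # [[0.] [0.] [0.]]

# With the bias, the output can be shifted away from the origin:
print(W @ A + b)    # exactly the bias values
```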
So it's the same argument here with convolutions, even though the transformation is harder to visualize: if you omit the bias term, you are putting a significant constraint on the possible solutions you can find. Why limit your solutions in that way if you don't have to? Of course, it's always possible that the best solution will end up having a zero bias, but why force that a priori? Just let Gradient Descent and backpropagation learn the solution that works best for the particular problem.
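Here is a rough sketch of how that plays out in a convolution (a toy single-filter "valid" convolution I wrote for illustration, not the course's implementation): each filter gets one scalar bias b, added at every output position, which lets the whole feature map shift away from zero:

```python
import numpy as np

def conv_single_filter(image, kernel, b):
    """Toy single-channel "valid" convolution (cross-correlation, as in
    most DL frameworks) with one scalar bias b shared by the whole filter."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same b is added at every spatial position of this
            # filter's output, just like the b in Z = W . A + b
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + b
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
print(conv_single_filter(image, kernel, b=0.0))
print(conv_single_filter(image, kernel, b=10.0))  # identical map, shifted by b
```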
Of course this is an Experimental Science: you can run the experiment yourself and check how often we end up with zeros as the bias values. Hold that thought as we go through the course and check it in some cases just to confirm the intuition here.
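If you want to run that check, one possible way (a Keras sketch; the tiny model here is just a made-up placeholder for whatever trained model you want to inspect) is to loop over the layers and look at the learned bias values:

```python
import numpy as np
import tensorflow as tf

# A tiny stand-in network just so the loop has something to inspect;
# substitute your own *trained* model (biases are initialized to zero,
# so this check is only meaningful after training).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

for layer in model.layers:
    weights = layer.get_weights()
    if len(weights) == 2:                  # layers holding a kernel and a bias
        _, bias = weights
        print(f"{layer.name}: mean |b| = {np.abs(bias).mean():.4f}, "
              f"{np.isclose(bias, 0.0, atol=1e-3).mean():.0%} of biases near zero")
```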
Thanks @paulinpaloalto and @TMosh - makes sense!