Calculating the number of bias parameters

As I understand it, the bias term is added after W[l]*A[l-1] is computed (and summed up).
Therefore, I thought that for each hidden layer, if we do NOT use a convolutional layer, there is one bias term per layer. However, the exercise (quiz) in week 1 says that "There should be one per neuron".
(I think it is true that there is one bias per filter in a convolutional layer…)

Could anyone explain this? Thank you.


There’s a bias parameter for every activation unit. So if there are “N” units in a layer, there will be “N” bias parameters.
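To make the counting concrete, here is a minimal NumPy sketch (my own illustration, not from the course materials; the layer sizes are made up) showing that a dense layer with N units carries N bias parameters, one per unit:

```python
import numpy as np

# Hypothetical sizes: a dense layer with 4 units fed by 3 inputs.
n_prev, n_units = 3, 4

W = np.random.randn(n_units, n_prev)  # one weight per (unit, input) pair -> 12
b = np.zeros((n_units, 1))            # one bias per unit -> 4, not 1 per layer

a_prev = np.random.randn(n_prev, 1)   # activations from the previous layer
z = W @ a_prev + b                    # each unit adds its own bias entry

print(W.size, b.size)                 # 12 weight parameters, 4 bias parameters
```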

Thank you. Then I have misunderstood!

Hi TMosh, I share the same confusion and would like to confirm my understanding based on the conversation above:

So for a dense or fully connected network,

```
x[l] = a[l-1] * w[l] + b[l]
a[l] = g(x[l])
with b[l] ∈ ℝ
```

the bias term b[l] is shared among all units/neurons in the layer, i.e., b[l] is a single scalar value shared by every neuron in the layer.

While for a fully connected layer in a CNN,

```
x[l]_i = a[l-1] * w[l]_i + b[l]_i
a[l]_i = g(x[l]_i)
with b[l]_i ∈ ℝ
```

in one fully connected layer with m neurons, the i-th neuron has its own b[l]_i parameter, and these biases are not shared among neurons.

Thank you for your time

For both NNs and CNNs, every unit has its own bias value.
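As a quick sanity check (a minimal tf.keras sketch of my own, with arbitrary layer sizes, not taken from the course), you can inspect the bias shapes directly: the Dense layer gets one bias per unit, and the Conv2D layer one bias per filter:

```python
import tensorflow as tf

# Arbitrary example model: one conv layer followed by one fully connected layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3),  # 8 filters
    tf.keras.layers.Flatten(),                         # no parameters
    tf.keras.layers.Dense(units=10),                   # 10 units
])

for layer in model.layers:
    if layer.weights:                        # skip layers without parameters
        kernel, bias = layer.weights
        print(layer.name, "bias shape:", bias.shape)
# conv2d bias shape: (8,)  -> one bias per filter
# dense  bias shape: (10,) -> one bias per unit
```

Note that within the conv layer the bias is shared across spatial positions, so "unit" there means "filter", which matches the one-bias-per-filter observation in the original question.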