As I understand it, the bias term is added after W[l]*A[l-1] is computed (and summed up).
Therefore, I thought that in each hidden layer (if we do NOT use a convolutional layer) there is one bias term per layer. However, the exercise (quiz) in week 1 says that “There should be one per neuron”.
(I think it is true that there is one bias per filter…)
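For reference, here is a small sketch of how one could inspect the bias shapes directly (this assumes a TensorFlow/Keras setup, which is my own toy example and not from the course; the layer sizes are arbitrary):

```python
import tensorflow as tf

# Toy model just for inspecting parameter shapes; the layer sizes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=8, kernel_size=3),  # conv layer with 8 filters
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=5),                     # dense layer with 5 neurons
])
model.build(input_shape=(None, 28, 28, 3))

conv, _, dense = model.layers
print(conv.get_weights()[1].shape)    # (8,)  -> one bias per filter
print(dense.get_weights()[1].shape)   # (5,)  -> one bias per neuron
```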
Hi TMosh, I share the same confusion and would like to confirm my understanding based on the conversation above:
So for a dense or fully connected network:

x[l] = a[l-1] * w[l] + b[l]
a[l] = g(x[l])
with b[l] ∈ ℝ,

the bias term b[l] is a single scalar value shared among all units/neurons in the layer (see the numpy sketch below),
while for a fully connected layer in a CNN:

x[l] = a[l-1] * w[l] + b[l]
a[l] = g(x[l])

in a fully connected layer with m neurons, the i-th neuron has its own scalar bias b_i[l] ∈ ℝ (so b[l] as a whole has m entries), and these biases are not shared among neurons.
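To make the question concrete, here is a minimal numpy sketch of the dense forward pass above (the sizes and variable names are made up for illustration). Note that both a per-neuron bias vector and a single shared scalar broadcast correctly in this formula, which is part of why I find it hard to tell the two cases apart from the equations alone:

```python
import numpy as np

n_prev, n_units = 4, 3                 # made-up layer sizes
a_prev = np.random.randn(n_prev)       # a[l-1]
w = np.random.randn(n_units, n_prev)   # w[l]

b_vector = np.random.randn(n_units)    # one bias per neuron ("one per neuron")
b_scalar = np.random.randn()           # one bias shared by the whole layer

x_vec = w @ a_prev + b_vector          # x[l] = a[l-1] * w[l] + b[l], per-neuron bias
x_sca = w @ a_prev + b_scalar          # same formula with a shared scalar bias
print(x_vec.shape, x_sca.shape)        # both (3,), so both versions run without error
```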