Shape of the bias b and db

In the video it is said that for m examples the shape of the bias b is n[l] × 1, but because of broadcasting it effectively becomes n[l] × m.
The shapes of b and db are supposed to be the same, yet when we compute db it comes out as n[l] × m and is then averaged down to n[l] × 1. So in the case of b we copy the same bias value for every example by broadcasting, but in db we take an average. Why the difference?
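Here is a small numpy sketch of what I mean (the layer sizes and names are made up, just for illustration):

```python
import numpy as np

n_l, n_prev, m = 4, 3, 5          # layer size, previous layer size, number of examples
W = np.random.randn(n_l, n_prev)  # weights: (n_l, n_prev)
b = np.random.randn(n_l, 1)       # bias: (n_l, 1)
A_prev = np.random.randn(n_prev, m)

Z = W @ A_prev + b                # b is broadcast across the m columns
print(Z.shape)                    # (4, 5) -> as if b were copied into every column
```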

Hi @Rupesh. Thank you for this great question. b is the vector of biases in a layer. You can also think of it as an extra column of W that multiplies an additional element of the vector x whose value is 1. db (and dW) are the updates to the bias vector and the weight matrix. For a batch of samples we have one update, which is the average of the updates computed for each sample in the batch. This is why db and dW are reduced along the sample dimension. Once db is computed, you add it to b, which is the same b that was broadcast to every example in the batch (same for dW and W).
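If it helps, here is a minimal numpy sketch of both points (the shapes and names dZ, A_prev are just placeholders for this example):

```python
import numpy as np

n_l, n_prev, m = 4, 3, 5
W = np.random.randn(n_l, n_prev)
b = np.random.randn(n_l, 1)
A_prev = np.random.randn(n_prev, m)

# View of b as an extra column of W multiplying a constant 1 appended to x:
W_aug = np.hstack([W, b])                      # (n_l, n_prev + 1)
A_aug = np.vstack([A_prev, np.ones((1, m))])   # (n_prev + 1, m)
assert np.allclose(W_aug @ A_aug, W @ A_prev + b)

# One update per batch: average the per-example gradients over the sample axis
dZ = np.random.randn(n_l, m)                   # pretend upstream gradient
db = dZ.sum(axis=1, keepdims=True) / m         # (n_l, 1), same shape as b
dW = dZ @ A_prev.T / m                         # (n_l, n_prev), same shape as W
```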
I hope my explanation helped you understand. Please comment with more questions if something is not clear.


I get that during forward propagation we are simply broadcasting b into m identical columns, so every column is the same. But during back propagation, instead of taking a single column we take the average, because for every example the derivative of the loss with respect to the bias is different. Am I right?

I am not sure I understood your description of the process. With Wx+b you get a tensor of size (m, h, w, c). Since b is a vector, internally it is duplicated to match the size of Wx in the summation.
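For example (made-up sizes, just to show the broadcasting):

```python
import numpy as np

m, h, w, c = 2, 4, 4, 3
WX = np.random.randn(m, h, w, c)   # output of the Wx part
b = np.random.randn(c)             # one bias per channel

Z = WX + b                         # b is duplicated across the m, h and w dimensions
print(Z.shape)                     # (2, 4, 4, 3)
```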

When calculating gradients for back propagation, you reduce by averaging along the sample dimension. The reason is that you want a single update for a batch of samples. This also means you get more smoothing when you increase the batch size, and less when you decrease it. I am not certain this is what you meant in your description, but please correct me if I am wrong.
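A quick illustration of that smoothing effect with hypothetical numbers: averaging per-example gradients over a bigger batch gives a less noisy update.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = 1.0
# Per-example gradients = true gradient + noise (purely synthetic data)
per_example = true_grad + rng.normal(scale=2.0, size=10_000)

for batch_size in (8, 64, 512):
    usable = per_example[: (len(per_example) // batch_size) * batch_size]
    batch_grads = usable.reshape(-1, batch_size).mean(axis=1)  # one averaged update per batch
    print(batch_size, batch_grads.std())   # the spread of updates shrinks as batch size grows
```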

Thank you, I got it.