At 5:18 it is said that the dw vector is an n x 1 vector, but in the calculation it comes out as a single value. That does seem like a mistake, since dw needs to be different for each w. Please clarify. Thank you so much

The bottom line for the dXX quantities is that each has to have the same shape as its corresponding variable.

dw needs to have the same shape as w

db will have the same shape as b

If you ever introduce a dy for some variable y of your own in one of your projects, it will likewise have the same shape as y.

The reason comes from its interpretation: dXX is a (small) variation applied element-wise to the corresponding variable XX, so XX and dXX must agree in shape for XX + dXX to be defined (or rather XX - alpha * dXX in gradient descent).
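To make the shape rule concrete, here is a minimal NumPy sketch of a single gradient-descent step; the sizes and random values are purely illustrative, not from the course:

```python
import numpy as np

n = 4                       # number of features (illustrative)
w = np.zeros((n, 1))        # weight vector, shape (n, 1)
dw = np.random.randn(n, 1)  # gradient must match w's shape
b, db = 0.0, 0.5            # b is a scalar, so db is a scalar too
alpha = 0.01                # learning rate

# The update is only defined because w and dw agree in shape
w = w - alpha * dw
b = b - alpha * db

assert w.shape == dw.shape == (n, 1)
```

If dw were a single number instead of an (n, 1) vector, the update would shift every weight by the same amount, which is not what gradient descent is supposed to do.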

Hope that answers your question

Thank you for your reply!

My issue was exactly that, but dw doesn't seem to have the same shape, going by the calculations shown in the video

I suggest you recheck the slides. dw and w are both of shape (n, 1). I believe you may be confusing w with w transpose: w transpose is of dimension (1, n).

Hope this helps.