Take a look at the `compute_cost` function in the Course 1 Week 3 Assignment 1 notebook and note that the cost is the *average* of the log probabilities over the batch. Because of that `1/m` averaging, the gradient magnitude (and hence the size of each weight update) stays roughly comparable whether the mini-batch is small or large. Hope this sheds light on the relationship between weight updates and mini-batch size.
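Here's a minimal NumPy sketch along the lines of that notebook's cost function (not its exact code) so you can see where the averaging happens:

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost averaged over the mini-batch.

    A2: predicted probabilities, shape (1, m)
    Y:  true labels in {0, 1},  shape (1, m)
    """
    m = Y.shape[1]  # number of examples in the batch
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    # Dividing by m is what keeps gradient magnitudes roughly
    # comparable across different mini-batch sizes.
    cost = -np.sum(logprobs) / m
    return float(cost)
```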
Course 2 covers mini-batch gradient descent in detail, along with tips for picking a mini-batch size; the sketch below shows the basic idea.
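Roughly, you shuffle the data, partition it into mini-batches, and take one update per mini-batch. A sketch of one epoch, assuming a hypothetical `grad_fn` that returns gradients averaged over its inputs:

```python
import numpy as np

def one_epoch(X, Y, params, grad_fn, lr=0.01, batch_size=64, seed=0):
    """One epoch of mini-batch gradient descent (hypothetical names).

    grad_fn(params, X_batch, Y_batch) -> dict of gradients, assumed to
    average its per-example gradients over the batch.
    """
    m = X.shape[1]
    perm = np.random.default_rng(seed).permutation(m)
    X, Y = X[:, perm], Y[:, perm]          # shuffle before partitioning
    for start in range(0, m, batch_size):
        Xb = X[:, start:start + batch_size]
        Yb = Y[:, start:start + batch_size]
        grads = grad_fn(params, Xb, Yb)
        for k in params:                   # one update per mini-batch
            params[k] -= lr * grads[k]
    return params
```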
Here’s a mention of gradient accumulation (this method isn’t covered in the specialization) that’ll come in handy when you have limited memory: compute gradients on several small micro-batches and sum them before applying a single update, which mimics training with a larger batch.
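A minimal sketch of the idea, again assuming the hypothetical `grad_fn` above:

```python
import numpy as np

def accumulated_step(X, Y, params, grad_fn, lr=0.01,
                     micro_batch=16, accum_steps=4):
    """One optimizer step via gradient accumulation (hypothetical names).

    Simulates an effective batch of micro_batch * accum_steps examples
    while only holding micro_batch examples in memory at a time.
    """
    accum = {k: np.zeros_like(v) for k, v in params.items()}
    for i in range(accum_steps):
        Xb = X[:, i * micro_batch:(i + 1) * micro_batch]
        Yb = Y[:, i * micro_batch:(i + 1) * micro_batch]
        grads = grad_fn(params, Xb, Yb)
        for k in accum:
            # Divide by accum_steps so the accumulated sum equals the
            # average gradient over the full effective batch.
            accum[k] += grads[k] / accum_steps
    for k in params:
        params[k] -= lr * accum[k]  # single update after accumulating
    return params
```

The trade-off is more forward/backward passes per update in exchange for a smaller memory footprint, which is exactly what you want on limited hardware.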