DLS, C1_W4. 'Parameters vs Hyperparameters' lecture

Look at the compute_cost function in the Course 1, Week 3, Assignment 1 notebook and note that the cost is the average of the per-example log probabilities (logprobs). Hope this sheds light on the relationship between weight updates and mini-batch size.
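In case you don't have the notebook handy, here is a minimal sketch of that kind of cross-entropy cost (the notebook's actual code may differ in details; the variable names here are my own). The key point is the division by m: because the cost is an average, each example's contribution to the gradient scales as 1/m, so the update magnitude stays comparable as the batch grows.

```python
import numpy as np

def compute_cost(A, Y):
    """Binary cross-entropy cost: the negative mean of per-example
    log-probabilities over the (mini-)batch.
    A -- predicted probabilities, shape (1, m)
    Y -- true labels in {0, 1}, shape (1, m)
    """
    m = Y.shape[1]  # number of examples in the batch
    logprobs = Y * np.log(A) + (1 - Y) * np.log(1 - A)
    cost = -np.sum(logprobs) / m  # averaging over m -> gradients scale as 1/m
    return float(np.squeeze(cost))

A = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1, 0, 1]])
print(compute_cost(A, Y))  # -(ln 0.9 + ln 0.8 + ln 0.7) / 3
```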

Course 2 covers mini-batch gradient descent in detail, including tips for picking the mini-batch size.

Also worth a mention: gradient accumulation (this method isn’t covered in the specialization), which comes in handy when you have limited resources.
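The idea is to simulate a large batch when only a small one fits in memory: compute gradients on several micro-batches, average them, and apply a single update. A minimal NumPy sketch on a linear-regression toy problem (all names and sizes here are illustrative, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))          # 64 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
micro_batch = 16                      # what fits in memory
accum_steps = 4                       # 4 * 16 = effective batch of 64

grad_accum = np.zeros_like(w)
for step in range(accum_steps):
    xb = X[step * micro_batch:(step + 1) * micro_batch]
    yb = y[step * micro_batch:(step + 1) * micro_batch]
    # gradient of the mean-squared-error cost on this micro-batch
    grad = 2 * xb.T @ (xb @ w - yb) / len(xb)
    grad_accum += grad / accum_steps  # average across micro-batches

# full-batch gradient at the same w, for comparison
full_grad = 2 * X.T @ (X @ w - y) / len(X)

w -= lr * grad_accum                  # one update, as if the batch were 64
```

Since the micro-batch gradients are averaged before the update, `grad_accum` matches the full-batch gradient exactly here; in a framework you would do the same by summing losses over several backward passes before calling the optimizer step.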
