Take a look at the `compute_cost` function in the Course 1 Week 3 Assignment 1 notebook and note that the cost is the *average* of the log probabilities over the batch. Because of that `1/m` averaging, the gradient magnitude (and hence the size of each weight update) stays roughly comparable whether the mini-batch is small or large. Hope this sheds light on the relationship between weight updates and mini-batch size.
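Here's a minimal NumPy sketch along the lines of that notebook's cost function (not its exact code) so you can see where the averaging happens:

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost averaged over the mini-batch.

    A2: predicted probabilities, shape (1, m)
    Y:  true labels in {0, 1},  shape (1, m)
    """
    m = Y.shape[1]  # number of examples in the batch
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    # Dividing by m is what keeps gradient magnitudes roughly
    # comparable across different mini-batch sizes.
    cost = -np.sum(logprobs) / m
    return float(cost)
```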
Course 2 covers mini-batch gradient descent in detail, along with tips for picking a mini-batch size; the sketch below shows the basic idea.
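Roughly, you shuffle the data, partition it into mini-batches, and take one update per mini-batch. A sketch of one epoch, assuming a hypothetical `grad_fn` that returns gradients averaged over its inputs:

```python
import numpy as np

def one_epoch(X, Y, params, grad_fn, lr=0.01, batch_size=64, seed=0):
    """One epoch of mini-batch gradient descent (hypothetical names).

    grad_fn(params, X_batch, Y_batch) -> dict of gradients, assumed to
    average its per-example gradients over the batch.
    """
    m = X.shape[1]
    perm = np.random.default_rng(seed).permutation(m)
    X, Y = X[:, perm], Y[:, perm]          # shuffle before partitioning
    for start in range(0, m, batch_size):
        Xb = X[:, start:start + batch_size]
        Yb = Y[:, start:start + batch_size]
        grads = grad_fn(params, Xb, Yb)
        for k in params:                   # one update per mini-batch
            params[k] -= lr * grads[k]
    return params
```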
Here’s a mention of gradient accumulation (this method isn’t covered in the specialization) that’ll come in handy when you have limited memory: compute gradients on several small micro-batches and sum them before applying a single update, which mimics training with a larger batch.
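A minimal sketch of the idea, again assuming the hypothetical `grad_fn` above:

```python
import numpy as np

def accumulated_step(X, Y, params, grad_fn, lr=0.01,
                     micro_batch=16, accum_steps=4):
    """One optimizer step via gradient accumulation (hypothetical names).

    Simulates an effective batch of micro_batch * accum_steps examples
    while only holding micro_batch examples in memory at a time.
    """
    accum = {k: np.zeros_like(v) for k, v in params.items()}
    for i in range(accum_steps):
        Xb = X[:, i * micro_batch:(i + 1) * micro_batch]
        Yb = Y[:, i * micro_batch:(i + 1) * micro_batch]
        grads = grad_fn(params, Xb, Yb)
        for k in accum:
            # Divide by accum_steps so the accumulated sum equals the
            # average gradient over the full effective batch.
            accum[k] += grads[k] / accum_steps
    for k in params:
        params[k] -= lr * accum[k]  # single update after accumulating
    return params
```

The trade-off is more forward/backward passes per update in exchange for a smaller memory footprint, which is exactly what you want on limited hardware.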