Question about DLS Course 2 weekly quiz

Hi, I ran into some problems while doing the quizzes. Can someone help me?

The first question asks which techniques help find parameter values that attain a small value of the cost function J, given that training takes excessively long. I don't think that (1) better random initialization of the weights and (2) normalizing the input data can achieve this.

The second question is about gamma and beta in Batch Norm. I think they are related to all the hidden units in that layer. So why is the statement "there is one global value of gamma and one global value of beta for each layer, and it applies to all hidden units in that layer" wrong? And why can't we tune them by random sampling?

Thank you for your reply.
Zhijie

Better random initialization and input normalization are in fact important techniques for improving results: both help gradient descent converge faster, which is exactly what you need when training takes too long.
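To make this concrete, here is a minimal numpy sketch of the two techniques. The layer sizes, variable names, and toy data are illustrative only, not taken from the quiz or the course notebooks.

```python
import numpy as np

def normalize_inputs(X):
    """Zero-center each feature and scale to unit variance.
    X has shape (n_features, m_examples), following the course convention."""
    mu = np.mean(X, axis=1, keepdims=True)
    sigma = np.std(X, axis=1, keepdims=True) + 1e-8  # avoid division by zero
    return (X - mu) / sigma

def he_initialize(layer_dims):
    """He initialization: scale weights by sqrt(2 / n_prev) so activations
    neither explode nor vanish, which speeds up gradient descent."""
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                * np.sqrt(2.0 / layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

X = np.random.rand(5, 100) * 50.0   # toy data with features on a large scale
X_norm = normalize_inputs(X)        # features now roughly zero-mean, unit-variance
params = he_initialize([5, 4, 1])   # toy network: 5 inputs, 4 hidden units, 1 output
```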

The point is that gamma and beta are not "global values": for starters, they are vectors with one component per hidden unit, and each layer at which Batch Norm is applied has its own pair. They aren't tuned by random sampling, either: they are learnable parameters updated by gradient descent along with the weights, while the per-batch mean and variance used for the normalization itself are computed from the actual input values.
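Here is a small sketch of the Batch Norm forward step for one layer to show the shapes involved. The unit count, batch size, and variable names are made up for illustration, assuming the course convention of shape (n_units, m_examples).

```python
import numpy as np

n_units, m = 4, 32                      # 4 hidden units, mini-batch of 32 examples
Z = np.random.randn(n_units, m)         # pre-activations for this layer

gamma = np.ones((n_units, 1))           # learnable scale: one value per hidden unit
beta = np.zeros((n_units, 1))           # learnable shift: one value per hidden unit

# The mean and variance are computed from the actual mini-batch values ...
mu = np.mean(Z, axis=1, keepdims=True)
var = np.var(Z, axis=1, keepdims=True)
Z_norm = (Z - mu) / np.sqrt(var + 1e-8)

# ... while gamma and beta rescale the normalized values and are updated by
# gradient descent along with W and b, not chosen by random sampling.
Z_tilde = gamma * Z_norm + beta
```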

Hello, could you please elaborate on why the quiz question emphasizes "a small value of the cost function"? I understand that optimization methods like Adam and better initialization help speed up gradient descent, but why is a small cost function relevant to the premise of the problem? Thanks!

The whole point of gradient descent is to reduce the value of the cost, right? A small cost is better than a larger cost.

Got it. Thanks!
