Hi, I am having trouble understanding some of these observations for random initialization.
Can you explain what these mean in a simpler way?
- The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets that example wrong it incurs a very high loss for that example. Indeed, when log(a^[3]) = log(0), the loss goes to infinity.
Why is it that the last activation (sigmoid) outputs results that are very close to 0 or 1 for large random-valued weights?
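To show what I mean, here is a small NumPy sketch with made-up numbers (a single unit, not the actual course network). Scaling the same weights by 1000 makes the pre-activation z = w·x large in magnitude, which pushes sigmoid(z) toward 0 or 1, and then the cross-entropy loss blows up if the label disagrees:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy input and weights for one unit (illustrative values only)
x = np.array([1.0, -0.5, 2.0, 0.3])
w_small = np.array([0.01, 0.02, -0.01, 0.03])  # small init, e.g. randn * 0.01
w_large = w_small * 1000                       # same directions, large magnitudes

z_small = w_small @ x   # small |z|  -> sigmoid near 0.5
z_large = w_large @ x   # large |z|  -> sigmoid saturates near 0 or 1

a_small = sigmoid(z_small)   # ~ 0.497
a_large = sigmoid(z_large)   # ~ 1.7e-05, i.e. essentially 0

# Cross-entropy loss when the true label is 1: -log(a) explodes as a -> 0
y = 1.0
loss_large = -y * np.log(a_large) - (1 - y) * np.log(1 - a_large)
print(a_small, a_large, loss_large)
```

So the saturation itself comes from |z| being large, not directly from the weights; large weights just make large |z| very likely.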
Thank you.
