While doing Exercise 2 - initialize_parameters_random
in DLS C2W1 Initialization,
we are asked to initialize the weights to large random values (scaled by *10) and the biases to zero. The result of this is:
On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86
But when I try small random values (scaled by *0.01), the result turns bad:
On the train set:
Accuracy: 0.4633333333333333
On the test set:
Accuracy: 0.48
To figure out why it works like that, I also tried scaling by 1 and by 0.1.
Only a scale of 1 or larger gives correct classification.
Why?
I remember that in a previous exercise the weights were initialized with np.random.randn(shape) * 0.01.
Could someone help me figure out the reason? I would appreciate it very much.
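For reference, this is roughly the initialization pattern I mean, with the scale factor pulled out as a parameter so the *10 / *1 / *0.1 / *0.01 experiments are easy to reproduce (my own sketch, not the graded solution; layers_dims is assumed to be the list of layer sizes):

import numpy as np

def initialize_parameters_random(layers_dims, scale=10):
    # W[l]: standard normal values times the chosen scale; b[l]: zeros
    parameters = {}
    L = len(layers_dims)  # number of layers, including the input layer
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * scale
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters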
In this lab, they are trying to show the difference between different types of weight initialization. The weights change on every pass of training, and with enough iterations a good local optimum is hopefully reached.
But if the starting weights are bad, it can take many iterations, and hopefully the process doesn't get stuck in a "not so good" local optimum. That is why they suggest weights at 10 times the standard normal values: those weights probably converge faster to a good optimum.
I think the cost graph that is displayed is misleading.
PyPlot automatically rescales the plot vertically to show movement, but the y values (the cost) are probably barely moving at all. The value at the top left of the plot is the offset (the scale), I think.
Try this plotter:
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

def plot2(costs, learning_rate):
    plt.plot(costs, marker='o', linestyle='-')
    # Keep the y-axis range extremely tight so even tiny movements in the cost are visible
    plt.ylim(min(costs) - 1e-10, max(costs) + 1e-10)
    plt.ticklabel_format(style='plain', axis='y')  # avoid scientific notation
    plt.ylabel("cost")
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    ax = plt.gca()
    ax.yaxis.set_major_formatter(ScalarFormatter(useOffset=False))  # no offset label at the top of the axis
    plt.grid(True)
    plt.show()
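For example, if costs is the list of costs recorded every hundred iterations by the model (the values below are hypothetical, nearly flat costs), you would call it like this:

costs = [0.6931471, 0.6931468, 0.6931466, 0.6931465]  # hypothetical values
plot2(costs, learning_rate=0.01)

With a nearly flat curve like this, the tight y-limits and the plain (non-offset) tick labels make it obvious that the cost is barely decreasing at all.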
Note that for small weights, the gradients will be small, since they are element-wise multiplied by the uniformly small Z's, several times over in the case of a deep NN (that is "the problem of vanishing gradients"). The activation derivatives will be ~1 (or either 0 or 1 in the case of ReLU) because the Z's are near 0, but that won't change anything. Nothing will move particularly fast. You will have to crank up the number of iterations or the learning rate.
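To see this numerically, here is a small toy sketch (my own example, not from the lab) comparing the typical magnitude of Z and of the ReLU activations for different weight scales:

import numpy as np

np.random.seed(1)
X = np.random.randn(10, 300)  # toy input: 10 features, 300 examples

for scale in (0.01, 1, 10):
    W = np.random.randn(5, 10) * scale  # one toy layer of 5 units
    Z = W @ X
    A = np.maximum(0, Z)  # ReLU activation
    print(f"scale={scale}: mean |Z| = {np.abs(Z).mean():.4f}, mean A = {A.mean():.4f}")

With a scale of 0.01 both Z and A (and therefore the gradients that depend on them) are tiny, so each update barely changes the parameters; with a scale of 1 or 10 they are orders of magnitude larger.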