# Question on initializing the parameters

W = np.random.randn(n_y, n_x) * 0.01
b = np.zeros((n_y, 1))

Why does n_y come before n_x, and why is the shape of b (n_y, 1)? Is there a reason?

The shape of the weight matrix, W, reflects the network structure: n_x is the number of units in the input layer, X, and n_y is the number of units in the output layer. The network diagram should show this arrangement.

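n_y comes before n_x because W maps the input to the output. Assuming the usual convention Z = WX + b, where X has shape (n_x, m) with the m examples as columns, W needs shape (n_y, n_x) so that WX has shape (n_y, m). When initialising b, the bias vector, np.zeros() is called with the shape of the array as its argument; the 1 means b is a column vector, with one bias per output unit.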
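
To make the shapes concrete, here is a minimal sketch. It assumes the layer computes Z = WX + b with one example per column of X; the sizes n_x = 3, n_y = 2 and m = 5 are made up purely for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

n_x, n_y, m = 3, 2, 5  # hypothetical sizes: 3 input units, 2 output units, 5 examples

X = np.random.randn(n_x, m)          # one column per example
W = np.random.randn(n_y, n_x) * 0.01
b = np.zeros((n_y, 1))

# (n_y, n_x) @ (n_x, m) -> (n_y, m); b of shape (n_y, 1) broadcasts across the m columns
Z = W @ X + b

print(W.shape, b.shape, Z.shape)
```

If W were created as (n_x, n_y) instead, `W @ X` would fail with a shape mismatch unless W were transposed first.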

Thanks for the clarification.

Hi @Kic
Sorry, is there a reason for the 0.01? I don't think the instructor mentioned it during the lecture.

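Multiplying the output of np.random.randn() by 0.01 scales the values down. It doesn't change the nature of the data: the values are still normally distributed around zero, just with a standard deviation of 0.01 instead of 1.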
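
A quick way to check the effect of the multiplier is to compare the spread of the values before and after scaling. This is only an illustrative sketch; the array size and the seed are arbitrary choices.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only so the run is repeatable

raw = np.random.randn(1000, 1000)  # standard normal samples: mean 0, std 1
scaled = raw * 0.01                # same shape, every value 100 times smaller

# The standard deviations come out at roughly 1.0 and 0.01
print(raw.std(), scaled.std())
```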

There is some theoretical basis for selecting the range of the random initial values. It's complicated, as it depends on the size of the NN, the number of layers, the numbers of units, etc. It's an area of some research.

In practice you just try values between 0 and +1, or -1 and +1, and see how it goes. You may need to adjust the multiplier depending on your specific model.
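
One way to make that kind of experimenting easier is to expose the multiplier as an argument. The sketch below is an assumption for illustration: `initialize_parameters` and its `scale` argument are hypothetical names, not the course's function.

```python
import numpy as np

def initialize_parameters(n_x, n_y, scale=0.01):
    """Hypothetical helper: same initialization as above, with the multiplier as `scale`."""
    W = np.random.randn(n_y, n_x) * scale
    b = np.zeros((n_y, 1))
    return W, b

# Try a few multipliers and look at how large the initial weights get
for scale in (0.001, 0.01, 0.1, 1.0):
    W, b = initialize_parameters(4, 3, scale=scale)
    print(scale, np.abs(W).max())
```

In a real experiment you would train the model with each setting and compare the results, rather than only inspecting the weights.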
