Question on initializing the parameters

W = np.random.randn(n_y, n_x) * 0.01
b = np.zeros((n_y, 1))

Why does n_y come before n_x, and is there a reason for the shape (n_y, 1)?

Hi @Mohammad_Omar_Adde ,

The generation of weight matrix, W, reflects the network structure. So what we have here is that n_x is the number of units in the input layer, X; n_y is the number of units in the output layer. The network diagram should indicate the arrangement.

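n_y comes before n_x because of how W is used in the forward pass. With the convention Z = WX + b, where X has shape (n_x, m) for m examples, W must have shape (n_y, n_x) so that the matrix product WX is defined and gives an output of shape (n_y, m). When initialising b, the bias vector, np.zeros() is called with the shape of the array as its argument; the 1 means b is a column vector with one entry per output unit.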
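To make the shapes concrete, here is a minimal sketch. The sizes n_x = 3, n_y = 2 and m = 5 are made up for illustration, and it assumes the forward pass is Z = WX + b with one example per column of X; your notebook may organise things differently.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_x, n_y, m = 3, 2, 5   # input units, output units, number of examples

X = np.random.randn(n_x, m)            # one column per example
W = np.random.randn(n_y, n_x) * 0.01   # rows match outputs, columns match inputs
b = np.zeros((n_y, 1))                 # column vector, one bias per output unit

# (n_y, n_x) @ (n_x, m) -> (n_y, m); b is broadcast across the m columns
Z = W @ X + b
print(Z.shape)  # (2, 5)
```

If W were created with shape (n_x, n_y) instead, W @ X would raise a shape mismatch error for these sizes.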

Thanks for the clarification. :pray:

Hi @Kic
Sorry, one more question: is there a reason for the 0.01? I think the instructor didn’t mention it during the lecture.

Hi @Mohammad_Omar_Adde

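np.random.randn() draws values from a standard normal distribution (mean 0, standard deviation 1). Multiplying its output by 0.01 scales those values down, so the initial weights are small. It changes only the spread of the values, not the shape of their distribution.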
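A quick way to see the effect of the multiplier is to compare the spread before and after scaling. This is only a sketch; the seed and the array size are arbitrary choices for the demonstration.

```python
import numpy as np

np.random.seed(0)                  # arbitrary seed, only to make the run repeatable
raw = np.random.randn(1000, 1000)  # standard normal: mean about 0, std about 1
scaled = raw * 0.01                # same values, 100 times smaller

# The standard deviation shrinks by the same factor as the multiplier
print(raw.std(), scaled.std())
```

The second number printed is 0.01 times the first, so the multiplier directly sets how large the initial weights are.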


There is some theoretical basis for selecting the range of the random initial values. It’s complicated, as it depends on the size of the NN, the number of layers, the numbers of units, etc. It’s an area of some research.

In practice you just try values between 0 and +1, or -1 and +1, and see how it goes. You may need to adjust the multiplier depending on your specific model.

I really appreciate you mentors for your clear answers.

I tried running the notebook on my local machine, where I have the data set. Somewhere in the middle of the code, my result differs from the instructor’s, even though I use the same seed as the instructor. The course is Calculus, Week 3, notebook one (linear regression).

Please post a screen capture that shows the results you mentioned.