Hello,
I’m stuck on initialize_parameters_he. My formula seems right: I’m using randn and multiplying by the provided sqrt formula, and my first-layer weights seem to be initialized correctly.
Here’s my output.
Here’s the expected output.
I don’t see how W1 can be right but not W2?
It seems like my W2 weights are half of what the expected results are?
We are writing general code here, right? So there can be test cases which have different values as inputs in order to test your logic.
If you have hard-coded any of the values, that will fail.
But your values for W2 come out to be 2x what they should be, which is a pretty good clue about where to look for the error. How can they be right for W1 and wrong for W2? Your formula is probably wrong in two ways: how you handle the square root and the factor of 2, and which layer dimension you use as the input to the calculation. Somehow your mistakes happen to cancel each other out in the layer 1 case.
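To illustrate how a scaling mistake can hide in one layer and show up in another, here is a minimal sketch. It assumes the standard He scale factor sqrt(2 / fan_in), where fan_in is the size of the previous layer. The layer sizes, the helper names he_scale and no_sqrt_scale, and the "forgot the square root" variant are all hypothetical, chosen only for illustration; this is not the code from the post or from the assignment.

```python
import math

def he_scale(fan_in):
    # Standard He scale factor: sqrt(2 / size of the previous layer).
    return math.sqrt(2.0 / fan_in)

def no_sqrt_scale(fan_in):
    # Hypothetical mistake: same ratio, but the square root is missing.
    return 2.0 / fan_in

# Hypothetical layer sizes: 2 inputs, 4 hidden units, 1 output.
layers_dims = [2, 4, 1]

for l in range(1, len(layers_dims)):
    fan_in = layers_dims[l - 1]  # previous layer's size feeds W{l}
    print(f"W{l}: fan_in={fan_in}, "
          f"correct={he_scale(fan_in):.4f}, "
          f"no_sqrt={no_sqrt_scale(fan_in):.4f}")
```

With these sizes the layer 1 factor is exactly 1 either way, so W1 matches no matter how the square root is handled, while the layer 2 factors differ (about 0.7071 versus 0.5). A test that only looked at W1 would not catch the mistake.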
Interesting. Yes, I’d say that code looks correct. Are you sure you pressed “Shift-Enter” on the code cell before you called the function again? Just typing new code and then calling the function again runs the old code. You can easily demonstrate to yourself that this is how it works.
Indeed I did. I changed the code around to check (output vs. expected output: W1 is the same, W2 isn’t). Is it possible the test is broken or has been changed?
OMG… so embarrassing… That was it! Thank you! I was close to figuring it out: after removing the sqrt initialization, I saw that my W1 weights weren’t changing, but W2 was still wrong.
As they say in my country, “D’oh!” But don’t feel bad: I’ve done similar things a thousand times in my programming career (so far). It’s so easy to look at a piece of code and not really “see” it.
Actually there’s another useful lesson there about Python indexing. If you’ve done a lot of Python programming, maybe you already knew that, but it’s perfectly valid to use negative array indices. It just counts backward from the end of the array: myArray[-1] gives you the last element, myArray[-2] the second to last and so forth.
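A quick sketch of that behavior, using a plain Python list with hypothetical layer sizes. The variable names are made up for illustration. The part worth noticing is that an index expression that accidentally evaluates to -1 does not raise an error; it quietly returns the last element.

```python
# Hypothetical list of layer sizes.
layers_dims = [2, 4, 1]

last = layers_dims[-1]         # last element
second_last = layers_dims[-2]  # second to last element

# An off-by-one index wraps around instead of failing:
l = 0
wrapped = layers_dims[l - 1]   # index -1, so this is the last element again

# Only going past the start of the list raises IndexError.
try:
    layers_dims[-4]
    raised = False
except IndexError:
    raised = True

print(last, second_last, wrapped, raised)
```

NumPy arrays follow the same convention for negative indices, so a wrong index in a weight-initialization loop can produce a plausible-looking value rather than an exception.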