Good day!
So I’m finally getting around to typing out my notes on programming assignment 2 of week 4.
Note 1
In “4 - Two-layer Neural Network” “Exercise 1 - two_layer_model”
Tell the student to
DO USE
(n_x, n_h, n_y) = layers_dims
parameters = initialize_parameters(n_x, n_h, n_y)
AND DO NOT USE
parameters = initialize_parameters_deep(layers_dims)
The text says to use the first, but it doesn’t say why (so, yes, I used the other one).
Both of these functions should be computationally equivalent, right? NO! Because they depend on hidden state, namely the state of the global random number generator:
def initialize_parameters(n_x: int, n_h: int, n_y: int) -> Dict[str, np.ndarray]:
    init_scale: float = 0.01
    np.random.seed(1)
    # random numbers are generated based on the global RNG

def initialize_parameters_deep(layer_dims: List[int]) -> Dict[str, np.ndarray]:
    init_scale: float = 0.01
    np.random.seed(3)
    # random numbers are generated based on the global RNG
If one uses the “wrong” function, the random number stream will not be what the tests expect, and the tests will fail (quite mysteriously, too).
There should be a note in the exercise text pointing out that using the alternative, nominally “computationally equivalent” function will make the test fail.
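To make the failure mode concrete, here is a minimal, self-contained sketch (stand-in functions, not the actual course code): each stand-in reseeds the global RNG like its course counterpart, so the two produce different values for the same layer sizes, and any test that hard-codes the seed(1) values fails when the other one is used.

import numpy as np

# Hypothetical stand-ins for the two initializers; the bodies are simplified,
# but each one reseeds the global RNG just like its course counterpart.
def init_two_layer(n_x, n_h, n_y):
    np.random.seed(1)                        # as in initialize_parameters
    return np.random.randn(n_h, n_x) * 0.01

def init_deep(layer_dims):
    np.random.seed(3)                        # as in initialize_parameters_deep
    return np.random.randn(layer_dims[1], layer_dims[0]) * 0.01

W1_a = init_two_layer(3, 2, 1)
W1_b = init_deep([3, 2, 1])
print(np.allclose(W1_a, W1_b))               # False: different seed, different stream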
Taking a step back and reflecting on how things should be done: if the (global) random number generator state matters, one would like to be explicit about it. Make it local and pass it as a parameter:
def initialize_parameters(n_x: int, n_h: int, n_y: int,
                          rand: numpy_random.Generator) -> Dict[str, np.ndarray]:
    init_scale: float = 0.01
    # use the local "rand" RNG to generate random numbers as needed

def initialize_parameters_deep(layer_dims: List[int],
                               rand: numpy_random.Generator) -> Dict[str, np.ndarray]:
    init_scale: float = 0.01
    # use the local "rand" RNG to generate random numbers as needed
This would also make the test structure viable. As it stands, there are initializations of the RNG in the testing code that step on each other’s toes; it’s really weird.
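A usage sketch of the proposed signatures (layer sizes and seeds are just illustrative; default_rng is the standard way to construct a numpy Generator):

import numpy.random as numpy_random

# The caller owns the generator; nothing global is touched.
rng = numpy_random.default_rng(1)
parameters = initialize_parameters(3, 7, 1, rand=rng)

# Each test can pass its own, independently seeded generator,
# so test cases no longer interfere with one another.
test_rng = numpy_random.default_rng(42)
parameters_deep = initialize_parameters_deep([3, 7, 5, 1], rand=test_rng)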
Addendum: here is another one…
def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)
Note 2
In “4.1 - Train the model”
there is a call to
plot_costs(costs, learning_rate)
Maybe I’m confused somehow, but that function wasn’t found; I had to add it myself, copied from week 2 I think:
def plot_costs(costs, learning_rate):
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
Note 3
In two_layer_model(), the “cost” may have to be squeezed before printing:
print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
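For context, a tiny sketch of why the squeeze helps, assuming the cost comes back as a (1, 1) array rather than a plain scalar:

import numpy as np

cost = np.array([[0.6931]])         # shape (1, 1); prints as [[0.6931]]
print(np.squeeze(cost))             # 0.6931, printed as a plain scalar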