The original assignment uses a tanh activation for the first layer. I replaced it with a ReLU activation, but the model doesn't learn. To make the change, I downloaded the notebook and edited it in my local environment.
This is what I used for the ReLU activation and its derivative:

import numpy as np

def relu(x):
    # element-wise max(x, 0)
    return np.maximum(x, 0)

def relu_derivative(x):
    # 1 where x > 0, 0 elsewhere (0 at x = 0)
    return (x > 0) * 1
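As a quick sanity check on my machine (not part of the assignment), the two helpers give the expected values and mask:

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))             # [[0. 0. 3.]]
print(relu_derivative(z))  # [[0 0 1]] -> 0 where x <= 0, 1 where x > 0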
In forward propagation, I replaced this line of code:
A1 = np.tanh(Z1) # with tanh activation
with this line of code:
A1 = relu(Z1) # with relu activation
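For context, this is roughly where that line sits in the forward pass. The surrounding lines (W1, b1, W2, b2, X and the sigmoid output) are my assumption of the usual two-layer structure, not code copied from the assignment:

Z1 = np.dot(W1, X) + b1   # linear step for the hidden layer (assumed names)
A1 = relu(Z1)             # with relu activation (was np.tanh(Z1))
Z2 = np.dot(W2, A1) + b2  # linear step for the output layer
A2 = sigmoid(Z2)          # assumed sigmoid output for binary classification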
In backward propagation, I replaced this line of code:
dZ1 = np.dot(W2.T, dZ2) * tanh_derivative(A1) # with tanh activation
with this line of code:
dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1) # with relu activation
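Again for context, here is a sketch of how I understand the backward pass around that line; the other gradient lines (dZ2 = A2 - Y, the dW/db terms, m training examples) are assumptions on my part, not code I copied from the assignment:

m = X.shape[1]                                 # assumed number of training examples
dZ2 = A2 - Y                                   # assumed gradient at the sigmoid output
dW2 = (1 / m) * np.dot(dZ2, A1.T)
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1)  # with relu activation
dW1 = (1 / m) * np.dot(dZ1, X.T)
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

Note that using relu_derivative(A1) instead of relu_derivative(Z1) should be equivalent here, since relu(Z1) > 0 exactly when Z1 > 0.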