The original assignment uses tanh activation for the first layer, and I replaced it with ReLU. To do that, I downloaded the notebook and made the adjustments in my local environment. However, the model no longer learns.

This is what I used for the ReLU activation and its derivative:

import numpy as np

def relu(x):
    return np.maximum(x, 0)  # element-wise max(x, 0)

def relu_derivative(x):
    return (x > 0) * 1  # 1 where x > 0, 0 elsewhere
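As a quick sanity check, the two helpers can be exercised on a small array (self-contained sketch, with the definitions repeated so it runs on its own):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)  # element-wise max(x, 0)

def relu_derivative(x):
    return (x > 0) * 1  # 1 where x > 0, 0 elsewhere

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))             # [0. 0. 3.]
print(relu_derivative(z))  # [0 0 1]
```

Note that the derivative is defined as 0 at exactly x = 0, which is the usual convention.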

In forward propagation, I replaced this line of code:

A1 = np.tanh(Z1) # with tanh activation

with this line of code:

A1 = relu(Z1) # with relu activation
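For context, here is a minimal sketch of the forward pass with this change; the shapes and the names W1, b1, W2, b2, sigmoid are assumptions based on the usual two-layer layout of this kind of assignment, not the exact notebook code:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))          # 2 features, 5 examples
W1 = rng.standard_normal((4, 2)) * 0.01  # hidden layer: 4 units
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01  # output layer: 1 unit
b2 = np.zeros((1, 1))

Z1 = np.dot(W1, X) + b1
A1 = relu(Z1)              # was: A1 = np.tanh(Z1)
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)           # output layer unchanged
print(A2.shape)            # (1, 5)
```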

In backward propagation, I replaced this line of code:

dZ1 = np.dot(W2.T, dZ2) * tanh_derivative(A1) # with tanh activation

with this line of code:

dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1) # with relu activation
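Putting both substitutions together, here is a self-contained sketch of a training loop on toy data, under the assumption that the assignment uses a sigmoid output with binary cross-entropy loss (so dZ2 = A2 - Y). One thing worth noting: with ReLU, He-style initialization (scaling weights by sqrt(2 / fan_in)) is often recommended over the very small random values that work with tanh, since near-zero pre-activations can leave ReLU units stuck at zero:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def relu_derivative(x):
    return (x > 0) * 1

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200))
Y = (X[0:1] * X[1:2] > 0).astype(float)  # toy XOR-like labels
m = X.shape[1]

# He-style init for the ReLU layer (an assumption, not the notebook's code)
W1 = rng.standard_normal((4, 2)) * np.sqrt(2 / 2)
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

lr = 0.5
losses = []
for i in range(2000):
    # forward propagation
    Z1 = W1 @ X + b1
    A1 = relu(Z1)                  # was: np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)

    eps = 1e-8                     # avoid log(0)
    loss = -(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)).mean()
    losses.append(loss)

    # backward propagation
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * relu_derivative(A1)  # the replaced line
    dW1 = (dZ1 @ X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0], losses[-1])  # loss should decrease over training
```

Since A1 = relu(Z1) is nonnegative and positive exactly where Z1 is positive, relu_derivative(A1) gives the same mask as relu_derivative(Z1), so applying the derivative to A1 (as the tanh version does) is still valid here.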