W3 A1: ReLU activation doesn't work

The original assignment uses tanh activation for the first layer. I replaced it with ReLU activation, but the model doesn't learn. To make the change, I downloaded the notebook and adjusted it in my local environment.

This is what I used for ReLU activation and its derivative:

import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0)

def relu_derivative(x):
    # 1 where x > 0, else 0
    return (x > 0) * 1
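
As a quick sanity check (not part of the assignment, just using the two helpers above on a small array):

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))             # [[0. 0. 3.]]
print(relu_derivative(z))  # [[0 0 1]]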

In forward propagation, I replaced this line of code:
A1 = np.tanh(Z1) # with tanh activation
with this line of code:
A1 = relu(Z1) # with relu activation
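
For context, here is roughly where that change sits. This is a sketch of the forward pass assuming the notebook's usual parameter dictionary and its sigmoid() helper, not a copy of the graded function; only the A1 line differs from the original:

def forward_propagation(X, parameters):
    # Parameter shapes follow the notebook: W1 (n_h, n_x), b1 (n_h, 1), W2 (n_y, n_h), b2 (n_y, 1)
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)               # was: A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)            # output layer keeps the sigmoid

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache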

In backward propagation, I replaced this line of code:
dZ1 = np.dot(W2.T, dZ2) * tanh_derivative(A1) # with tanh activation
with this line of code:
dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1) # with relu activation
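
And the matching backward pass, again as a sketch of the assignment's usual structure rather than a copy of it; only the dZ1 line changes:

def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    W2 = parameters["W2"]
    A1, A2 = cache["A1"], cache["A2"]

    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # relu_derivative(A1) works here because A1 = relu(Z1), so A1 > 0 exactly where Z1 > 0
    dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1)   # was: ... * tanh_derivative(A1)
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}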

When you use ReLU, you need to increase the number of units in the hidden layer: it takes several ReLU units to replace one tanh() unit.

This is because tanh() is a complex smooth curve, while ReLU only contributes a piecewise-linear hinge. To get equivalent performance from ReLU on this exercise, you'll need perhaps 10 to 20 hidden units.
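
If you want to try that, the only extra change is the hidden-layer size you pass when building the model. Something along these lines, where nn_model() and predict() are the notebook's own functions (treat the exact call signature as an assumption):

# Re-train with a wider ReLU hidden layer, e.g. 10 units instead of 4
parameters = nn_model(X, Y, n_h=10, num_iterations=10000, print_cost=True)
predictions = predict(parameters, X)
print("Accuracy: %.2f%%" % (100 * np.mean(predictions == Y)))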


Here’s a thread from a while back in which using ReLU for this exercise is also discussed.

Here’s a post from mentor Raymond which goes into more detail on this question.
