W3 A1: ReLU activation doesn't work

The original assignment uses tanh activation for the first layer. I replaced it with ReLU activation, but the model doesn't learn. To make the change, I downloaded the notebook and adjusted it in my local environment.

This is what I used for ReLU activation and its derivative:

import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0)

def relu_derivative(x):
    # 1 where x > 0, else 0
    return (x > 0) * 1
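
As a quick sanity check (not part of the assignment, just using the two helpers above on a small array):

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))             # [[0. 0. 3.]]
print(relu_derivative(z))  # [[0 0 1]]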

In forward propagation, I replaced this line of code:
A1 = np.tanh(Z1) # with tanh activation
with this line of code:
A1 = relu(Z1) # with relu activation
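
For context, here is roughly where that change sits. This is a sketch of the forward pass assuming the notebook's usual parameter dictionary and its sigmoid() helper, not a copy of the graded function; only the A1 line differs from the original:

def forward_propagation(X, parameters):
    # Parameter shapes follow the notebook: W1 (n_h, n_x), b1 (n_h, 1), W2 (n_y, n_h), b2 (n_y, 1)
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)               # was: A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)            # output layer keeps the sigmoid

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache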

In backward propagation, I replaced this line of code:
dZ1 = np.dot(W2.T, dZ2) * tanh_derivative(A1) # with tanh activation
with this line of code:
dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1) # with relu activation
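
And the matching backward pass, again as a sketch of the assignment's usual structure rather than a copy of it; only the dZ1 line changes:

def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    W2 = parameters["W2"]
    A1, A2 = cache["A1"], cache["A2"]

    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # relu_derivative(A1) works here because A1 = relu(Z1), so A1 > 0 exactly where Z1 > 0
    dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1)   # was: ... * tanh_derivative(A1)
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}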

When you use ReLU, you need to increase the number of units in the hidden layer: it takes several ReLU units to replace one tanh() unit.

This is because tanh() is a complex smooth curve, while ReLU only contributes a piecewise-linear hinge. To get equivalent performance from ReLU on this exercise, you'll need perhaps 10 to 20 hidden units.
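
If you want to try that, the only extra change is the hidden-layer size you pass when building the model. Something along these lines, where nn_model() and predict() are the notebook's own functions (treat the exact call signature as an assumption):

# Re-train with a wider ReLU hidden layer, e.g. 10 units instead of 4
parameters = nn_model(X, Y, n_h=10, num_iterations=10000, print_cost=True)
predictions = predict(parameters, X)
print("Accuracy: %.2f%%" % (100 * np.mean(predictions == Y)))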


Here’s a thread from a while back in which using ReLU for this exercise is also discussed.

Here’s a post from mentor Raymond which goes into more detail on this question.
