When playing with different activation functions for the hidden layer, I found that ReLU gives accuracy similar to tanh on some specific datasets. But on the main dataset it gives me far worse accuracy, usually around 70%, or around 80% with hidden_layer_size = 30, even though I have tried adjusting learning_rate and the number of iterations. Is there a specific reason why it performs worse in this scenario compared to tanh?
As my implementation of ReLU I use (X * (X > 0)), and for its derivative (1. * (X > 0)), where X is a matrix.
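For completeness, here is a minimal NumPy sketch of that formulation (the function names are just for illustration):

```python
import numpy as np

def relu(X):
    # Elementwise ReLU: keep positive entries, zero out the rest
    return X * (X > 0)

def relu_derivative(X):
    # ReLU derivative: 1 where X > 0, 0 elsewhere
    return 1. * (X > 0)

# Quick check on a small matrix
X = np.array([[-2.0, 0.5], [3.0, -0.1]])
print(relu(X))             # [[0.  0.5] [3.  0. ]]
print(relu_derivative(X))  # [[0. 1.] [1. 0.]]
```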
It’s great that you are trying experiments like this. You always learn something when you try to extend the ideas in the course. Here’s another thread related to this topic from a while ago. I was able to get 81% accuracy using ReLU with n_h = 40 and some other folks were able to get 85% accuracy with ReLU.
Your implementations of ReLU and ReLU’ look correct to me, but maybe you need a higher n_h value. Note that the n_h = 4 that works pretty well with tanh gives really terrible results with ReLU.
Thanks. With n_h = 40, learning_rate = 0.655, and iterations = 12k, I got 86% accuracy, which is still short of tanh’s performance on this dataset. Is there any concrete explanation for why it underperforms here?
I have done some experiments with this dataset and the same architecture (except using different numbers of neurons and different activations for the hidden layer). I also tried different seeds. To save myself some work (because I am lazy), I implemented my experiments in TensorFlow Keras instead of modifying the assignment.
I hope this can be another starting point for you to explore neural networks further.
Setting:
- learning rate = 0.04
- weight initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)
- number of iterations = 150000
| Activation | Number of neurons | seed=0 | seed=1 | seed=2 |
| --- | --- | --- | --- | --- |
| ReLU | 16 | 0.84 | 0.8275 | 0.8175 |
| ReLU | 64 | 0.885 (figure 1) | 0.87 | 0.875 |
| Concatenated ReLU | 16 | 0.85 | 0.865 | 0.855 |
| Concatenated ReLU | 64 | 0.8975 (figure 2) | 0.8825 | 0.885 |
| tanh | 4 | 0.905 | 0.9075 | |
A note on Concatenated ReLU (CReLU): it concatenates ReLU(x) and ReLU(−x), so it effectively doubles the number of hidden neurons listed in the table.
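For anyone who wants to try something similar, here is a minimal sketch of how such a model could be wired up in TensorFlow Keras with the settings above. The crelu helper and the build_model wrapper are my own guesses at the wiring, not necessarily Raymond's exact code:

```python
import tensorflow as tf

def crelu(x):
    # Concatenated ReLU: stack ReLU(x) and ReLU(-x) along the feature axis,
    # which doubles the effective number of hidden units
    return tf.concat([tf.nn.relu(x), tf.nn.relu(-x)], axis=-1)

def build_model(n_h=16, use_crelu=False, stddev=0.01, lr=0.04):
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=stddev)
    inputs = tf.keras.Input(shape=(2,))  # 2 input features, as in the assignment
    # Hidden layer: linear Dense followed by the chosen activation
    z = tf.keras.layers.Dense(n_h, kernel_initializer=init)(inputs)
    if use_crelu:
        a = tf.keras.layers.Lambda(crelu)(z)
    else:
        a = tf.keras.layers.Activation("relu")(z)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid",
                                    kernel_initializer=init)(a)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model(n_h=16, use_crelu=True)
model.summary()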
While my results above show that ReLU can be comparable, I wondered why it took so many more neurons. After a few checks, I decided to run the following experiment, and it turns out that the (modified) ReLU setup can be equally good with only 4 neurons, although this result raises more interesting questions and calls for further experiments.
Cheers,
Raymond
Setting:
- learning rate = 0.1
- weight initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2.) # <== Note this change
- number of iterations = 10000
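As an illustration only, the hypothetical build_model sketch above could be reconfigured for this experiment like so:

```python
# Same sketch as above: only the hidden-layer size, initializer scale,
# and learning rate change for this "modified ReLU" experiment
model_4 = build_model(n_h=4, use_crelu=False, stddev=2.0, lr=0.1)
model_4.summary()
```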
Hi @rmwkwok, very interesting! I’ve been following this thread and have learned a lot!
You mentioned that you built the model in Keras. Could you share the summary of your NN? Are these just Dense layers with ReLU/Concatenated ReLU activations?
I have updated my previous post with the summary. And yes, they are just Dense layers with ReLU/CReLU/Modified ReLU. I tried to stick with the same architecture as used in the assignment.