W3_A1: ReLU vs tanh accuracy

When playing with different activation functions for the hidden layer, I found that ReLU gives accuracy similar to tanh on some datasets. But on the main dataset it gives me far worse accuracy, usually around 70%, or around 80% with hidden_layer_size = 30, even though I have tried manipulating learning_rate and the number of iterations. Is there a specific reason why it performs worse than tanh in this scenario?

My ReLU implementation is (X * (X > 0)) and its derivative is (1. * (X > 0)), where X is a matrix.
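In NumPy terms, what I am doing is essentially the following (the function names here are just for illustration, not the ones used in the notebook):

```python
import numpy as np

def relu(X):
    # Elementwise ReLU: keep positive entries, zero out the rest.
    return X * (X > 0)

def relu_derivative(X):
    # Elementwise derivative: 1 where X > 0, 0 elsewhere.
    return 1. * (X > 0)

# Small example
X = np.array([[-2.0, 0.5],
              [ 3.0, -0.1]])
print(relu(X))             # [[0.  0.5] [3.  0. ]]
print(relu_derivative(X))  # [[0. 1.] [1. 0.]]
```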

It’s great that you are trying experiments like this. You always learn something when you try to extend the ideas in the course. Here’s another thread on this topic from a while ago. I was able to get 81% accuracy using ReLU with n_h = 40, and some other folks were able to get 85% accuracy with ReLU.

Your implementations of ReLU and ReLU’ look correct to me, but maybe you need a higher n_h value. Note that the n_h = 4 that works pretty well with tanh gives really terrible results with ReLU.


Thanks. With n_h = 40, learning_rate = 0.655, and 12k iterations, I got 86% accuracy, which is still short of tanh’s performance on this dataset. Is there any concrete explanation for why it underperforms here?

Hello Lukas @Lukas_Jusko,

I have done some experiments with this dataset and the same architecture (except for using different numbers of neurons and different activations in the hidden layer). I also tried different seeds. To save myself some work, because I am lazy :stuck_out_tongue: , I implemented my experiments with TensorFlow Keras instead of modifying the assignment.

I hope this can be another starting point for you to explore neural networks further.

Setting:
learning rate = 0.04
weight initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)
number of iterations = 150000
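For concreteness, the models are just small Sequential stacks of Dense layers. Below is a rough sketch of the 64-neuron ReLU setup; the plain SGD optimizer and the X, Y variables holding the planar dataset are assumptions of the sketch rather than a literal copy of my code:

```python
import tensorflow as tf

# Assumptions of this sketch: X has shape (400, 2) (the planar dataset),
# Y has shape (400, 1), and plain SGD is used so that one full-batch epoch
# corresponds to one gradient-descent iteration.
init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', kernel_initializer=init),    # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid', kernel_initializer=init),  # output layer
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.04),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# model.fit(X, Y, batch_size=400, epochs=150000, verbose=0)
```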

| Activation | Num. of neurons | seed=0 | seed=1 | seed=2 |
|---|---|---|---|---|
| ReLU | 16 | 0.84 | 0.8275 | 0.8175 |
| ReLU | 64 | 0.885 (figure 1) | 0.87 | 0.875 |
| Concatenated ReLU | 16 | 0.85 | 0.865 | 0.855 |
| Concatenated ReLU | 64 | 0.8975 (figure 2) | 0.8825 | 0.885 |
| tanh | 4 | 0.905 | 0.9075 | |

A note on Concatenated ReLU (CReLU): it concatenates ReLU(x) and ReLU(-x), so a side effect is that it doubles the number of neurons listed in the table.
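If you want to try CReLU yourself, one simple way to build such a hidden layer in Keras is a linear Dense layer followed by a Lambda layer that does the concatenation. This is only a minimal sketch, not necessarily exactly how I wired it up:

```python
import tensorflow as tf

def crelu(x):
    # Concatenated ReLU: stack ReLU(x) and ReLU(-x) along the feature axis,
    # which doubles the number of output features.
    return tf.concat([tf.nn.relu(x), tf.nn.relu(-x)], axis=-1)

crelu_hidden = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='linear',
                          kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)),
    tf.keras.layers.Lambda(crelu),  # 16 pre-activations -> 32 CReLU features
])
```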

Cheers,
Raymond

Figure 1

Figure 2


Hello Lukas @Lukas_Jusko,

While my results above show that ReLU can be comparable, I wondered why ReLU took so many more neurons. After a few checks, I decided to do the following experiments, and it turns out that a (modified) ReLU can be equally good with only 4 neurons as well :wink: , although this result raises more interesting questions and calls for more experiments.

Cheers,
Raymond

Setting:
learning rate = 0.1
weight initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2.) # <== Note this change
number of iterations = 10000

| Activation | Num. of neurons | seed=0 | seed=1 | seed=2 |
|---|---|---|---|---|
| ReLU | 4 | | | |
| Concatenated ReLU | 4 | | | |
| tanh (model 1) | 4 | 0.8975 | 0.885 | 0.7275 |
| Modified ReLU (figure 3, model 2) | 4 | 0.8975 | 0.645 | 0.895 |

Figure 3

Model 1

_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_191 (Dense)           (400, 4)                  12       <-- activation ='linear'   
                                                                 
 activation_65 (Activation)  (400, 4)                  0        <-- tanh 
                                                                 
 dense_192 (Dense)           (400, 1)                  5          <-- activation ='sigmoid'  
                                                                 
=================================================================
Total params: 17
Trainable params: 17
Non-trainable params: 0
_________________________________________________________________
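In code, model 1 is roughly the following (a sketch reconstructed from the summary above; the variable name is just for illustration):

```python
import tensorflow as tf

init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2.)

# Model 1: linear Dense layer -> tanh activation -> sigmoid output,
# mirroring the layer-by-layer summary above.
model1 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='linear', kernel_initializer=init),
    tf.keras.layers.Activation('tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid', kernel_initializer=init),
])
```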

Model 2

_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_189 (Dense)           (400, 4)                  12       <-- activation ='linear' 
                                                                 
 lambda_162 (Lambda)         (400, 4)                  0     <--  lambda is x: x+1       
                                                                 
 re_lu_41 (ReLU)             (400, 4)                  0         
                                                                 
 lambda_163 (Lambda)         (400, 4)                  0     <--  lambda is x: x-1         
                                                                 
 dense_190 (Dense)           (400, 1)                  5         <-- activation ='sigmoid'  
                                                                 
=================================================================
Total params: 17
Trainable params: 17
Non-trainable params: 0
_________________________________________________________________
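Model 2 is the same stack with the tanh replaced by a shifted ReLU, i.e. the hidden activation is ReLU(z + 1) - 1. A rough sketch follows (again a reconstruction from the summary; plain SGD with the learning rate from the setting above is an assumption):

```python
import tensorflow as tf

init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=2.)

# Model 2: "modified ReLU" hidden layer, i.e. a = ReLU(z + 1) - 1
model2 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='linear', kernel_initializer=init),
    tf.keras.layers.Lambda(lambda x: x + 1),   # shift pre-activations up before the ReLU
    tf.keras.layers.ReLU(),
    tf.keras.layers.Lambda(lambda x: x - 1),   # shift back down afterwards
    tf.keras.layers.Dense(1, activation='sigmoid', kernel_initializer=init),
])

model2.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
               loss='binary_crossentropy',
               metrics=['accuracy'])
```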

Hi @rmwkwok, very interesting! I’ve been following this thread and have learned a lot!

You mentioned that you built the model in Keras. Could you share the summary of your NN? Are these just Dense layers with ReLU/Concatenated ReLU activations?

Thanks!

JC

Hello Juan! @Juan_Olano

I have updated my previous post with the model summaries. And yes, they are just Dense layers with ReLU/CReLU/Modified ReLU activations. I tried to stick with the same architecture as used in the assignment.

Cheers,
Raymond

Yeah, just saw it! Thanks! I’ll do some experiments myself here…


Wonderful thread to follow :slight_smile: