Hi guys, I’m trying to replicate the ReLU → Sigmoid binary classifier from the Week 4 programming assignment on my own data, and I’ve run into a number of problems.
The ReLU derivative implementation is supplied in the assignment, but is it correct that ReLU(a * x)’ = a?
After a number of iterations (2 or 3) I’m getting a very high ReLU output (10.0+), which makes the sigmoid output very close to 1, which subsequently makes log(1-y) = -inf in the loss function. Is there something wrong with my backprop, or should I somehow limit ReLU?
As you say, they gave you all the logic to handle backprop for ReLU in the hidden layers and sigmoid at the output layer. Of course, if you alter the architecture as Tom suggests and use sigmoid in the hidden layers as well, then you need to make sure to adjust the backprop logic accordingly.
The derivative of ReLU is most compactly expressed as:
g'(Z) = (Z > 0)
So you get 1 for all values of Z > 0 and 0 for all values of Z <= 0, although technically the derivative of ReLU is undefined at Z = 0.
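In numpy that boolean comparison does the work elementwise, so a minimal sketch of the derivative looks like this (variable names are mine, not the assignment’s):

```python
import numpy as np

Z = np.array([-2.0, 0.0, 3.5])
g_prime = (Z > 0).astype(float)  # -> [0., 0., 1.]; 0 is assigned at Z == 0 by convention
```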
I already tried adjusting the learning rate; it didn’t help. Are there special cases where ReLU is not a good choice? I thought ReLU was almost always a good choice, except for the output layer in a classification task.
I’m not using code from the lab per se; I’m just trying to replicate the algorithm as I’ve understood it. Thanks for the tip. I guess I just used the linear function’s derivative instead of ReLU’s, so I’ll adjust my code and see if that helps.
You can also examine their code for relu_backward and notice that they are rolling several computations into that one function, not just the derivative of ReLU.
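Roughly, such a function folds the chain-rule step dZ = dA * g'(Z) into one pass. A sketch of the idea (not their exact code, and I’m assuming Z is what gets cached):

```python
import numpy as np

def relu_backward(dA, Z):
    # Fold the chain rule dZ = dA * g'(Z) into one step:
    # copy the upstream gradient and zero it out where Z <= 0.
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0.0
    return dZ
```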
I don’t think you can make that general a statement. But the standard practice is to try ReLU as your first choice for the hidden layer activations. That is because it is by far the cheapest to compute of any activation function. So if it works, it’s a big win. Why wouldn’t you choose that if it works? But it doesn’t always work. If not, then you try Leaky ReLU, which is almost as cheap as ReLU and doesn’t have the “dead neuron” problem. If that also doesn’t give you good results, then you graduate to more expensive functions like tanh, sigmoid and swish.
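For concreteness, Leaky ReLU and its derivative are nearly as cheap as ReLU. A quick sketch (alpha = 0.01 is just a common default, not a prescribed value):

```python
import numpy as np

def leaky_relu(Z, alpha=0.01):
    # The small slope on the negative side keeps some gradient flowing,
    # which avoids the "dead neuron" problem.
    return np.where(Z > 0, Z, alpha * Z)

def leaky_relu_backward(dA, Z, alpha=0.01):
    # The derivative is 1 for Z > 0 and alpha otherwise.
    return dA * np.where(Z > 0, 1.0, alpha)
```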
To find out whether something’s wrong with your backprop, you need to inspect the intermediate variables along the way and check that they all match your expectations. Here is how I would do it:

1. Use TensorFlow and see if it reproduces a similar trend. Make sure everything (architecture, initialization, hyperparameters) is as close as possible to your implementation; see the sketch below.

2. Inspecting intermediate variables is laborious work, but to make it lighter, I would simplify the architecture as much as possible, verify that the simple architecture still shows the same problem, and then start the inspection.

If your backprop is OK, you will know what to expect from the two checks above.
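For step 1, a minimal Keras reference could look like this (layer sizes, learning rate, and the random data are all placeholders; match them to your own setup):

```python
import numpy as np
import tensorflow as tf

# Tiny reference model: one ReLU hidden layer -> sigmoid output,
# mirroring the ReLU -> Sigmoid architecture from the assignment.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy")

X = np.random.randn(32, 3).astype("float32")
y = (np.random.rand(32, 1) > 0.5).astype("float32")
model.fit(X, y, epochs=3)  # compare this loss trend with your own implementation
```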
If I were you, I would not consider any other activation function until I had made sure my backprop was OK, because I would want to build on a solid foundation.