Derivative of ReLU in the output layer

I just removed the ReLU from the output layer and used leaky_relu, but I got the same result. I guess that is because I did not define leaky_relu_backward and used relu_backward instead.
I copied the relu_backward code below, but what would leaky_relu_backward be? The derivative of A = np.maximum(0.01*Z, Z)?

  def relu_backward(dA, cache):
      Z = cache
      dZ = np.array(dA, copy=True) # just converting dz to a correct object.
      dZ[Z <= 0] = 0
        
      assert (dZ.shape == Z.shape)
        
      return dZ
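
My guess is something like this, just changing the masked part to use the 0.01 slope from np.maximum(0.01*Z, Z), but I am not sure:

  def leaky_relu_backward(dA, cache):
      Z = cache
      dZ = np.array(dA, copy=True)   # copy dA so the original is not modified
      dZ[Z <= 0] *= 0.01             # slope 0.01 where Z <= 0, 1 elsewhere

      assert (dZ.shape == Z.shape)

      return dZ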

Do you mean remove the ReLU and just use a linear function, like below?
Z^l = A^l = np.dot(W^l, A^{l-1}) + b^l

I sent it to you, Raymond.

Lastly, I would like to thank all of you for your limitless time and guidance. I am highly indebted to you.
Saif.

1 Like

Thank you @saifkhanengr.

1 Like

I spotted something quickly. Lucky me :stuck_out_tongue_winking_eye:

I added these print lines:

def linear_forward(A, W, b):
    ..... # your code
    print('linear_forward A', A.shape)
    print('linear_forward W', W.shape)
    print('linear_forward b', b.shape)
    print('linear_forward Z', Z.shape)
    return Z, cache

and found these:

linear_forward A (20, 1)
linear_forward W (7, 20)
linear_forward b (7, 1)
linear_forward Z (7, 1)

The thing is, the shape of your W should be (number of features, number of neurons), which means (1, 7), although my preference is (number of neurons, number of features).

I will hold off on reading your notebook, because it seems you have something to change and check. Checking shapes can be an interesting exercise :wink:

Cheers,
Raymond

2 Likes

Btw, (number of neurons, number of features) is meant for the first dense layer. The second dense layer should have (number of neurons in this layer, number of neurons in the last layer).
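
As a rough sketch with made-up sizes (1 feature, 7 hidden neurons, 1 output neuron, and the samples stacked as columns), the shapes would go like this:

import numpy as np

m, n_x, n_h, n_y = 20, 1, 7, 1            # made-up sizes: samples, features, hidden neurons, output neurons

X  = np.random.randn(n_x, m)              # (1, 20): each column is one sample
W1 = np.random.randn(n_h, n_x) * 0.01     # (7, 1): (neurons in this layer, number of features)
b1 = np.zeros((n_h, 1))                   # (7, 1)
W2 = np.random.randn(n_y, n_h) * 0.01     # (1, 7): (neurons in this layer, neurons in the last layer)
b2 = np.zeros((n_y, 1))                   # (1, 1)

Z1 = np.dot(W1, X) + b1                   # (7, 20)
A1 = np.maximum(0, Z1)                    # (7, 20)
Z2 = np.dot(W2, A1) + b2                  # (1, 20)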

2 Likes

Yes, my W’s and b’s are (number of neurons, number of features) and (number of neurons, 1) respectively.

BTW, waiting for you to read the whole notebook.

1 Like

The shape of your W in the first dense layer is linear_forward W (7, 20). 20 is the number of samples. Did I misunderstand anything? Or would you like to check all the shapes in your notebook?

1 Like

I just checked all the shapes in my file. They are:
print(f"shape of X is {X.shape}“)
print(f"shape of Y is {Y.shape}”)
print(f"number of neurons in hidden layer is {n_h}“)
print(f"shape of W1 is {W1.shape}”)
print(f"shape of b1 is {b1.shape}“)
print(f"shape of A1 is {A1.shape}”)
print(f"shape of W2 is {W2.shape}“)
print(f"shape of b2 is {b2.shape}”)
print(f"shape of A2 is {A2.shape}")

shape of X is (20, 1)
shape of Y is (20, 1)
number of neurons in hidden layer is 7
shape of W1 is (7, 20)
shape of b1 is (7, 1)
shape of A1 is (7, 1)
shape of W2 is (20, 7)
shape of b2 is (20, 1)
shape of A2 is (20, 1)

All good or not?

1 Like

What do 7 and 20 mean? The number of features? The number of samples? The number of neurons?

1 Like

7 is the number of neurons and 20 is the number of samples (rows of input).

1 Like

OK, but you said this, which is correct but contradicts your printed result. Do you see what you need to change?

1 Like

Oh. The number of features is 1 (the column of X). So I need to change that. Currently, I am using (number of neurons, number of samples) = (7, 20) for W1, but it should be (number of neurons, number of features) = (7, 1). Right?

1 Like

Exactly! And your W2 and b2 have problems too.

1 Like

Doing this gives an error.

1 Like

Saif, I have been following this conversation and have seen a lot of your effort. I can guess the error you are seeing, but I hope you will try googling the error message first and debugging it yourself. Some say writing code takes only 30% of the whole coding time, but debugging takes 70%. I don’t want to do that major work for you.

Now you know the right thing to do, so it’s time to do it right. I know you will try it first, right?

Raymond

2 Likes

I highly appreciate your time and effort, Raymond. Thanks a lot. I would rather learn to fish than be given fish. Thanks, man…

1 Like

We are the same :wink: Take your time debugging, and I am following this topic. Just please make sure all shapes are correct :wink:

2 Likes

Sure… Will update here.

1 Like

Yes, that was what I was recommending. You can try LeakyReLU as well, but I think it’s worth trying just omitting the output activation function altogether.

If you implement LeakyReLU, here’s one way to code its derivative:

def leakyreluprime(Z, slope = 0.05):
    G = np.where(Z > 0, 1, slope)
    return G

Of course that is implemented as a separate function call. If you build it “in situ” by analogy to the way relu_backward works, you’re doing two things at once:

dZ = dA * g'(Z)

But the same idea can be adapted …
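
Putting those two pieces together, one possible leaky_relu_backward (just a sketch, reusing the leakyreluprime above) might look like this:

def leaky_relu_backward(dA, cache, slope = 0.05):
    Z = cache
    dZ = dA * leakyreluprime(Z, slope)   # dZ = dA * g'(Z); the multiply creates a new array, so dA is untouched
    assert (dZ.shape == Z.shape)
    return dZ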

One point to emphasize here is that if you just duplicate the code in relu_backward to make leaky_relu_backward, be sure to understand the importance of the way they implemented this line:

dZ = np.array(dA, copy=True)

If you “short-circuit” that by eliminating the “copy” there:

dZ = dA

that is a disaster, because you’re about to overwrite some of the values in dZ. Because of the way that parameter passing works in Python and the way object assignments work, doing it without the copy modifies the global value of dA. See this post and this later reply on that thread.
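
Here is a tiny demonstration of that aliasing effect:

import numpy as np

dA = np.array([[-1.0, 2.0, -3.0]])
dZ = dA                  # no copy: dZ is just another name for the same array object
dZ[0, 0] = 0.0           # "modifying dZ" ...
print(dA)                # [[ 0.  2. -3.]] ... has also changed dA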

3 Likes

Hello @paulinpaloalto, @rmwkwok, and @Rashmi! I hope you are doing well. I keep asking silly questions, and you consistently guide me. Thank you for that.

I am omitting the ReLU activation function in the last layer. I copied many functions (from DLS Course 1, Week 3 and 4 assignments), so for simplicity I defined a new function named no_relu and then just changed the name in the other functions.
This is how I defined it:

def no_relu(Z):
    A = Z
    assert(A.shape == Z.shape)
    cache = Z 
    return A, cache

Now I need to define its derivative too.
I did it like this:

def no_relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.
    dZ = 1
    assert (dZ.shape == Z.shape)
    return dZ

I just made it 1. Is that correct or not? I am getting an error at that point.
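
Or maybe, since the derivative of Z with respect to Z is 1, dZ should just be a copy of dA with the same shape as Z? Something like this, but I am not sure:

def no_relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True)   # dZ = dA * 1, so it keeps the same shape as Z
    assert (dZ.shape == Z.shape)
    return dZ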

Your most unintelligent student,
Saif.

2 Likes

Hello SaifKhanengr

It’s good to see you experimenting in different ways! Initially you tried ReLU and then LeakyReLU, and you got almost the same results. Now you have omitted the activation function in the last (output) layer. But what was your purpose in running these experiments? If it is to check binary classification, then how can you just build on 1 as the output? A binary classifier always checks between 0 and 1.
If that is what you are doing, then what about 0? And that makes me wonder how dZ = 1 will give you actual results.
You can expand the experiments further to check more of it. Thanks!

1 Like