Derivative of ReLU in the output layer

I just removed the ReLU from the output layer and used leaky_relu, but I got the same result. I guess that is because I did not define leaky_relu_backward and used relu_backward instead.
I copied the relu_backward code below, but what would leaky_relu_backward be? The derivative of A = np.maximum(0.01*Z, Z)?

  def relu_backward(dA, cache):
      Z = cache
      dZ = np.array(dA, copy=True) # just converting dz to a correct object.
      dZ[Z <= 0] = 0
        
      assert (dZ.shape == Z.shape)
        
      return dZ
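
My guess is something like this, just changing the masked part to use the 0.01 slope from np.maximum(0.01*Z, Z), but I am not sure:

  def leaky_relu_backward(dA, cache):
      Z = cache
      dZ = np.array(dA, copy=True)   # copy dA so the original is not modified
      dZ[Z <= 0] *= 0.01             # slope 0.01 where Z <= 0, 1 elsewhere

      assert (dZ.shape == Z.shape)

      return dZ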

Do you mean remove the ReLU and just use a linear function, like below?
Z^l = A^l = np.dot(W^l, A^{l-1}) + b^l

I sent it to you, Raymond.

Lastly, I would like to thank all of you for your limitless time and guidance. I am highly indebted to you.
Saif.

1 Like

Thank you @saifkhanengr.

1 Like

I spotted something quickly. Lucky me :stuck_out_tongue_winking_eye:

I added these print lines:

def linear_forward(A, W, b):
    ..... # your code
    print('linear_forward A', A.shape)
    print('linear_forward W', W.shape)
    print('linear_forward b', b.shape)
    print('linear_forward Z', Z.shape)
    return Z, cache

and found these:

linear_forward A (20, 1)
linear_forward W (7, 20)
linear_forward b (7, 1)
linear_forward Z (7, 1)

The thing is, the shape of your W should be (number of features, number of neurons), which means (1, 7), although my preference is (number of neurons, number of features).

I will hold off on reading your notebook, because it seems you have something to change and check. Checking shapes can be an interesting exercise :wink:

Cheers,
Raymond

2 Likes

Btw, (number of neurons, number of features) is meant for the first dense layer. The second dense layer should have (number of neurons in this layer, number of neurons in the last layer).
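
As a rough sketch with made-up sizes (1 feature, 7 hidden neurons, 1 output neuron, and the samples stacked as columns), the shapes would go like this:

import numpy as np

m, n_x, n_h, n_y = 20, 1, 7, 1            # made-up sizes: samples, features, hidden neurons, output neurons

X  = np.random.randn(n_x, m)              # (1, 20): each column is one sample
W1 = np.random.randn(n_h, n_x) * 0.01     # (7, 1): (neurons in this layer, number of features)
b1 = np.zeros((n_h, 1))                   # (7, 1)
W2 = np.random.randn(n_y, n_h) * 0.01     # (1, 7): (neurons in this layer, neurons in the last layer)
b2 = np.zeros((n_y, 1))                   # (1, 1)

Z1 = np.dot(W1, X) + b1                   # (7, 20)
A1 = np.maximum(0, Z1)                    # (7, 20)
Z2 = np.dot(W2, A1) + b2                  # (1, 20)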

2 Likes

Yes, my W’s and b’s are (number of neurons, number of features) and (number of neurons, 1) respectively.

BTW, waiting for you to read the whole notebook.

1 Like

The shape of your W in the first dense layer is linear_forward W (7, 20). 20 is the number of samples. Did I misunderstand anything? Or would you like to check all the shapes in your notebook?

1 Like

I just checked all the shapes in my file. They are:
print(f"shape of X is {X.shape}“)
print(f"shape of Y is {Y.shape}”)
print(f"number of neurons in hidden layer is {n_h}“)
print(f"shape of W1 is {W1.shape}”)
print(f"shape of b1 is {b1.shape}“)
print(f"shape of A1 is {A1.shape}”)
print(f"shape of W2 is {W2.shape}“)
print(f"shape of b2 is {b2.shape}”)
print(f"shape of A2 is {A2.shape}")

shape of X is (20, 1)
shape of Y is (20, 1)
number of neurons in hidden layer is 7
shape of W1 is (7, 20)
shape of b1 is (7, 1)
shape of A1 is (7, 1)
shape of W2 is (20, 7)
shape of b2 is (20, 1)
shape of A2 is (20, 1)

All good or not?

1 Like

What do 7 and 20 mean? The number of features? The number of samples? The number of neurons?

1 Like

7 is the number of neurons and 20 is the number of samples (rows of input).

1 Like

OK, but you said this, which is correct but contradicts your printed result. Do you see what you need to change?

1 Like

Oh. The number of features is 1 (the column of X). So I need to change that. Currently, I am using (number of neurons, number of samples) = (7, 20) for W1, but it should be (number of neurons, number of features) = (7, 1). Right?

1 Like

Exactly! And your W2 and b2 have problems too.

1 Like

Doing this gives an error.

1 Like

Saif, I have been following this conversation and have seen a lot of your effort. I can guess the error you are seeing, but I hope you will try googling the error message first and debugging it yourself. Some say writing code takes only 30% of the whole coding time, but debugging takes 70%. I don’t want to do that major work for you.

Now you know the right thing to do, so it’s time to do it right. I know you will try it first, right?

Raymond

2 Likes

I highly appreciate your time and effort, Raymond. Thanks a lot. I would rather learn to fish than be given fish. Thanks, man…

1 Like

We are the same :wink: Take your time debugging, and I am following this topic. Just please make sure all shapes are correct :wink:

2 Likes

Sure… Will update here.

1 Like

Yes, that was what I was recommending. You can try LeakyReLU as well, but I think it’s worth trying just omitting the output activation function altogether.

If you implement LeakyReLU, here’s one way to code its derivative:

def leakyreluprime(Z, slope = 0.05):
    G = np.where(Z > 0, 1, slope)
    return G

Of course that is implemented as a separate function call. If you build it “in situ” by analogy to the way relu_backward works, you’re doing two things at once:

dZ = dA * g'(Z)

But the same idea can be adapted …
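
Putting those two pieces together, one possible leaky_relu_backward (just a sketch, reusing the leakyreluprime above) might look like this:

def leaky_relu_backward(dA, cache, slope = 0.05):
    Z = cache
    dZ = dA * leakyreluprime(Z, slope)   # dZ = dA * g'(Z); the multiply creates a new array, so dA is untouched
    assert (dZ.shape == Z.shape)
    return dZ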

One point to emphasize here is that if you just duplicate the code in relu_backward to make leaky_relu_backward, be sure to understand the importance of the way they implemented this line:

dZ = np.array(dA, copy=True)

If you “short-circuit” that by eliminating the “copy” there:

dZ = dA

that is a disaster, because you’re about to overwrite some of the values in dZ. Because of the way that parameter passing works in Python and the way object assignments work, doing it without the copy modifies the global value of dA. See this post and this later reply on that thread.
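
Here is a tiny demonstration of that aliasing effect:

import numpy as np

dA = np.array([[-1.0, 2.0, -3.0]])
dZ = dA                  # no copy: dZ is just another name for the same array object
dZ[0, 0] = 0.0           # "modifying dZ" ...
print(dA)                # [[ 0.  2. -3.]] ... has also changed dA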

3 Likes

Hello @paulinpaloalto, @rmwkwok, and @Rashmi! I hope you are doing well. I keep asking silly questions, and you consistently guide me. Thank you for that.

I am omitting the ReLU activation function in the last layer. I copied many functions (from DLS Course 1, Week 3 and 4 assignments), so for simplicity I defined a new function named no_relu and then just changed the name in the other functions.
This is how I defined it:

def no_relu(Z):
    A = Z
    assert(A.shape == Z.shape)
    cache = Z 
    return A, cache

Now I need to define its derivative too.
I did it like this:

def no_relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.
    dZ = 1
    assert (dZ.shape == Z.shape)
    return dZ

I just made it 1. Is that correct or not? I am getting an error at that point.
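
Or maybe, since the derivative of Z with respect to Z is 1, dZ should just be a copy of dA with the same shape as Z? Something like this, but I am not sure:

def no_relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True)   # dZ = dA * 1, so it keeps the same shape as Z
    assert (dZ.shape == Z.shape)
    return dZ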

Your most unintelligent student,
Saif.

2 Likes

Hello SaifKhanengr

It’s good to see you experimenting in different ways! Initially you tried ReLU and then LeakyReLU, and you got almost the same results. Now you have omitted the activation function in the last (output) layer. But what was your purpose in running these experiments? If it is to check binary classification, then how can you just build on 1 as the output? A binary classifier always checks between 0 and 1.
If that is what you are doing, then what about 0? And that makes me wonder how dZ = 1 will give you actual results.
You can expand the experiments further to check more of it. Thanks!

1 Like