The question about whether we need to know the activation function in a given layer when we compute back propagation is a little misleading. I chose no, because we need to know the derivative, not the function itself. However, this was marked incorrect, and the explanation was that we need to know the function in order to compute the derivative. Well, that is true. But I was afraid that if I chose “false”, it would be marked incorrect, because during the implementation you just use the derivative. I feel both true and false are reasonable answers. I hope I make sense :). (It is similar to saying that when you compute the activation in layer 6, you need to know the original values X… well, yes and no. Originally yes, but once you have the values from layer 5, you do not need X…)
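To illustrate the parenthetical: here is a hypothetical tiny 6-layer forward pass (not the course code, names are my own) showing that computing layer 6’s activation only reads layer 5’s output, never the original X directly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # input: 4 features, 3 examples
A = X                                # layer 0's "activation" is X itself
for layer in range(1, 7):            # layers 1..6
    W = rng.normal(size=(4, A.shape[0]))
    b = np.zeros((4, 1))
    A_prev = A                       # cached output of the previous layer
    A = np.tanh(W @ A_prev + b)      # this layer only reads A_prev, not X

print(A.shape)  # final activations: (4, 3)
```

So “do you need X?” has the same yes-and-no flavor: X is where the chain starts, but each layer’s code only touches the previous layer’s cached output.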

But the point is you need to know what the activation function is in order to know what the derivative of it is, right? They don’t just give you the derivative. You have to write the code to implement it, at least here in DLS Course 1. You saw how that worked in the programming assignment in Week 3. Eventually we’ll graduate to using TensorFlow and it just takes care of all the back prop code “magically”. But this is still Course 1.

Or to put it another way, the point is that we are making a “hyperparameter” choice here when we decide the activation function for a given layer. And that is the choice we make: the activation function. Then that decision has consequences, e.g. we then need to know the derivative of the chosen function in order to implement back prop. That’s the way to think about it: how the choice is made.
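To make that concrete, here’s a minimal sketch (not the actual assignment code; the function names are my own) of how the hyperparameter choice of activation determines which derivative formula the back prop code must use:

```python
import numpy as np

def g_prime(z, activation):
    """Derivative of the chosen activation g, evaluated at z."""
    if activation == "sigmoid":
        a = 1 / (1 + np.exp(-z))
        return a * (1 - a)               # sigma'(z) = sigma(z)(1 - sigma(z))
    if activation == "tanh":
        return 1 - np.tanh(z) ** 2       # tanh'(z) = 1 - tanh(z)^2
    if activation == "relu":
        return (z > 0).astype(z.dtype)   # relu'(z) = 1 for z > 0, else 0
    raise ValueError(f"unknown activation: {activation}")

# In back prop, dZ = dA * g'(Z): same dA, but a different dZ
# depending on which activation we chose for the layer.
Z = np.array([[-1.0, 0.5, 2.0]])
dA = np.ones_like(Z)
for name in ("sigmoid", "tanh", "relu"):
    print(name, dA * g_prime(Z, name))
```

The choice of `activation` is made once per layer, up front; the consequence is that back prop needs the matching `g_prime` branch, which you can only write if you know which function was chosen.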

Thank you for the answer! You’re absolutely right. It just felt like there was a catch, you know. I wasn’t sure whether I was being tested on knowing that we use the derivative in backward propagation, or on the whole process of knowing what your model (i.e., the chosen activation function) looks like.

Yes, but my reading is that it’s not a catch, because both interpretations lead to the same answer. Even if you know that the derivative is what’s used in back propagation, you still need to know which function you are going to differentiate, right?