I’m struggling with the definition of the step function in CNQ4. I have been over the other posts in the forum, and while they’ve provided some insight (such as the step function already being defined in the given code), I’m confused about one particular point.

# if z1 < 0, then l1 = 0
# otherwise l1 = l1
# (this is already implemented for you)
l1[z1 < 0] = 0 # use "l1" to compute gradients below

I am assuming the last line “l1[z1 < 0] = 0” implements the step function. From my understanding, this line modifies the values in l1, replacing them with 0 wherever z1 < 0. z1 itself remains unmodified.

In that case, does step(z1) refer to the updated l1, or to z1 (which hasn’t changed)? When the comment in the code suggests using ‘l1’ to compute the gradients below, I’m not sure whether it means in lieu of the original definition of l1 (W2^T(Yhat-Y)), or as a substitute for step(z1).

I could be misunderstanding the step function and which values are changing, as I’m a novice Python user, so I would appreciate any clarification. I have tried calculating grad_W1 with every combination of z1, l1, W2.T @ (Yhat - Y), etc., but I’m not able to get it to work.

The step function is a type of activation function used in neural networks: it outputs either 0 or 1 depending on the sign of its input. In this context, z1 is the input to the step function, and the line below uses step(z1) as a mask on l1.

The line l1[z1 < 0] = 0 implements that mask. It checks each element of z1, and wherever z1 < 0 (i.e., wherever step(z1) would be 0), it sets the corresponding element of l1 to 0; all other elements of l1 are left unchanged. The result is identical to multiplying l1 elementwise by step(z1).
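To make the equivalence concrete, here is a small NumPy sketch. The values and shapes are made up purely for illustration; only the names l1 and z1 come from the assignment:

```python
import numpy as np

# Illustrative values only; shapes in the real assignment will differ.
z1 = np.array([[-1.0,  2.0],
               [ 3.0, -4.0]])            # pre-activation values
l1 = np.array([[0.5, 0.5],
               [0.5, 0.5]])              # e.g. W2.T @ (Yhat - Y)

masked = l1.copy()
masked[z1 < 0] = 0                       # the line from the assignment

# The same result, written as an elementwise product with step(z1):
step_z1 = (z1 >= 0).astype(float)        # 1 where z1 >= 0, 0 otherwise
assert np.array_equal(masked, l1 * step_z1)

print(masked)
# [[0.  0.5]
#  [0.5 0. ]]
```

Note that the masking mutates the array in place; the copy above is only there so the original l1 survives for the comparison.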

Regarding your question about step(z1) referring to l1 or z1: step(z1) is the mask being applied, and after the line runs, l1 holds the original l1 multiplied elementwise by step(z1). The original z1 remains unchanged; its values are only used to decide which entries of l1 to zero out.

When the code comments mention using l1 to compute gradients, they mean you should use the updated l1 (after the masking) in your gradient calculations. This is because the gradient flowing back through the hidden layer must be zeroed wherever the activation was inactive, and the updated l1 already has that zeroing built in.
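As a sketch of where the updated l1 goes next: assuming a two-layer network of the form Yhat = W2 @ relu(W1 @ X) with a loss whose output gradient is Yhat - Y (e.g. squared error), the backward pass might look like the following. The shapes, the random data, and the name h1 are assumptions for illustration, not taken from the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))     # 3 features, 5 examples (column-per-example)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
Y = rng.standard_normal((2, 5))

# Forward pass (assumed architecture)
z1 = W1 @ X
h1 = np.maximum(z1, 0)              # ReLU; step(z1) is its derivative
Yhat = W2 @ h1

# Backward pass
l1 = W2.T @ (Yhat - Y)              # gradient reaching the hidden layer
l1[z1 < 0] = 0                      # multiply by step(z1): block gradient where the unit was off
grad_W1 = l1 @ X.T                  # gradient with respect to W1
```

The key point: once the masking line has run, l1 already plays the role of l1 * step(z1) in the chain rule, so no separate step(z1) factor appears in grad_W1.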

So, in summary:

- l1[z1 < 0] = 0 masks l1 by step(z1); the updated l1 equals the original l1 times step(z1), elementwise.
- step(z1) refers to the mask itself, and it is already folded into the updated l1.
- For gradient computations, use the updated l1, which now incorporates the step function applied to z1.

This was VERY helpful! I was interpreting l1 · step(z1) as TWO different factors… I now see they are to be treated as just one (i.e., the updated l1). I removed the extra z1 and voilà, all tests pass. Thank you!