A copy of block [13] of the lab in question follows below. Running the block results in “None” being printed as the result of the print(d2y_dx2) statement.
x = tf.Variable(1.0)

# Setting persistent=True still won't work
with tf.GradientTape(persistent=True) as tape_2:
    # Setting persistent=True still won't work
    with tf.GradientTape(persistent=True) as tape_1:
        y = x * x * x
# The first gradient call is outside the outer with block
# so the tape will expire after this
dy_dx = tape_1.gradient(y, x)
# the output will be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)
print(dy_dx)
print(d2y_dx2)
The explanation given above the block is: “Notice how the d2y_dx2 calculation is now None. The tape has expired.” But an expired tape is not the explanation for why d2y_dx2 is not calculated correctly.
And the comment “The first gradient call is outside the outer with block so the tape will expire after this” is a correct statement, but it is not relevant to the behavior observed.
The reason d2y_dx2 evaluates to None is not that tape_1 expired before d2y_dx2 = tape_2.gradient(dy_dx, x) is evaluated. Rather, the problem is that dy_dx = tape_1.gradient(y, x) is not run inside tape_2’s with block, so the operations that produce dy_dx are not recorded by tape_2.
This can be verified by editing the code so that dy_dx is set and tape_1 is deleted within tape_2’s with block:
x = tf.Variable(1.0)

with tf.GradientTape(persistent=True) as tape_2:
    with tf.GradientTape(persistent=True) as tape_1:
        y = x * x * x
    dy_dx = tape_1.gradient(y, x)
    del tape_1
d2y_dx2 = tape_2.gradient(dy_dx, x)
del tape_2
print(dy_dx)
print(d2y_dx2)
Running this modified block produces non-None values for both gradients, indicating d2y_dx2 is calculated as expected even though tape_1 has been deleted.
In the second scenario you mentioned, where d2y_dx2 came out with a shape even after the deletion of tape_1, the positioning of dy_dx is different; the indentation of dy_dx in your code does not match the explanation you are pointing to.
I am sharing a screenshot; you can compare it with your explanation.
The indentation of the code as I edited it is intentionally different. This is to ensure the relevant operations performed on variables are recorded within the runtime context of the GradientTape context manager object tape_2. d2y_dx2 = tape_2.gradient(dy_dx, x) is evaluated successfully despite tape_1 being discarded explicitly, showing that expiration of tape_1 is not the reason the example code as shown in the lab fails to evaluate. The reason for the failure is that the operation that generates dy_dx is not watched by tape_2, because the operation is not performed within the runtime context of tape_2 (the with block associated with tape_2).
Please refer to the following passage from documentation for GradientTape:
Operations are recorded if they are executed within this context manager and at least one of their inputs is being “watched”.
Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is default in both cases) are automatically watched. Tensors can be manually watched by invoking the watch method on this context manager.
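To make the quoted documentation concrete, here is a minimal sketch of the watching behavior, assuming TensorFlow 2.x is installed and imported as tf (variable names are my own for illustration):

```python
import tensorflow as tf

x_var = tf.Variable(2.0)     # trainable variable: watched automatically
x_const = tf.constant(2.0)   # plain tensor: not watched unless requested

with tf.GradientTape() as tape_a:
    y = x_var * x_var        # recorded: x_var is auto-watched
grad_var = tape_a.gradient(y, x_var)

with tf.GradientTape() as tape_b:
    y = x_const * x_const    # not recorded: x_const is not watched
grad_const = tape_b.gradient(y, x_const)

with tf.GradientTape() as tape_c:
    tape_c.watch(x_const)    # opt in to watching the constant tensor
    y = x_const * x_const
grad_watched = tape_c.gradient(y, x_const)

print(grad_var)      # gradient of x^2 at x = 2 -> 4.0
print(grad_const)    # None: the ops were never recorded
print(grad_watched)  # 4.0 once the tensor is explicitly watched
```

Note that None here means the same thing as in the lab: the tape has no recorded operations connecting the target to the source.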
As you mentioned, the indentation changes which operations on the variables are recorded, and the same is mentioned in the earlier calculation you referred to. But your point that the gradient is still calculated even if tape_1 is deleted doesn’t fit the section of the assignment in the screenshots I shared.
You are mixing two things: in the assignment, placing the calculation inside the block gives you the shape output, but when it is outside the block it doesn’t persist for the second calculation.
What I am trying to point out is that your calculations aren’t wrong, but the reason your edited code still calculates even when tape_1 is deleted is that you changed the indentation of the first gradient call. In the assignment where this instruction was given, the indentation was not where you have edited it.
Sorry, I’m not sure what you’re getting at. All I’m doing with the edited code is demonstrating that the explanation given in the lab for why the original code doesn’t correctly calculate the second derivative is wrong.
Yes, but your reasoning depends on editing the indentation, whereas the indentation in the lab doesn’t match your explanation. This is what I am explaining: you pointed out that the reasoning is incorrect, but you didn’t mention in your post that you changed the indentation of the first gradient call. The lab, where it mentions d2y_dx2 being None even after using persistent=True, still places the first gradient call outside the block. So when you edited the code by moving the first gradient call, you weren’t following the instructions, which highlight where not to indent the first gradient calculation.
Are you saying that we should not edit the code block the way I edited it in my first comment in this thread? Because I agree with that statement. I’m not saying that the lab’s code should be changed the way I edited the code.
I’m saying that the comments in the code block should be changed so that they state the correct reason why the original code doesn’t work. Otherwise students taking the course will come away with a wrong understanding of how GradientTape works.
The edited code is included in this thread just to demonstrate that the explanation is wrong.
You are still not getting my point. It’s not just about the explanation you had an issue with or the code edit you made. If you look at the assignment you are referring to, the tape gradient comes out None when the first gradient calculation is not inside the block; when you edited the code and showed the tape gradient coming out with a shape, you had placed that calculation inside the blocks. So you are not following the instructions header, which mentions where not to indent the first gradient calculation: you are not supposed to place the first gradient calculation inside the block. Forget about editing the code; that is a secondary issue. Of course, the upgraded labs are written so that learners can understand the code just by running the cells.
I think we’re going in circles by focusing on the edited code.
Please just focus on the fact that the comments in the original code give students the wrong explanation for why the code fails to produce a result for the second derivative.
I clearly mentioned that it’s not just about you editing the code; you are missing the link to the header under which it was explained why the tape gradient comes out None.
The lab clearly places the first gradient call outside the block, and when you edited the code and showed that the tape gradient doesn’t come out None, you had placed the first gradient call inside the block.
The code’s explanation is correct for the case where the first gradient calculation is outside the block, which results in the tape gradient being None for d2y_dx2. The explanation is based on the first gradient calculation being outside the block for those two calculations.
I’m saying that the explanation is wrong because expiration of the tape has nothing to do with why d2y_dx2 evaluates to None. Calling tape_1.gradient(y, x) does result in expiration of tape_1. But this statement is true even if the call to tape_1.gradient(y, x) is moved inside runtime context for tape_2.
What’s important is that tape_1.gradient(y, x) is executed outside the runtime context of tape_2, so the operations performed during tape_1.gradient(y, x) are not recorded by tape_2.
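This point can be demonstrated in isolation with a single persistent tape; here is a minimal sketch, assuming TensorFlow 2.x (the names inside, outside, g_in, and g_out are my own):

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape(persistent=True) as tape:
    inside = x * x    # executed inside the context: recorded

outside = x * x       # executed outside the context: NOT recorded

g_in = tape.gradient(inside, x)    # gradient of x^2 at x = 3 -> 6.0
g_out = tape.gradient(outside, x)  # None: those ops were never recorded
del tape

print(g_in)
print(g_out)
```

The tape is persistent and fully alive for both gradient calls, yet the second result is None, because the only thing that matters is whether the target was computed inside the tape’s runtime context.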
This is incorrect. If you look at the assignment section where you are pointing out that d2y_dx2 comes out None, the first gradient calculation, i.e. the dy_dx calculation, also needs to be outside the outer with block as per the header instruction. But since you changed the indentation of dy_dx, you got the shape output.
Check the image and compare it with your post; you will see the difference between what the assignment header instructions mention and what you did when you moved the dy_dx line (the tape_1 gradient) inside the block.
@classical_leap, I see what you’re saying. The comments in the assignment are misleading. It’s true, as the comments say, that the tape_2.gradient(dy_dx, x) calculation will not calculate what we want if tape_1.gradient(y, x) is outside the outer with block (the tape_2 with block), but the reason this is true has nothing to do with tape_1 expiring. It has to do with the scope of the with block for tape_2: tape_2 is only going to “watch” operations that happen within the scope of its with block, so it won’t “see” the dy_dx calculation if it is outside that scope.
Good catch and thank you for persevering on this. I can submit a request to staff to change the wording to be more clear.
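For reference, a minimal sketch of the working pattern this thread converges on, assuming TensorFlow 2.x: the first gradient call is made inside the outer tape’s context, and no persistent=True is needed because each tape’s gradient method is called only once.

```python
import tensorflow as tf

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x
    # dy_dx is computed INSIDE tape_2's context, so tape_2
    # records the operations that produce it.
    dy_dx = tape_1.gradient(y, x)

d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)    # 3x^2 at x = 1 -> 3.0
print(d2y_dx2)  # 6x   at x = 1 -> 6.0
```

Both derivatives come out non-None, consistent with the explanation that the scope of tape_2’s with block, not tape expiration, is what determines whether d2y_dx2 can be computed.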