I experimented with the following code to see whether just one tape can be used here:
import tensorflow as tf

x = tf.Variable(1.0)
with tf.GradientTape(persistent=True) as tape_2:
    y = x * x * x
    # Both gradient calls happen inside the tape's context, so the
    # first-order gradient ops are themselves recorded on the tape.
    dy_dx = tape_2.gradient(y, x)
    d2y_dx2 = tape_2.gradient(dy_dx, x)
print(dy_dx)
print(d2y_dx2)
I got the correct output, but with a warning:
WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
Question: Since our goal here is precisely that the gradient computed inside the context gets recorded, so that the second-order gradient can be calculated later, is it good practice to use just one tape in this way, or is it really better to use two tapes as we learned and avoid the warning?
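For comparison, this is the nested two-tape version I'm referring to (a minimal sketch of the pattern we learned; the names tape_outer and tape_inner are my own):

import tensorflow as tf

x = tf.Variable(1.0)
with tf.GradientTape() as tape_outer:
    with tf.GradientTape() as tape_inner:
        y = x * x * x
    # Computing dy_dx inside tape_outer's context records the
    # gradient ops, so tape_outer can differentiate them again.
    dy_dx = tape_inner.gradient(y, x)
d2y_dx2 = tape_outer.gradient(dy_dx, x)
print(dy_dx)    # tf.Tensor(3.0, shape=(), dtype=float32)
print(d2y_dx2)  # tf.Tensor(6.0, shape=(), dtype=float32)

Here neither tape is persistent, since each tape's gradient method is called only once, and no warning is raised.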