By default, `GradientTape` watches every tensor within its context, am I correct? If so, what is the purpose of `GradientTape.watch`? Does it mean the tape will only keep a record of the variables specified in `watch` across iterations of the forward pass?

For example, in the attached image below, will the tape only keep track of `x` but NOT `y` or `z`? What are the pros and cons of using `watch`?
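The image is not reproduced here, so the following is only a minimal sketch of the kind of setup being asked about, assuming TensorFlow 2.x in eager mode. The names `w`, `tape2`, `z2`, `g_x`, and `g_y` are hypothetical. It relies on documented behavior: by default the tape automatically records operations on trainable `tf.Variable`s accessed inside the context, while a plain tensor such as a `tf.constant` is recorded only if it is passed to `tape.watch`. Passing `watch_accessed_variables=False` turns off the automatic variable watching.

```python
import tensorflow as tf

x = tf.constant(3.0)   # plain tensor: NOT watched automatically
w = tf.constant(4.0)   # plain tensor we deliberately leave unwatched
y = tf.Variable(2.0)   # trainable variable: watched automatically by default

with tf.GradientTape() as tape:
    tape.watch(x)      # explicitly ask the tape to record ops involving x
    z = x * x + y + w

# Sources the tape never watched get None as their gradient.
dz_dx, dz_dy, dz_dw = tape.gradient(z, [x, y, w])
print(dz_dx, dz_dy, dz_dw)

# With automatic variable watching disabled, only explicitly watched
# sources are recorded, so the Variable y is ignored here.
with tf.GradientTape(watch_accessed_variables=False) as tape2:
    tape2.watch(x)
    z2 = x * y

g_x, g_y = tape2.gradient(z2, [x, y])
print(g_x, g_y)
```

In this sketch `watch` is what makes a gradient with respect to a constant (e.g. an input tensor) available at all. Combining it with `watch_accessed_variables=False` restricts recording to exactly the sources you name, which can save memory when many variables are touched but only a few gradients are needed.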