Am I correct that, by default, GradientTape watches every tensor within its context?
If so, what is the purpose of GradientTape.watch? Does it mean the tape will record only the variables passed to watch across iterations of the forward pass?
For example, in the attached image below, will the tape keep track of only x, and NOT y or z? What are the pros and cons of using watch?
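
One way to check these assumptions might be a small experiment like the sketch below. It assumes TensorFlow 2.x in eager mode. The names x, y, z mirror the question, but the values and the extra names v, c, and w are hypothetical and not taken from the image.

```python
import tensorflow as tf

x = tf.constant(3.0)  # plain tensor (hypothetical value)
v = tf.Variable(2.0)  # trainable variable (hypothetical, for comparison)
c = tf.constant(3.0)  # plain tensor that is never passed to watch

# persistent=True allows calling tape.gradient more than once.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)     # explicitly ask the tape to track x
    y = x * x         # intermediate result computed from x
    z = y * v         # final result, depends on x and v
    w = c * c         # computed from the unwatched tensor c

dz_dx = tape.gradient(z, x)  # gradient w.r.t. the explicitly watched tensor
dz_dv = tape.gradient(z, v)  # gradient w.r.t. the trainable variable
dw_dc = tape.gradient(w, c)  # gradient w.r.t. the unwatched tensor

print("dz/dx:", dz_dx)
print("dz/dv:", dz_dv)
print("dw/dc:", dw_dc)

del tape  # release the resources held by the persistent tape
```

If dw_dc prints as None while dz_dv has a value, that would suggest the tape tracks trainable variables automatically but needs watch for plain tensors, which is the behavior I am trying to confirm.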
