During training, the input x is the original image(0), and the predicted noise is the noise(t) added to this original image(0) at step t. Is that correct?
In other words, during training the input x to the network at step t is not image(t-1). Is that correct?
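
To make the question concrete, here is a minimal sketch of one training step under the usual DDPM-style formulation. That formulation is an assumption, since the post does not name a specific method. The function names (`make_alpha_bar`, `noisy_input`, `training_loss`), the `model` callable, and the linear schedule values are all hypothetical and chosen only for illustration. In this sketch the training example is image(0), the network receives image(t) built directly from image(0) and freshly sampled noise, and image(t-1) is never computed.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noisy_input(x0, t, alpha_bar, rng):
    """Build image(t) in closed form from image(0); no image(t-1) is needed."""
    eps = rng.standard_normal(x0.shape)  # the noise the network should predict
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def training_loss(model, x0, t, alpha_bar, rng):
    """Mean squared error between predicted and actual noise for one sample."""
    x_t, eps = noisy_input(x0, t, alpha_bar, rng)
    eps_pred = model(x_t, t)  # the network sees image(t) and the step index t
    return float(np.mean((eps_pred - eps) ** 2))
```

A call would look like `training_loss(model, x0, t, make_alpha_bar(), np.random.default_rng(0))`, where `t` is typically drawn at random for each training sample and `model` is any callable that takes `(x_t, t)` and returns an array shaped like `x_t`.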