Input "x" to Net during training : is it the original image(0) or noised image(t)?

My understanding: during training, the input x is derived from the original image(0), and the noise the Net is trained to predict is the noise(t) that was added to this original image(0) at step-t. Correct?

In other words, the input x at step-t to the Net (during training) is not image(t-1), correct?