Why does training predict the noise rather than the clean image directly, while generation goes step by step?

When training the diffusion neural network (a U-Net), it takes each clean image, samples a random time-step, adds the time-step-dependent noise, and trains the model to predict that noise.
Therefore, after it is trained, whenever we feed the model a noisy image together with its time-step, the model should be able to output the clean image in one step, because that is what it was trained to do: predict the noise and then remove it.
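
To make my understanding of the training step concrete, here is a minimal PyTorch sketch, assuming a standard DDPM linear beta schedule and a noise-prediction model `model(x_t, t)`; the names `model`, `T`, `betas`, and `alphas_cumprod` are illustrative, not from any particular codebase:

```python
import torch
import torch.nn.functional as F

T = 1000                                            # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product alpha_bar_t

def training_loss(model, x0):
    """One training step: add noise for a random time-step t, predict that noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)       # random time-step per image
    noise = torch.randn_like(x0)                          # epsilon ~ N(0, I)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward process q(x_t | x_0)
    return F.mse_loss(model(x_t, t), noise)               # learn to predict the added noise
```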

But when the model is used to generate new images, it has to go through 500 or so time-steps, the so-called reverse diffusion process, removing a bit of noise at each step to reach a slightly cleaner state. Why is that? Why does it not go directly to the clean state?

Could anyone help explain?

Training here is to predict the noise added at a certain step t.

So for generation, to get the image at step 0, you need to revert from a sample of the random Gaussian distribution: you incrementally call the network to predict the noise at a certain step t and remove that predicted noise. Here you need to be careful not to simply subtract the noise, but to strictly follow the mathematical equations that mimic the reverse of the noise-adding process.
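
As a rough sketch of what "follow the mathematical equations" means, here is the standard DDPM sampling loop in PyTorch (Algorithm 2 of the DDPM paper), assuming the same `betas` schedule and noise-prediction `model` as above; this is an illustration, not the only possible sampler:

```python
import torch

@torch.no_grad()
def sample(model, shape, betas):
    """Reverse diffusion: start from pure Gaussian noise and denoise step by step."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    T = betas.shape[0]

    x = torch.randn(shape)                                    # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = model(x, torch.full((shape[0],), t))            # predicted noise at step t
        # Not just "x - eps": the DDPM posterior mean is
        #   mu_t = 1/sqrt(alpha_t) * (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps)
        mean = (x - betas[t] / (1 - alphas_cumprod[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)  # add fresh noise, sigma_t^2 = beta_t
        else:
            x = mean                                          # final step is deterministic
    return x
```

Note that each step only moves from x_t to x_{t-1}, and fresh noise is injected at every step except the last; that is why the network has to be called once per time-step instead of jumping straight to the clean image.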