Question on noise prediction

zyzhang1130 · June 2, 2023, 3:14am

I don’t understand how a NN can be trained to predict noise, because noise is random and isn’t supposed to exhibit any learnable features. Can anyone help explain? Thanks.

reinoudbosch · June 3, 2023, 6:54am

Hi zyzhang1130,

Here’s my two cents.

The NN is calibrated on pictures to which noise is added in successive steps. In other words, the parameters of the model are calibrated to predict a picture with some added noise, taking the picture as input and the picture with added noise as output. This is then iterated: the picture with a bit of noise becomes the input, the picture with some more noise becomes the output, and the parameters are calibrated again. When constructing a picture from noise, the parameters are fixed and the process is reversed. This has a particular direction: from the random noise, a step is made in the direction of the pictures the parameters were calibrated on. To arrive at a different picture, a bit of noise is added.

So in the very end the noise is random, but the step from random noise to a bit less random noise is directed by the picture the system was calibrated on.
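As a rough sketch of why the noise is a learnable target: in the common DDPM-style setup (an assumption here, since the thread does not spell out the exact formulation), the noising step is a fixed formula rather than something learned, and the network is given the noised image and the step index and asked to recover the specific noise sample that was mixed in. That sample is random, but it is known at training time and it is contained in the network's input. The linear schedule, the function names `alpha_bar` and `add_noise`, and the 4-value toy "image" below are all illustrative assumptions.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Product of (1 - beta_s) for s = 1..t under an assumed linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, rng):
    """Return (x_t, eps): the noised image and the exact noise that was mixed in."""
    a = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]  # toy stand-in for an image
x_t, eps = add_noise(x0, t=200, rng=rng)

# One training pair: input is (x_t, t), target is eps.
# A network would be trained to minimise the squared error between
# its output and eps; no network is defined in this sketch.
```

Because `x_t` is a deterministic mix of `x0` and `eps`, knowing what real images look like is exactly what lets a model tell which part of `x_t` is noise.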

zyzhang1130 · June 9, 2023, 6:50am

After watching some other videos on diffusion models, I feel it is better to phrase it as predicting the image instead?

reinoudbosch · June 9, 2023, 9:12am

Hi zyzhang1130,

That makes sense to me. In being calibrated on how to get from an image with a certain amount of noise to an image with a bit more noise, the system is predicting the image with a bit more noise.

zyzhang1130 · June 9, 2023, 10:14am

However, the way it is explained here is to predict the noise first and then subtract it to get the image. I wonder why this extra step is needed if we can directly predict a less noisy version of the image. (I’m referring to the denoising process.)

reinoudbosch · June 9, 2023, 10:33am

Well, you can subtract the values of the image with a bit less noise from those of the image with more noise, thereby obtaining the noise $\epsilon$ that was added; after all the steps, this yields a noisy image with a Gaussian distribution. In the denoising process, the predicted $\epsilon$ can then be used to distill images from Gaussian-distributed noise by subtracting noise. This is how I understand the presentation in the original paper (e.g. p. 4, p. 8).
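To make the "predict noise, then subtract" step concrete, here is a minimal sketch assuming the DDPM-style relation x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps (an assumption; the thread does not state the formula). Under that relation, a noise estimate and a clean-image estimate can be converted into each other, so the subtraction is a change of variables rather than extra information. The helper names and the value of `a_bar` are hypothetical.

```python
import math
import random

def x0_from_eps(x_t, eps_hat, a_bar):
    """Clean-image estimate implied by a noise estimate."""
    return [(xt - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
            for xt, e in zip(x_t, eps_hat)]

def eps_from_x0(x_t, x0_hat, a_bar):
    """Noise estimate implied by a clean-image estimate."""
    return [(xt - math.sqrt(a_bar) * x) / math.sqrt(1.0 - a_bar)
            for xt, x in zip(x_t, x0_hat)]

rng = random.Random(1)
a_bar = 0.3                      # hypothetical cumulative signal level at some step
x0 = [0.2, -0.7, 0.9]            # toy stand-in for an image
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]

# Pretend a network predicted the noise perfectly, then convert both ways.
x0_hat = x0_from_eps(x_t, eps, a_bar)
eps_back = eps_from_x0(x_t, x0_hat, a_bar)
```

With a perfect noise estimate, `x0_hat` matches `x0` up to floating-point error. A real network's estimate is imperfect, which is one reason sampling proceeds in many small steps instead of a single subtraction.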
