Interesting finding

When the instructor explained why noise has to be added between timesteps, I wondered: instead of adding noise to keep the values normally distributed, why can't you just map (or round) each value to its corresponding normal-distribution value? Well, I tried it. This is in the sampling notebook, by the way; I just rewrote the denoise_add_noise function as shown below. I arguably got better results with this approach. (Note that the noise-adding step is still there; I will reply to this thread shortly with my experience without that step.)

def denoise_add_noise(x, t, pred_noise, z=None):
    # b_t, a_t and ab_t are the noise-schedule tensors already defined in the notebook
    if z is None:
        z = torch.randn_like(x)
    noise = b_t.sqrt()[t] * z

    # Standardize pred_noise to zero mean and unit variance
    # (the statistics are taken over the whole tensor, batch included)
    pred_noise = (pred_noise - torch.mean(pred_noise)) / torch.std(pred_noise)

    # Round pred_noise to the nearest integer
    pred_noise = torch.round(pred_noise)

    mean = (x - pred_noise * ((1 - a_t[t]) / (1 - ab_t[t]).sqrt())) / a_t[t].sqrt()
    return mean + noise
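
Written out as math, the modified step looks roughly like this. This is my own transcription of the code above, assuming a_t, ab_t and b_t in the notebook stand for the usual schedule terms \alpha_t, \bar{\alpha}_t and \beta_t, and that \epsilon_\theta is the network's pred_noise; the mean and std are scalars computed over the whole tensor.

```latex
\tilde{\epsilon} = \operatorname{round}\!\left(
    \frac{\epsilon_\theta - \operatorname{mean}(\epsilon_\theta)}{\operatorname{std}(\epsilon_\theta)}
\right)
\qquad
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}
    \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\tilde{\epsilon} \right)
    + \sqrt{\beta_t}\, z,
\quad z \sim \mathcal{N}(0, I)
```

The only difference from the unmodified function is that \tilde{\epsilon} takes the place of \epsilon_\theta; the added noise term \sqrt{\beta_t}\, z is unchanged.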

Here are the results (image one: standard; image two: with my modification). The results I got are arguably more aesthetic, although I admit they are a little less varied.
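
To get a feel for what the standardize-and-round step does to the values, here is a minimal plain-Python sketch (no torch, so it runs anywhere). The input is made-up Gaussian data standing in for a flattened pred_noise tensor, not real model output, and standardize_and_round is a hypothetical helper name, not something from the notebook. It only illustrates how few distinct values survive the rounding; whether that is what makes the samples less varied is a guess on my part.

```python
import random
import statistics
from collections import Counter


def standardize_and_round(values):
    """Mimic the two extra steps on a flat list of floats."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample std (n - 1), like torch.std's default
    # Python's round() on a float returns an int and rounds halves to even,
    # which matches torch.round's tie-breaking rule.
    return [round((v - mu) / sigma) for v in values]


rng = random.Random(0)
# Hypothetical stand-in for a flattened pred_noise tensor (not model output).
fake_pred_noise = [rng.gauss(0.3, 2.0) for _ in range(10_000)]

quantized = standardize_and_round(fake_pred_noise)
levels = Counter(quantized)

print("distinct values before:", len(set(fake_pred_noise)))
print("distinct values after: ", len(levels))
for level in sorted(levels):
    print(f"{level:+d}: {levels[level] / len(quantized):.1%}")
```

For roughly Gaussian input, the rounded values collapse onto a handful of integers, with 0 the most common and the counts falling off quickly beyond +1 and -1. So the network's prediction is being replaced by a very coarse version of itself at every step.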