C1_W2_Lab1 Exercise 7. Why does normalization change the results so much?

I understand how normalization would affect accuracy when dealing with multiple features that have different scales, i.e. “millimeters between planets” vs “number of times a person gets married”.

I can also see how using different types of normalization could de-emphasize outliers. I don’t understand what is happening in this example though. We are simply rescaling 0-255 to 0-1. The relative scale is still exactly the same. What is the exact mechanism that is causing the huge difference in accuracy between un-normalized and normalized? Is it something in the particular optimizer or loss function (“adam”, “sparse_categorical_crossentropy”) that works better with values between 0 and 1 than arbitrary ranges?

Please try the following:

  1. Define a function get_sigmoid(pixel_value, weight) that returns the sigmoid of pixel_value * weight.
  2. Create a range of weights using np.linspace between 0 and 1e-5.
  3. See how the sigmoid varies when the pixel value is 255 versus 1.

This should tell you which scale of values helps speed up learning, keeping backpropagation in mind. A rough sketch of the experiment is below.
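
Here is a minimal sketch of that experiment. The function name follows step 1; the number of sample weights (6) is my own choice:

import numpy as np

def get_sigmoid(pixel_value, weight):
    # Sigmoid of the weighted input: 1 / (1 + e^(-pixel_value * weight))
    return 1.0 / (1.0 + np.exp(-pixel_value * weight))

# A few tiny weights, per step 2
weights = np.linspace(0, 1e-5, 6)

# Compare the sigmoid for a raw pixel (255) vs. a normalized one (1), per step 3
for w in weights:
    print(f"w={w:.1e}  sigmoid(255*w)={get_sigmoid(255, w):.10f}  sigmoid(1*w)={get_sigmoid(1, w):.10f}")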

Have you seen this?


Excellent reply @balaji.ambresh! Thank you.

Expanding on your reply.
Even though the sigmoid function technically returns a distinct value for each x in 0 ≤ x < 255, in practice the result for any x above about 36 is so close to 1 that it rounds to 1.0 within float64 precision. Values in 0 ≤ x < 1 don't have this problem.

import math

def get_sigmoid(x):
    # Standard logistic sigmoid: 1 / (1 + e^(-x))
    return 1 / (1 + math.exp(-x))

def show_floats():
    # Sigmoid over 0.00 .. 0.99 in steps of 0.01
    print_sigmoids([f / 100.0 for f in range(100)])

def show_ints():
    # Sigmoid over the raw pixel range 0 .. 254
    print_sigmoids([float(i) for i in range(255)])

def print_sigmoids(vals):
    for v in vals:
        print(f"val: {v}, sigmoid: {get_sigmoid(v)}")

show_ints()
print('-------------------')
show_floats()

For 0 ≤ x < 255:

val: 0.0, sigmoid: 0.5
val: 1.0, sigmoid: 0.7310585786300049
val: 2.0, sigmoid: 0.8807970779778823
val: 3.0, sigmoid: 0.9525741268224334
...
val: 36.0, sigmoid: 0.9999999999999998
val: 37.0, sigmoid: 1.0
val: 38.0, sigmoid: 1.0
val: 39.0, sigmoid: 1.0
...

For 0 ≤ x < 1:

val: 0.0, sigmoid: 0.5
val: 0.01, sigmoid: 0.5024999791668749
val: 0.02, sigmoid: 0.5049998333399998
val: 0.03, sigmoid: 0.5074994375506203
...
val: 0.97, sigmoid: 0.7251194977898231
val: 0.98, sigmoid: 0.7271082163411295
val: 0.99, sigmoid: 0.7290879223493065

Even if we had more significant digits in floats, the range of sigmoid return values for inputs over 36 gets incredibly small (and smaller still as x increases). Values between 0 and 1 provide much more differentiation:
[Image: graph of the sigmoid function, flattening toward 1 for large x]
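
To tie this back to backpropagation: the gradient of the sigmoid is s(x) * (1 - s(x)), so once the input saturates, the gradient collapses to nearly zero and the weight barely updates. A quick check (my own addition, not from the lab):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.5, 5.0, 36.0]:
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.12f}  gradient={sigmoid_grad(x):.3e}")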

Note: in this exercise we are not actually using the sigmoid function but the softmax. However, it is basically the same idea, since softmax is:

softmax(x_i) = exp(x_i) / sum_j exp(x_j)
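
A small numeric illustration of the same saturation with softmax (the logits here are made up for illustration):

import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Pixel-scale logits: one class swallows essentially all the probability mass
print(softmax(np.array([255.0, 128.0, 0.0])))  # ~[1. 0. 0.]

# Normalized-scale logits: a much more informative distribution
print(softmax(np.array([1.0, 0.5, 0.0])))      # ~[0.51 0.31 0.19]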

For some reason Exercise 7 was grabbing the wrong dataset. Instead of:
tf.keras.datasets.fashion_mnist
it was pointing to:
tf.keras.datasets.mnist

Just something to be aware of if your accuracy went up when you removed the normalization.
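
For reference, both datasets load the same way through tf.keras.datasets; the only difference in the notebook is which identifier is used:

import tensorflow as tf

# Fashion-MNIST: clothing images
(fx_train, fy_train), (fx_test, fy_test) = tf.keras.datasets.fashion_mnist.load_data()

# MNIST: handwritten digits
(mx_train, my_train), (mx_test, my_test) = tf.keras.datasets.mnist.load_data()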

Thanks for pointing this out.

mnist has been used since exercise 1, but I agree with you that there is an inconsistency in the notebook. There's no point in exploring fashion mnist and then using mnist for the rest of the exercises. I've asked the staff to look into this.

Failed test case: model was not originally set to train for 10 epochs.
Expected:
10,
but got:
8.
Please help.

@Ritwik_Sarkar

This is the comment in the starter code:

# Fit the model for 10 epochs adding the callbacks
# and save the training history

Why would you train the model for 8 epochs?


[code removed - moderator]

@Ritwik_Sarkar
Please click my name and message your notebook as an attachment.


[code removed - moderator]

@Ritwik_Sarkar

Stop posting your code in public. It’s ok to send code via direct message to a mentor.

@Ritwik_Sarkar

Are you still seeing the same error?

Failed test case: model was not originally set to train for 10 epochs.
Expected:
10,
but got:
8.

Yes, I'm still seeing the same error.

I just ran your notebook. This is the grader feedback:

Failed test case: model trained for more than 8 epochs. The callback should have fired by now..
Expected:
a maximum of 8 epochs,
but got:
10.

There’s a difference between the above feedback and what you’ve provided.
The one above means that the callback should’ve fired before the 9th epoch. This asks you to tune the model architecture.

The feedback you shared means that the model was trained for 8 epochs instead of 10.

Please do 2 things:

  1. Update your model architecture so that the callback triggers before the 9th epoch, but leave the number of training epochs at 10. (A rough sketch of the callback pattern is below.)
  2. If the feedback you got is different from what I shared, reply with the lab ID and I'll ask the staff to look into the grader.
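
For reference, here is a minimal sketch of that pattern, assuming model, x_train, and y_train are already defined in your notebook (the class name and the 99% threshold are illustrative, not the assignment's exact values):

import tensorflow as tf

class MyCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Stop training once the accuracy threshold is crossed
        if logs.get('accuracy', 0.0) >= 0.99:
            print('\nReached the accuracy threshold, stopping training.')
            # Note the spelling: stop_training, not stop_traning
            self.model.stop_training = True

# Always request 10 epochs; the callback is what ends training early
history = model.fit(x_train, y_train, epochs=10, callbacks=[MyCallback()])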

Failed test case: model was not originally set to train for 10 epochs.
Expected:
10,
but got:
9.

Failed test case: model trained for more than 8 epochs. The callback should have fired by now…
Expected:
a maximum of 8 epochs,
but got:
9.
Lab ID: vtfbrvmn

Please click my name and message your notebook as an attachment along with a screenshot of the expanded grader feedback. I'll forward them to the staff to look at.

@Ritwik_Sarkar

Don’t forget to fix the typo: stop_traning

@Ritwik_Sarkar
You are training the model for 9 epochs. That's incorrect; see the starter-code comment quoted above.

I don't understand. I already set it to train for 10 epochs.