I understand how normalization would affect accuracy when dealing with multiple features that have different scales, e.g. “millimeters between planets” vs. “number of times a person gets married”.

I can also see how using different types of normalization could de-emphasize outliers. I don’t understand what is happening in this example, though. We are simply rescaling 0–255 to 0–1; the relative scale is exactly the same. What is the exact mechanism causing the huge difference in accuracy between un-normalized and normalized inputs? Is it something in the particular optimizer or loss function (“adam”, “sparse_categorical_crossentropy”) that works better with values between 0 and 1 than with arbitrary ranges?

Expanding on your reply.
Even though the sigmoid function technically returns different values for 0 < x < 255, in practice, for any x above roughly 36, exp(-x) drops below double-precision machine epsilon, so 1 / (1 + exp(-x)) rounds to exactly 1.0. Values in 0 < x < 1 don’t have this problem.

import math

def get_sigmoid(x):
    return 1 / (1 + math.exp(-x))

def print_sigmoids(vals):
    for v in vals:
        print(f"val: {v}, sigmoid: {get_sigmoid(v)}")

def show_floats():
    print_sigmoids([f / 100.0 for f in range(100)])

def show_ints():
    print_sigmoids([float(i) for i in range(255)])

show_ints()
print('-------------------')
show_floats()

Even if we did have more significant digits in floats, the range of sigmoid return values for inputs over 36 is incredibly small (and gets smaller still as x increases). Values between 0 and 1 provide much more differentiation:

Note: in this exercise we are not actually using the sigmoid function but softmax. However, it is basically the same idea, since softmax is softmax(x)_i = exp(x_i) / sum_j exp(x_j), and the sigmoid is just the two-class case.
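To make that "same idea" concrete, here is a small check (plain Python, no TensorFlow needed) that a two-class softmax over [x, 0] produces exactly the sigmoid of x, so the saturation argument above carries over unchanged:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(vals):
    # Subtract the max before exponentiating for numerical stability.
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    total = sum(exps)
    return [e / total for e in exps]

# softmax([x, 0])[0] = e^x / (e^x + 1) = 1 / (1 + e^-x) = sigmoid(x)
for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    assert abs(softmax([x, 0.0])[0] - sigmoid(x)) < 1e-12
```

So for large un-normalized logits, softmax outputs saturate toward 0 and 1 just like the sigmoid does.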

MNIST has been used since exercise 1. But I agree with you that there is an inconsistency in the notebook: there’s no point in exploring Fashion MNIST and then using MNIST for the rest of the exercises. I’ve asked the staff to look into this.

I just ran your notebook. This is the grader feedback:

Failed test case: model trained for more than 8 epochs. The callback should have fired by now.
Expected:
a maximum of 8 epochs,
but got:
10.

There’s a difference between the above feedback and what you’ve provided.
The one above means that the callback should’ve fired before the 9th epoch. This asks you to tune the model architecture.

The feedback you shared means that the model was trained for 8 epochs instead of 10.

Please do 2 things:

Update your model architecture so the callback triggers before the 9th epoch. Leave the number of training epochs set to 10.

If the feedback you got is different from what I shared now, reply with the lab ID. I’ll ask the staff to look into the grader.
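For the first point, a callback along these lines is one way to stop training once a target metric is hit (a minimal sketch; the class name and the 0.99 threshold are illustrative, not the assignment's exact spec):

```python
import tensorflow as tf

class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Stop training once training accuracy reaches a target (illustrative)."""

    def __init__(self, target=0.99):
        super().__init__()
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if logs.get("accuracy", 0.0) >= self.target:
            print(f"\nReached {self.target:.0%} accuracy, stopping training.")
            self.model.stop_training = True
```

You would pass it as model.fit(..., epochs=10, callbacks=[StopAtAccuracy()]): the epochs argument stays at 10, and it is the callback (helped by a strong enough architecture) that ends training early.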

Failed test case: model was not originally set to train for 10 epochs.
Expected:
10,
but got:
9.

Failed test case: model trained for more than 8 epochs. The callback should have fired by now.
Expected:
a maximum of 8 epochs,
but got:
9.
Lab ID: vtfbrvmn

Please click my name and message me your notebook as an attachment, along with a screenshot of the expanded grader feedback. I’ll forward them to the staff to look at.