Nitpicks for assignment of week 3

Point 1

We read:

Since TensorFlow Datasets are generators, you can’t access directly the contents unless you iterate over them in a for loop, or by explicitly creating a Python iterator using iter and consuming its elements using next. Also, you can inspect the shape and dtype of each element using the element_spec attribute.

That should probably be “you can inspect the shape and dtype of all elements”, because what actually happens is that we interrogate the TensorFlow Dataset as a whole:

# x_train is a tf.data.Dataset
print(x_train.element_spec)

which gives us the spec shared by all the Tensors in the Dataset:

TensorSpec(shape=(64, 64, 3), dtype=tf.uint8, name=None)
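
As a quick illustration of the quoted behaviour, here is a minimal sketch (the five-image toy Dataset is just a made-up stand-in for the assignment’s x_train):

import tensorflow as tf

# Hypothetical stand-in for x_train: a Dataset of 64x64x3 uint8 images
x_train = tf.data.Dataset.from_tensor_slices(tf.zeros([5, 64, 64, 3], dtype=tf.uint8))

# element_spec describes every element without consuming any of them
print(x_train.element_spec)   # TensorSpec(shape=(64, 64, 3), dtype=tf.uint8, name=None)

# to see actual contents we have to iterate, e.g. via an explicit iterator
it = iter(x_train)
first = next(it)
print(first.shape, first.dtype)   # (64, 64, 3) <dtype: 'uint8'>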

Point 2

There is a function called normalize(image) (image is a Tensor) which flattens the tensor to a 1-D shape (p,) and rescales the RGB values from the 0–255 range to values in [0, 1].
This function should really be called preprocess_single_image, as “normalization” now has a specific meaning (as given in the course) and what is being done here doesn’t even look like normalization.

Here it is, a bit modified.

import tensorflow as tf

def preprocess_single_image(image: tf.Tensor) -> tf.Tensor:
    # Rescale pixel values from [0, 255] to [0, 1] and flatten to a 1-D vector
    result = tf.cast(image, tf.float32) / 255.0
    result = tf.reshape(result, [-1])
    print(f"Preprocessed single image in: type = {type(image).__name__}: shape = {image.shape}, dtype = {image.dtype}")
    print(f"Preprocessed single image out: type = {type(result).__name__}: shape = {result.shape}, dtype = {result.dtype}")
    return result

And then it is called like this:

def preprocess_dataset(dset: tf.data.Dataset, dset_name: str) -> tf.data.Dataset:
    # https://www.tensorflow.org/api_docs/python/tf/data/Dataset
    result = dset.map(preprocess_single_image)
    # No *actual* preprocessing has happened yet
    print(f"Result of preprocessing dataset '{dset_name}': class = {type(result)}, spec = {result.element_spec} ")
    return result

One may note that the mapping function isn’t applied immediately by TensorFlow; it will apparently be called “on demand”, and for now it is just wrapped into a lazily evaluated data structure by dset.map(). Very interesting.
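
To see this laziness concretely, here is a minimal sketch with a hypothetical toy Dataset in place of the assignment’s images:

import tensorflow as tf

# Hypothetical toy Dataset of three 64x64x3 uint8 images
images = tf.data.Dataset.from_tensor_slices(tf.zeros([3, 64, 64, 3], dtype=tf.uint8))

# map() only records the transformation; no image has been processed yet
processed = images.map(lambda img: tf.reshape(tf.cast(img, tf.float32) / 255.0, [-1]))
print(type(processed).__name__)   # a lazy MapDataset-like object
print(processed.element_spec)     # TensorSpec(shape=(12288,), dtype=tf.float32, name=None)

# the work happens element by element only when the dataset is actually consumed
for v in processed.take(1):
    print(v.shape, v.dtype)       # (12288,) <dtype: 'float32'>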

Point 3

In graded functions

def linear_function():

and

def initialize_parameters():

we are filling the b vectors with random numbers from the normal distribution. However, in the course it was said “don’t bother with that, let them remain at 0”. So, is this really appropriate?

Thanks for your always careful reading of everything, but I would agree with your description of these as “nitpicks”.

For point 2, you’re right at some level, but I’m pretty sure Prof Ng has used the term “normalization” for this simpler rescaling of the pixel values in the past.

For point 3, of course symmetry breaking is required, but there is no harm in also using non-zero random values for the bias values. It turns out that you have three choices: random W and zero b values, zero W and random b values, or random values for both. Any of those combinations breaks symmetry and allows the training to learn. I have not done, or seen anyone mention, any experiments to determine whether any of those strategies is more advantageous in general. But given that there is no one “magic bullet” initialization algorithm that works best in all cases, my guess would be that the same applies to the question of selecting between the three strategies. So what they have done here is not incorrect, it’s just different from what we have done up to this point. Maybe it would be worth adding a comment, so that it’s not just something different with no explanation of why it was done that way.
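
To make the three options concrete, here is a minimal sketch (the layer shape and the GlorotNormal initializer are illustrative assumptions, not necessarily what the graded functions use):

import tensorflow as tf

initializer = tf.keras.initializers.GlorotNormal(seed=1)

# Option 1: random W, zero b (what the course has used so far)
W1 = tf.Variable(initializer(shape=(25, 12288)))
b1 = tf.Variable(tf.zeros((25, 1)))

# Options 2/3: a random b also breaks symmetry, which is what this assignment does
b1_random = tf.Variable(initializer(shape=(25, 1)))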


For point 2, sometimes I see people call it min-max normalization. While the formula is usually to subtract the minimum over all values from each value and then divide by the range of the values, so that the normalized values lie in the range [0, 1], the same goal of [0, 1] may also be achieved by replacing the subtrahend and the divisor with selected values such as, in the case of pixel values, 0 and 255 respectively.

Sometimes people call it rescaling.
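
A minimal sketch of the difference, with made-up pixel values:

import tensorflow as tf

x = tf.constant([32.0, 64.0, 128.0, 192.0])

# general min-max normalization: subtract the data's own minimum, divide by its range
minmax = (x - tf.reduce_min(x)) / (tf.reduce_max(x) - tf.reduce_min(x))

# pixel rescaling: fix the subtrahend and divisor at 0 and 255 instead
rescaled = x / 255.0

print(minmax.numpy())    # [0.  0.2 0.6 1. ]
print(rescaled.numpy())  # [0.1254902 0.2509804 0.5019608 0.7529412]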
