Question about Lab 3 read_image_tfds()

I'm confused about a few things in this function and hope someone can help me figure them out:

  • Why do we draw the random offsets from a uniform distribution over the range (0, 48)?
  • Why do we divide the [x|y]min and [x|y]max values by 75?
  • Can we divide xmin and ymin by 75 at lines 7 and 8, right after the image's normalization step?

def read_image_tfds(image, label):
    xmin = tf.random.uniform((), 0, 48, dtype=tf.int32)
    ymin = tf.random.uniform((), 0, 48, dtype=tf.int32)
    image = tf.reshape(image, (28,28,1,))
    image = tf.image.pad_to_bounding_box(image, ymin, xmin, 75, 75)
    image = tf.cast(image, tf.float32)/255.0
    xmin = tf.cast(xmin, tf.float32)
    ymin = tf.cast(ymin, tf.float32)
    xmax = (xmin + 28) / 75
    ymax = (ymin + 28) / 75
    xmin = xmin / 75
    ymin = ymin / 75
    return image, (tf.one_hot(label, 10), [xmin, ymin, xmax, ymax])

I’d appreciate your help. Many thanks :slight_smile:

Hi, please find my reply below:

Hi @jackliu333,

Thanks a lot for your response; I appreciate it.

I get why we choose 0 and 48 now. The canvas is 75 pixels wide and the digit is 28, so the offset we add to x and y can be at most 75 - 28 = 47. Since tf.random.uniform samples from the half-open range [minval, maxval), maxval has to be 48 for 47 to be a possible value.
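
To double-check the arithmetic, here is a minimal sketch in plain Python. It is not the lab's code: random.randrange stands in for tf.random.uniform with an integer dtype (both exclude the upper bound), and the names CANVAS, DIGIT and max_offset are my own.

```python
import random

CANVAS = 75  # side of the padded canvas, taken from the lab code
DIGIT = 28   # side of the original digit image

# Largest top-left offset that still keeps the digit fully inside the canvas.
max_offset = CANVAS - DIGIT

# The upper bound is exclusive, so we pass max_offset + 1 (i.e. 48).
offsets = [random.randrange(0, max_offset + 1) for _ in range(10_000)]

print(max_offset)                       # 47
print(max(offsets) + DIGIT <= CANVAS)   # True: the digit never leaves the canvas
```

With maxval set to 47 instead, the digit could never touch the right or bottom edge of the canvas.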

Regarding the division by 75, I still don't fully get it. As far as I understand, each pixel's value lies in the range [0, 255], so we divide it by 255 to normalize it, and we apply this division to every pixel regardless of the image's width and height.

Here, [x|y][min|max] are simply the coordinates of the bounding box's corners in the XY coordinate system of the canvas:

  1. We normalize them because that makes them easier to feed to the NN later.
  2. Since the new canvas' X-axis and Y-axis both span pixels 0 to 74, i.e. 75 pixels, we divide by 75. For an image with a different size, say (16, 32) as (height, width), we would divide x by 32 and y by 16.
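
The two points above can be sketched in plain Python. The helper normalize_box and its parameter names are hypothetical, not part of the lab; it only assumes that x is divided by the canvas width and y by the canvas height.

```python
def normalize_box(xmin, ymin, box_w, box_h, canvas_w, canvas_h):
    """Return [xmin, ymin, xmax, ymax] as fractions of the canvas size."""
    return [
        xmin / canvas_w,
        ymin / canvas_h,
        (xmin + box_w) / canvas_w,
        (ymin + box_h) / canvas_h,
    ]

# The lab's case: a 28x28 digit at the largest x offset (47) on a 75x75 canvas.
square = normalize_box(47, 0, 28, 28, canvas_w=75, canvas_h=75)

# A hypothetical non-square canvas, 16 pixels high and 32 pixels wide.
wide = normalize_box(8, 4, 8, 8, canvas_w=32, canvas_h=16)

print(square[2])  # 1.0: the box touches the right edge of the canvas
print(wide)       # [0.25, 0.25, 0.5, 0.75]
```

Every value ends up in [0, 1] no matter how large the canvas is, which is the point of the normalization.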

Am I correct?

In addition, I would like to ask why we need to subtract 1 from the image (line 11) after the normalization step. Is this step compulsory or optional?

Thanks again :slight_smile:

I think your understanding is correct.

On the minus 1 operation, it follows the standard transformation that consists of scaling plus centering: first we scale the image by dividing by 127.5, then center it by subtracting 1. The resulting range is [-1, 1], which differs from the [0, 1] range you get by just dividing by 255, the more common choice.
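
As a quick illustration of the two ranges, here is a small plain-Python sketch. The function names scale_and_center and scale_only are made up for this example, and it assumes pixel values between 0 and 255.

```python
def scale_and_center(pixel):
    """Scale by 127.5, then shift by 1: maps [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


def scale_only(pixel):
    """Plain division by 255: maps [0, 255] to [0, 1]."""
    return pixel / 255.0


# Darkest, middle and brightest pixel values.
for p in (0, 127.5, 255):
    print(p, scale_and_center(p), scale_only(p))
# 0 -1.0 0.0
# 127.5 0.0 0.5
# 255 1.0 1.0
```

Which of the two to use usually depends on what range the model was built or pretrained to expect.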
