What do the input shapes represent in the initializers?

Recall the build() function in lab 3:

    def build(self, input_shape):
        # Weight matrix: one row per input feature, one column per neuron.
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        # Bias vector: one value per neuron.
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)
        super().build(input_shape)

What does the shape parameter in w_init() and b_init() represent? The videos glossed over this. It’s all the more confusing that the documentation for tf.random_normal_initializer() and tf.zeros_initializer() makes no mention of a shape argument.

@Amine.L
The shape parameter tells the initializer what shape of tensor to create. Here it is the shape of the weight matrix connecting two consecutive layers in the neural network.
For example, suppose you have an input of shape 32x20 (batch_size, num_features). This input shape is passed to the build() method. To define the layer you need to initialize its weights and biases, and the shape of the weight matrix depends on the number of neurons (self.units) you want the layer to have.
Suppose you want 64 neurons in this layer; then the shape of the weight matrix will be 20x64 and the bias will be a vector of shape 64x1.
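To make this concrete, here is a minimal sketch (the SimpleDense class name and the driver lines at the bottom are illustrative, not from the lab):

    import tensorflow as tf

    class SimpleDense(tf.keras.layers.Layer):
        def __init__(self, units=64):
            super().__init__()
            self.units = units

        def build(self, input_shape):
            # input_shape = (32, 20), so input_shape[-1] = 20 input features
            w_init = tf.random_normal_initializer()
            self.w = tf.Variable(name="kernel",
                initial_value=w_init(shape=(input_shape[-1], self.units),
                                     dtype='float32'),
                trainable=True)
            super().build(input_shape)

    layer = SimpleDense(units=64)
    layer.build(tf.TensorShape([32, 20]))  # (batch_size, num_features)
    print(layer.w.shape)                   # (20, 64)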

Hope this clarifies your doubt.

Thanks! So if I understand correctly, with the numbers in your example we’d have input_shape = (32, 20) and self.units = 64, such that X.shape = (32, 20), W.shape = (20, 64), and b.shape = (64, 1)? Is that correct? If so, what I find confusing is that X*W + b would then be (32x20)*(20x64) + (64x1) = 32x64 + 64x1. Isn’t it problematic to add a 32x64 matrix to a 64x1 vector?

The bias will be a vector with 64 values and its shape will be (64,). This one-dimensional bias vector will broadcast across each row of X*W.
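For instance (a minimal check; the constant values are placeholders, only the shapes matter):

    import tensorflow as tf

    xw = tf.ones((32, 64))   # stands in for the product X*W
    b = tf.zeros((64,))      # bias vector of shape (64,)
    print((xw + b).shape)    # (32, 64): b is added to every row of X*W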

Earlier I mentioned the shape of the bias vector as 64x1. Apologies for the confusion.
There is a difference between the shapes 64x1 and (64,): an array/tensor of shape 64x1 is indexed by two indices, while one of shape (64,) is indexed by a single index.
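A quick sketch of the indexing difference (b2 here is a hypothetical rank-2 tensor for comparison):

    import tensorflow as tf

    b = tf.zeros((64,))     # shape (64,): a single index suffices
    print(b[0])
    b2 = tf.zeros((64, 1))  # shape (64, 1): needs two indices
    print(b2[0, 0])
    # Adding b2 to a (32, 64) matrix would raise an error, since shapes
    # (32, 64) and (64, 1) are not broadcast-compatible.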
