Hello,
I am learning about the TensorFlow library for the first time through this course. Below is the code provided in the optional lab that creates a TensorFlow model:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

tf.random.set_seed(1234)  # applied to achieve consistent results
model = Sequential(
    [
        tf.keras.Input(shape=(2,)),
        Dense(3, activation='sigmoid', name='layer1'),
        Dense(1, activation='sigmoid', name='layer2')
    ]
)

And then a few blocks down in the notebook, the values of w and b generated by the model are printed.

It's important to note that the values of the weights and biases (W1, W2, b1, and b2) are initialized before any input data is passed through the model. This initialization is completely independent of the input data, and the values you see in the weights are purely the result of random initialization.

The shapes of these weights depend on the number of neurons and the input dimensions of each layer. Since a Dense layer connects each input node to each output node, the weight matrix has dimensions corresponding to those sizes. In layer 1, the input size is 2 and the output size is 3, so the weight matrix W1 has shape (2, 3). Biases are vectors with one entry per neuron, so b1 has shape (3,). For layer 2, the input size is 3 and the output size is 1, so W2 has shape (3, 1) and b2 has shape (1,). When the model is instantiated, TensorFlow automatically initializes these weights using a random initialization strategy. The seed (tf.random.set_seed(1234)) ensures the random numbers generated are the same every time, so you get consistent results.
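To see those shapes concretely, here is a minimal NumPy sketch that mirrors the layout of the two Dense layers. This is only illustrative: the arrays, the generator seed, and the normal draw are assumptions for the example, not TensorFlow's actual initializer (Keras uses Glorot uniform by default).

```python
import numpy as np

rng = np.random.default_rng(1234)

# Illustrative stand-ins for the Dense layer parameters:
# W has shape (inputs, units), b has shape (units,).
W1 = rng.normal(size=(2, 3))   # layer 1: 2 inputs -> 3 neurons
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # layer 2: 3 inputs -> 1 neuron
b2 = np.zeros(1)

print(W1.shape, b1.shape)  # (2, 3) (3,)
print(W2.shape, b2.shape)  # (3, 1) (1,)
```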

I see, so it's just randomly initialized then. Is there a reason that TensorFlow does this? I can't imagine that a neural network made with randomly-generated parameters could be useful to anybody.

Perhaps it is just meant to create space in memory to store those values?

At first, parameters are initialized randomly but then with training, they are updated according to the loss function. You will see this, from scratch, in the week 4 assignments of this course.

The randomly initialized network is not the one you actually use. It's just the way you start the training.

The reason that random initialization is used is for "Symmetry Breaking": if you don't do that, then all the neurons learn the same thing during training. Here's a thread which discusses this point in more detail.

Random initialization is crucial for breaking symmetry, allowing the neurons to learn different features and providing a starting point for learning. The real "learning" happens during training, when the model adjusts these weights to minimize the loss function based on the input data.

The weights and biases are stored in memory to be adjusted during training. As the network trains and the weights are updated, this space is continuously reused and adjusted.

The random values are not meant to be useful on their own but are refined into useful parameters during training. Through forward and backward passes, the model adjusts these weights using gradient descent or other optimization techniques. Over many iterations, the weights converge to values that minimize the loss function, transforming the initially random network into one that makes meaningful predictions.
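As a toy illustration of that refinement, here is a sketch of gradient descent turning a random starting weight into a useful one on a one-parameter least-squares problem. The data, seed, learning rate, and iteration count are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x, with a single weight w to learn.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = rng.normal()   # random starting point -- useless on its own
lr = 0.01
for _ in range(200):
    y_hat = w * x
    grad = 2 * np.mean((y_hat - y) * x)  # d(MSE)/dw
    w -= lr * grad                       # gradient descent step

print(round(w, 3))  # converges close to 3.0
```

The random start carries no information about the data; the repeated forward pass / gradient step loop is what moves it to a value that minimizes the loss.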

So, while a neural network with only randomly initialized parameters isn't useful by itself, it becomes powerful as it learns and updates these parameters based on the data it's given.

Yes, just to add to @paulinpaloalto's thought, in my mind I imagine us scattering dice all over a table, and then you "pull" the model in based on the random face counts of the dice to find the relationship (and that, really, depends on the question you are asking). He might think this idea is weird, but it has helped me think about these things.

But if all the dice were just blank… Where do you start from? You are trying to minimize from nothing.

I appreciate your analogy with dice, which illustrates the importance of random initialization and underscores the potential drawbacks of starting with "nothing" or uniform values. If the dice were blank (like initializing all weights to zero), you wouldn't have a starting point. Every die would look the same, meaning every neuron in your network would start the same and learn the same things, leading to a lack of diversity in the learned features. There'd be no variance to explore, and no difference in the "paths" the network can take as it tries to minimize loss. Random initialization gives you different starting points (dice faces) and ensures the model can explore multiple paths during training. Without this variety, the network would struggle to learn effectively.

Yes, you can prove mathematically that if you start with weights that are all identical (either zero or some other constant value), then all neurons will output the same values and the gradients will also be the same. So what you end up with is the equivalent of one neuron per layer: all the neurons in each layer learn the exact same weights through training. So the neural network you get by that method is not very powerful or expressive.

You don't even have to go through the math: just try it. Initialize all your weights and bias values to zero and then run your training for 100 iterations. Then print your W^{[l]} and b^{[l]} values and you'll see that all the rows are equal at each layer. Not very useful.
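If you'd like to try that experiment without TensorFlow, here is a small from-scratch NumPy sketch (in the spirit of the week 4 assignments, but with made-up data and learning rate) of a 2-3-1 sigmoid network trained from all-zero weights. After 100 iterations, every hidden neuron has learned identical weights:

```python
import numpy as np

rng = np.random.default_rng(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Tiny 2-3-1 network, everything initialized to zero.
W1, b1 = np.zeros((2, 3)), np.zeros(3)
W2, b2 = np.zeros((3, 1)), np.zeros(1)

X = rng.normal(size=(8, 2))                       # made-up inputs
y = rng.integers(0, 2, size=(8, 1)).astype(float)  # made-up labels

lr = 0.5
for _ in range(100):
    # Forward pass
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    # Backward pass (binary cross-entropy with sigmoid output)
    dZ2 = A2 - y
    dW2 = A1.T @ dZ2 / len(X)
    db2 = dZ2.mean(axis=0)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
    dW1 = X.T @ dZ1 / len(X)
    db1 = dZ1.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Symmetry was never broken: each column of W1 (one per hidden
# neuron) is identical, and so is each entry of W2.
print(np.allclose(W1, W1[:, :1]))  # True
print(np.allclose(W2, W2[0]))      # True
```

(Here W1 has shape (inputs, units), so it is the columns, one per neuron, that come out identical; with the course's (units, inputs) convention, it would be the rows.)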