Well, it’s just a question of how you look at it or how you define it. Each neuron gets all the inputs and produces its own output. If you want to call that 3 models or 1 model, I guess it’s up to you, but Prof Ng calls it one model.
If we have 3 features and 3 neurons, then we have a total of 12 parameter values, right? The weights form a 3 x 3 matrix (9 values) and the biases form a 3 x 1 column vector (3 values). Now I should say that I don’t know the MLS course material, only the DLS course material, which presents a more advanced version of the same material. I’m not sure how Prof Ng represents the data in MLS for this case, but evaluating the output for this layer is the following operation:
W \cdot X + b
Where X is either a 3 x 1 column vector representing one sample or a 3 x m matrix representing m input samples. Then W is a 3 x 3 matrix and b is a 3 x 1 column vector. The output will then be 3 x 1 if X is one sample or 3 x m if X is m samples.
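If it helps to see the shapes, here’s a little numpy sketch of that operation (the variable names are just mine for illustration, not anything from the course notebooks):

```python
import numpy as np

np.random.seed(0)

n_features = 3   # inputs per sample
n_neurons = 3    # neurons in this layer

W = np.random.randn(n_neurons, n_features)   # 3 x 3 weight matrix
b = np.random.randn(n_neurons, 1)            # 3 x 1 bias column vector

# One sample: X is 3 x 1, so W @ X + b is 3 x 1
X_one = np.random.randn(n_features, 1)
print((W @ X_one + b).shape)    # (3, 1)

# m samples as columns: X is 3 x m, so W @ X + b is 3 x m
# (b broadcasts across the m columns)
m = 5
X_many = np.random.randn(n_features, m)
print((W @ X_many + b).shape)   # (3, 5)
```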
Here again I don’t know whether Prof Ng uses math notation with indexes starting at 1 or Python notation with 0-based indexing. You’ve used 0-based for X, so let’s go with that. Here’s what W looks like:
[[w_{0,0}, w_{0,1}, w_{0,2}],
[w_{1,0}, w_{1,1}, w_{1,2}],
[w_{2,0}, w_{2,1}, w_{2,2}]]
Let’s go with just x as one sample, so it’s
[[x_0],
[x_1],
[x_2]]
The output will be 3 x 1 and the first element of it is:
z_0 = \displaystyle \sum_{i = 0}^2 w_{0,i} * x_i + b_0
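In code terms, that sum is just the dot product of row 0 of W with x, plus b_0, which is exactly the first entry of W \cdot x + b. A quick numpy check (again, the names here are mine):

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(3, 3)   # w_{i,j}: rows are neurons, columns are input features
b = np.random.randn(3, 1)   # b_i
x = np.random.randn(3, 1)   # one sample with 3 features

# z_0 written out as the explicit sum over the input features
z_0 = sum(W[0, i] * x[i, 0] for i in range(3)) + b[0, 0]

# The same number is the first entry of the matrix form W @ x + b
z = W @ x + b
print(np.isclose(z_0, z[0, 0]))   # True
```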
And so forth for z_1 and z_2, using rows 1 and 2 of W. So if we start with all the w_{i,j} values randomly chosen and then run gradient descent, they will, at least in principle, all stay different. You can’t prove that some of them couldn’t happen to end up with the same value, but there is nothing that would drive them to be the same. Back propagation is driven by the cost function comparing the outputs (the predictions) with the labels, right?
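Here’s a toy demonstration of that point, if it helps: a single linear layer trained with a few gradient descent steps on a made-up squared-error loss. This is just my sketch, not the course’s implementation, but you can see that the rows of W (one row per neuron) start out different because of the random initialization and stay different after the updates:

```python
import numpy as np

np.random.seed(1)

W = np.random.randn(3, 3) * 0.01   # random initialization: each neuron starts out different
b = np.zeros((3, 1))

x = np.random.randn(3, 1)   # one input sample
y = np.random.randn(3, 1)   # made-up labels, just for the demo

lr = 0.05
for _ in range(50):
    z = W @ x + b    # forward pass (linear portion only)
    dz = z - y       # gradient of 0.5 * ||z - y||^2 with respect to z
    dW = dz @ x.T    # gradient with respect to W
    db = dz          # gradient with respect to b
    W -= lr * dW
    b -= lr * db

print(W)   # the three rows remain distinct from one another
```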
Also note that I’ve only shown the “linear” portion of the calculation at the first layer of the network. We then apply a non-linear activation function and feed that output as the input to the second layer of the network, which will have the same structure, but not necessarily the same number of output neurons.
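Putting the pieces together, a minimal two-layer forward pass might look something like this. The layer sizes, the sigmoid activation, and the variable names are just choices I made for illustration, not necessarily what MLS uses:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(2)

# Layer 1: 3 features in, 3 neurons out.  Layer 2: 3 inputs in, 1 neuron out.
W1 = np.random.randn(3, 3) * 0.01
b1 = np.zeros((3, 1))
W2 = np.random.randn(1, 3) * 0.01
b2 = np.zeros((1, 1))

X = np.random.randn(3, 5)   # 5 samples with 3 features each, one sample per column

Z1 = W1 @ X + b1            # linear portion of layer 1, shape 3 x 5
A1 = sigmoid(Z1)            # non-linear activation of layer 1
Z2 = W2 @ A1 + b2           # layer 2 takes A1 as its input, shape 1 x 5
A2 = sigmoid(Z2)            # final output, one value per sample

print(A1.shape, A2.shape)   # (3, 5) (1, 5)
```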