You might have thought 19, right?

The explanation for why the model prediction is a bit off in C1_W1_Lab_1_hello_world_nn.ipynb is not correct in my opinion.

In more complex, real-world cases, I can follow the argument of probabilities that is given. However, this toy example performs a linear regression on a noise-free data set with an inherent linear relationship. It is stated that 6 data points do not suffice to find the exact relationship, while in this case only 2 data points would actually do!

The reason the prediction is somewhat off is that the solver has not yet fully converged. Though the problem is very simple, the neural-network solver takes its usual baby steps towards a minimum. Increasing the learning rate improves the prediction significantly. This can be done by changing the code as follows:

# Compile the model
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)
model.compile(optimizer=optimizer, loss='mean_squared_error')

I then get the following output for the prediction:

1/1 [==============================] - 0s 154ms/step
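The convergence effect can also be reproduced without TensorFlow. Here is a minimal NumPy sketch of full-batch gradient descent on the mean squared error. It rests on several assumptions that are not stated in this post: the training data is taken to be the six points on y = 2x - 1, training runs for 500 steps, the baseline learning rate is 0.01 (the Keras SGD default), and both parameters start at zero, whereas Keras initializes the weight randomly. The helper name `fit_gd` is made up for illustration.

```python
import numpy as np

# Assumed training data: six noise-free points on y = 2x - 1 (not quoted in this post).
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs - 1.0


def fit_gd(learning_rate, steps=500):
    """Full-batch gradient descent on the MSE of the model y = w * x + b."""
    # Zero start for reproducibility; Keras would draw the weight randomly.
    w, b = 0.0, 0.0
    for _ in range(steps):
        residual = w * xs + b - ys
        grad_w = 2.0 * np.mean(residual * xs)  # d(MSE)/dw
        grad_b = 2.0 * np.mean(residual)       # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


for lr in (0.01, 0.05):
    w, b = fit_gd(lr)
    print(f"lr={lr}: w={w:.6f}, b={b:.6f}, prediction at x=10: {w * 10.0 + b:.6f}")
```

With the same number of steps, the larger learning rate should land much closer to 19, which supports the view that the gap comes from incomplete convergence rather than from too few data points.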

In fact, there exists an analytical solution for the optimum, as explained here as well as in Andrew Ng’s machine learning course. Applying it would yield the exact prediction of 19 (effects of finite machine precision set aside).
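As a rough illustration of the analytical route, here is a small NumPy sketch. It again assumes the training data is the six points on y = 2x - 1, which is not quoted in this post. It uses `np.linalg.lstsq` to solve the least-squares problem rather than forming the normal equation explicitly; both give the same optimum here. The second part shows that two of the points already pin down the line.

```python
import numpy as np

# Assumed training data: six noise-free points on y = 2x - 1 (not quoted in this post).
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs - 1.0

# Design matrix: one column for x, one column of ones for the bias term.
X = np.column_stack([xs, np.ones_like(xs)])

# Closed-form least-squares fit of y = w * x + b using all six points.
theta, *_ = np.linalg.lstsq(X, ys, rcond=None)
w, b = theta
prediction = w * 10.0 + b
print(f"w={w:.6f}, b={b:.6f}, prediction at x=10: {prediction:.6f}")

# Two points give a 2x2 linear system, which already determines the line.
theta_two_points = np.linalg.solve(X[:2], ys[:2])
print("fit from the first two points only:", theta_two_points)
```

Both fits should recover a slope of 2 and an intercept of -1 up to floating-point rounding, so the prediction at x = 10 is 19.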

Gradient descent on mini batches works well. It usually requires fewer compute resources than the analytical method and scales well to larger datasets. With proper tuning, performance can get close to optimal for a wide variety of problems.