Confused over number of iterations and number of weight updates

FYI, that’s really an inefficient way to approach what should be treated as a vector algebra task. You don’t need to compute each weight separately; you can update them all in one line of code, so you don’t need that for-loop at all.

This sort of “vectorization” will be covered later in the course.
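For example, here is a minimal NumPy sketch of that kind of vectorized update (the data, the variable names, and the compute_gradient helper are illustrative assumptions, not the lab’s actual code):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(97, 4))      # made-up data: 97 examples, 4 features
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 4.0
w = np.zeros(4)                   # one entry per weight: w_0 .. w_3
b = 0.0
alpha = 0.01

def compute_gradient(X, y, w, b):
    # Gradients of the mean-squared-error cost for linear regression.
    m = X.shape[0]
    err = X @ w + b - y           # prediction error for all m examples at once
    dj_dw = (X.T @ err) / m       # a single vector holding dJ/dw_j for every j
    dj_db = np.sum(err) / m
    return dj_dw, dj_db

# One gradient-descent step updates ALL the weights in one line:
dj_dw, dj_db = compute_gradient(X, y, w, b)
w = w - alpha * dj_dw             # no per-weight for-loop needed
b = b - alpha * dj_db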

I don’t know if you have taken the Deep Learning Specialization. Machine learning doesn’t just involve computing weights; it also involves neural networks, where each neuron is a unit that these parameters pass through in different ways during model training.

You will understand this part of deep learning better in the DLS.

Weight updates are not just about applying the gradient; the goal is to minimize the cost function, which depends on the dataset, its features, and the methods used.

As the mentor mentioned, you will gain this understanding in later courses.

So for K iterations of the for-loop, there will be 4 * K weight parameter updates in total (to w_0, w_1, w_2, and w_3) and K updates to the bias b?

Correct?

I am curious why every one of your messages asks the reader to agree with your statement.

It depends on what you define as an ‘update’.

The way that scikit-learn seems to count “weight updates”, it only counts the update cycles, regardless of the number of weights.

Notice that in my reply here from 13 hours ago, the number of weights did not matter in the way scikit-learn reports them.
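A minimal sketch of where those counts come from (the data here is made up; n_iter_ and t_ are SGDRegressor’s documented attributes):

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(97, 4))                  # made-up data: 97 samples, 4 features
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 4.0

sgd = SGDRegressor(max_iter=1000, tol=1e-6).fit(X, y)

print(sgd.n_iter_)   # iterations = epochs (full passes over the data)
print(sgd.t_)        # "weight updates" = n_iter_ * n_samples + 1,
                     # i.e. one update cycle per sample seen, regardless
                     # of how many weight parameters the model has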


Again, I want to restate that I fear you are focusing on a minor point that is not significant in the larger scope of this introductory machine learning course.

I think that counting the weight updates is not worthy of this much energy expenditure.

Please can you say if you agree with my last conclusion?

Thank you.

I have nothing new to add to this thread. Perhaps another mentor or community member can take up the discussion.

After reading through the comments, I share the same sentiment as @TMosh. It’s sufficient to know the big-O notation of the algorithm (in some rare cases it’s also good to know the constants that precede the mathematical terms, like polynomials, exponentials, logarithms, …), but not insignificant constants beyond that. It’s good enough to express the complexity in big-O terms of the number of parameters, the number of samples, and the number of iterations.
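To make that concrete with this thread’s numbers (a sketch, where n is the number of weights and K the number of iterations): whether you count K update cycles or n per-weight updates per cycle, big-O absorbs the constant factor,

$$n \cdot K = 4 \cdot K = O(K),$$

so both counting conventions describe the same asymptotic complexity.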

Thank you for your reply.

However, I still do not understand why the number of weight updates reported by scikit-learn is over 12,000 when the number of iterations is 124 and the number of weight parameters is 4. In the Python code there is only ONE update per weight per iteration, implementing the following pseudo-code for a single weight parameter:

w_0 = w_0 - alpha * gradient_function(w, X, b, Y)   # one update to w_0 per iteration, using dJ/dw_0

So there are only 4 * 124 = 496 weight updates in total.
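For what it’s worth, here is a minimal sketch of the two counting conventions side by side (the sample count m = 97 is an assumption, chosen only to illustrate how a per-sample count reaches roughly 12,000; only the counting logic matters):

K = 124          # gradient-descent iterations reported
n_weights = 4    # w_0 .. w_3
m = 97           # assumed number of training samples

# Course-lab convention (batch gradient descent): every iteration
# updates each of the 4 weights (and b) exactly once.
batch_updates = n_weights * K    # 496

# scikit-learn SGDRegressor convention: one update cycle per training
# sample per epoch, regardless of the number of weights
# (its t_ attribute is n_iter_ * n_samples + 1).
sgd_updates = K * m + 1          # 12,029 -- on the order of 12,000

print(batch_updates, sgd_updates)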