I am in the middle of “Programming Assignment: Linear Transformations and Neural Networks” and am puzzled by one statement and the approach it takes.
The section “3.1 - Linear Regression” seems to describe a vanilla ordinary least squares (OLS) problem with a squared-error (L2) loss (“cost”) function.
Then, however, section 3.1 continues, “The next step is to adjust the weights and bias, in order to minimize the cost function. This process is called backward propagation and is done iteratively: you update the parameters with a small change and repeat the process.”
Why, in a linear algebra course, is simple OLS being solved numerically (via backpropagation)? There is a simple closed-form (analytical) solution using matrix math to find the weights that minimize the residual sum of squares (the cost function).
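For concreteness, the closed form I have in mind is the Normal Equation (writing $X$ for the design matrix with a column of ones for the bias and $\mathbf{y}$ for the targets):

$$\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$$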
It is a good point that there is a closed-form analytic solution to Linear Regression. They should have mentioned that and added a bit more explanation. There are a couple of reasons why they might have chosen to show an iterative solution instead:
The Normal Equation may be simple to write down, but its computational complexity is higher than that of Gradient Descent: solving it is roughly O(n^3) in the number of features n, although there are ways to reduce that somewhat. So there are cases in which it is more efficient to use an iterative solution when the number of parameters is large. I have not taken the new MLS courses, but when Prof Ng covered Linear Regression in the original Stanford ML course, he covered both the Normal Equation and Gradient Descent and made a point of mentioning that GD is preferred for large problems. If memory serves, he suggested roughly n < 10000 as the threshold below which the closed-form solution is fine, but it has been 6 or 7 years since I watched those lectures, so I don’t guarantee that I’m remembering that correctly. (Both approaches are sketched in code below, after the second reason.)
The point of these courses is to give you the math background for Machine Learning and Deep Learning. Once you get beyond Linear Regression, there are no ML cases that I’m aware of that have closed-form solutions. Once we get beyond Logistic Regression, we don’t even have convex cost functions, so everything depends on approximate iterative solutions based on Gradient Descent. So perhaps their intent is to use this assignment as an introduction to Gradient Descent-based solutions, to get you into that mindset.
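To make the first comparison concrete, here is a minimal NumPy sketch on toy data (my own illustration, not the assignment’s code): `np.linalg.solve` stands in for the Normal Equation, and a plain batch Gradient Descent loop for the iterative route. The data, learning rate `alpha`, and iteration count are arbitrary choices just for the example.

```python
import numpy as np

# Toy 1-D data: y is roughly 3x + 2 with a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 3 * x + 2 + 0.1 * rng.standard_normal((100, 1))

# Design matrix with a column of ones for the bias term
X = np.hstack([np.ones_like(x), x])              # shape (100, 2)

# Closed form: Normal Equation (solving the linear system beats forming an explicit inverse)
w_closed = np.linalg.solve(X.T @ X, X.T @ y)     # [bias, weight], roughly [2, 3]

# Iterative: batch Gradient Descent on the mean squared error cost
w_gd = np.zeros((2, 1))
alpha = 0.5                                      # learning rate
for _ in range(1000):
    grad = (2 / len(y)) * X.T @ (X @ w_gd - y)   # gradient of the MSE cost
    w_gd -= alpha * grad

print(w_closed.ravel(), w_gd.ravel())            # both land near [2, 3]
```

With only two parameters the closed form is of course trivial; the cost difference only matters when X has thousands of columns, which is the regime that n < 10000 rule of thumb is about.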
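And to illustrate the second point: for Logistic Regression there is no analogue of the Normal Equation, so the same kind of Gradient Descent loop is the only option. Again just a rough sketch with made-up data, not anything from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(200, 1))
y = (x > 0.3).astype(float)                  # toy binary labels, shape (200, 1)

X = np.hstack([np.ones_like(x), x])          # bias column plus the feature

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# No closed-form solution for the cross-entropy cost, so we iterate
w = np.zeros((2, 1))
alpha = 0.5                                  # learning rate
for _ in range(2000):
    p = sigmoid(X @ w)                       # predicted probabilities
    grad = X.T @ (p - y) / len(y)            # gradient of the mean cross-entropy
    w -= alpha * grad
```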