In the Week 2 video lecture titled “Row Echelon Form”, Luis Serrano mentions that the rank of a matrix is the number of 1s on the diagonal of the matrix in its row echelon form.
Then, also in Week 2, in the video lecture titled “Row Echelon Form in General”, Luis Serrano mentions that the rank of a matrix in row echelon form is the number of pivots. If you take the matrix on the right side of the image from this lecture and divide its first row by 3, the rank from method 1 (the number of 1s on the diagonal, which is 1) differs from the rank from method 2 (the number of pivots, which is 3).
Which method is correct? Why are these two methods in conflict when it comes to calculating the rank?
Lastly, in the second image I pasted, the matrices are not strictly in Row Echelon form, since the pivot values are not reduced to 1s. That is another misleading piece of information for someone trying to master the basics…
That is a disappointing response in support of a paid course.
Your response is: “…Computing the rank of a matrix has little or nothing to do with Machine Learning.”
Is it?
The rank of a matrix plays an important role in several aspects of machine learning, particularly in areas related to linear algebra and dimensionality reduction. Here are some ways in which the rank of a matrix is relevant in machine learning:
Linear Regression: In linear regression, the design matrix (the matrix containing the feature vectors) must have full column rank for the ordinary least squares solution to be unique. If the design matrix is rank-deficient, some of the features are linearly dependent on others, leading to multicollinearity, which causes instability and unreliable parameter estimates (see the sketch after this list).
Principal Component Analysis (PCA): PCA is a widely used dimensionality reduction technique in machine learning. The rank of the data matrix (the matrix containing the data points) determines the maximum number of principal components that can be extracted. If the rank of the data matrix is r, then at most r principal components can be obtained, and the remaining principal components will have zero variance.
Singular Value Decomposition (SVD): SVD is a matrix factorization technique used in various machine learning applications, such as recommender systems, image compression, and text mining. The rank of a matrix is equal to the number of non-zero singular values in its SVD. SVD is often used for dimensionality reduction by truncating the smallest singular values, which correspond to the directions of least variance in the data.
Pseudoinverse: The pseudoinverse (or Moore-Penrose inverse) of a matrix is a crucial concept in machine learning, used for solving overdetermined or underdetermined linear systems. The pseudoinverse exists for any matrix, whatever its rank; it is only the closed-form expressions (A^T A)^(-1) A^T and A^T (A A^T)^(-1) that require full column rank or full row rank, respectively.
Regularization: Techniques like ridge regression and lasso involve adding regularization terms to the cost function to prevent overfitting. Ridge regression, for example, adds a multiple of the identity to X^T X, which makes the system full rank and solvable even when the design matrix is rank-deficient; the regularization strength controls the model's effective degrees of freedom, and thus its complexity and generalization performance.
Matrix Factorization: Several machine learning algorithms, such as collaborative filtering for recommender systems and non-negative matrix factorization for topic modeling, involve approximating a matrix by a product of low-rank factors. The rank of the approximation determines the complexity and interpretability of the learned factors.
In general, the rank of a matrix provides information about the linear independence of its rows or columns, which is crucial for addressing issues like multicollinearity, dimensionality reduction, and model complexity in machine learning applications.
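To make a few of these points concrete, here is a minimal NumPy sketch. It is my own toy illustration, not material from the course: the matrix values are arbitrary, chosen only so that the third column is twice the first.

```python
import numpy as np

# Made-up design matrix: the third column is deliberately 2x the first,
# so the columns are linearly dependent (multicollinearity).
X = np.array([[1.0, 2.0, 2.0],
              [2.0, 0.0, 4.0],
              [3.0, 1.0, 6.0],
              [4.0, 5.0, 8.0]])

print(np.linalg.matrix_rank(X))      # 2, not 3: X is rank-deficient

# The rank equals the number of non-zero singular values (up to tolerance).
s = np.linalg.svd(X, compute_uv=False)
print(np.sum(s > 1e-10))             # 2

# The Moore-Penrose pseudoinverse exists even for this rank-deficient X;
# np.linalg.pinv computes it from the SVD and yields the minimum-norm
# least-squares solution.
y = np.array([1.0, 2.0, 3.0, 4.0])
theta = np.linalg.pinv(X) @ y
print(theta)

# Best rank-1 approximation of X by truncating the SVD (Eckart-Young).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.linalg.matrix_rank(X1))     # 1
```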
I think the general idea is that the course video lectures need correction, not that you offer a shallow thank-you in response to my reply. I hope I get to genuinely thank you for something at some point…
None taken. I also apologize if I didn’t express myself appropriately. My point was that a course as foundational as this one should be iteratively corrected based on feedback. A student with a shaky foundation has very little chance of success with a subject as complex as Machine Learning. Thank you.
@Hartej_Dhiman so what other materials/resources do you recommend for ML?
I am beginner here.
Also, while googling the “Normal Equation” described by Andrew Ng, I found this article, which contradicts his views on gradient descent for linear regression.
As a beginner, I need alternative, more up-to-date resources to follow, if you have any. Thanks @Hartej_Dhiman
The “pivot” explanation applies to both cases. In the first case, all the pivots happen to lie on the diagonal.
This next point may be a bit less trivial, and it helps if the lectures have already covered “row operations” or “row swapping”. If we swap some rows so that the 2nd row becomes the 3rd and the 3rd becomes the 4th, then the “diagonal” explanation becomes valid too. (Note that we can always scale the leading values to 1s.)
Row swapping gives you an equivalent set of equations, so it does not change the problem being solved. However, the matrix no longer looks like an “échelon” (the French word for a rung of a ladder). A quick check of this is sketched below.
It seems that “pivot” is the more general term, but “diagonal” is understandable.
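For instance, here is a tiny SymPy check with a matrix of my own (not the one from the lecture image): a row swap keeps the rank intact even though the result is no longer ladder-shaped.

```python
import sympy as sp

# A row echelon form with pivots 1, 4, 6 on the diagonal.
A = sp.Matrix([[1, 2, 3],
               [0, 4, 5],
               [0, 0, 6]])

B = A.copy()
B.row_swap(1, 2)           # swap the 2nd and 3rd rows (an elementary row op)
print(B)                   # no longer echelon-shaped...
print(A.rank(), B.rank())  # ...but the rank is still 3 in both cases
```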
Cheers,
Raymond
PS: I am just trying to bring up something that may be useful for making sense of these materials.
Yep, you caught my mistake (which was basically a typo: RREF vs. REF). In the second image I pasted, the matrices are indeed in row echelon form and not RREF, but my point about the two conflicting methods of deducing rank is still valid. It is interesting that you focused on my mistake instead of Serrano’s… but that is a pattern here.
There is no mistake in these lectures. All matrices in row-echelon form have pivots. In the case of reduced row-echelon form, the pivots just happen to be 1’s. In both cases, you can determine the rank by the number of pivots.
Think of it this way. If all fish have gills, then goldfish have gills, but not all fish are goldfish.
Likewise, all matrices in row-echelon form have pivots. Matrices in reduced row-echelon form also have pivots, but not all matrices in row-echelon form are in reduced row-echelon form. Pivots, and thus rank, are a general feature of matrices in row-echelon form, not just the special case of those in reduced row-echelon form.
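A quick SymPy check makes this concrete (the matrix is my own, not one from the lectures): the same matrix in row echelon form and in reduced row echelon form has the same pivot columns, and in either form the rank equals the number of pivots.

```python
import sympy as sp

A = sp.Matrix([[3, 6, 9],
               [2, 6, 11],
               [1, 4, 9]])

ref = A.echelon_form()       # row echelon form: pivots need not be 1s
rref, pivot_cols = A.rref()  # reduced row echelon form: pivots are 1s
print(ref)
print(rref)
print(pivot_cols)            # the same pivot columns in both forms
print(A.rank())              # 3 = the number of pivots
```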
Insisting that Mr. Serrano has made a mistake may be preventing you from understanding the concept.
Sean, it appears to me you are oblivious to my original question. The original question (scroll all the way to the top of this thread) was about how Serrano, in a Week 2 video lecture titled “Row Echelon Form”, mentions that the rank of a matrix is the number of 1s in the DIAGONAL of the matrix in its row echelon form. Then, again in Week 2, in a subsequent video titled “Row Echelon Form in General”, Serrano mentions (this time, correctly) that for a matrix in its REF, the rank is the number of pivots.
If you start counting the number of 1s on the diagonal of a REF matrix, you may or may not get the correct rank. This is easy to see if you divide the first row of the matrix below (already in REF) by 3 and count the 1s along the diagonal: that count is 1, but that is not the rank of this matrix, is it? The correct rank is the number of pivots, which is 3.
In short, Serrano states that rank is the count of 1s along the diagonal of a matrix in its REF, and then also states that rank is the count of pivots of a matrix in its REF. That is where the mistake in the videos lies.
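Since the matrix from the lecture image is not reproduced in this thread, here is a stand-in REF matrix with the same kind of structure, just to make the counterexample concrete:

```python
import sympy as sp

# Hypothetical stand-in for the lecture's matrix: already in REF,
# with pivots 3, 2, 4.
A = sp.Matrix([[3, 6, 9],
               [0, 2, 5],
               [0, 0, 4]])

B = A.copy()
B[0, :] = B[0, :] / 3  # divide the first row by 3; B is still in REF
# The diagonal of B is (1, 2, 4): exactly one 1, yet the rank is unchanged.
print(B.rank())        # 3 = number of pivots, not number of diagonal 1s
```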
If you still cannot see why I said the lectures provide conflicting information, I think I am just going to have to walk away from this, knowing that DL Community Support either does not want to acknowledge mistakes in lectures or wants to force members into thinking they are wrong.