Why is "m" the variable for the size of the training set?

Hello @scoofy, this is my guess.

When I learned about matrices in high school, we always said the size of a matrix is m x n, meaning m rows and n columns.

In ML, we usually represent a tabular dataset as a matrix, with one row per data sample and one column per feature.

So if there are m rows, there are m samples, and when there are n columns, there are n features.
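
To make that concrete, here is a small sketch in plain Python. The numbers are made up just for illustration, and the variable names `X`, `m`, and `n` are my own choice, not something fixed by any library.

```python
# A tiny made-up dataset: each inner list is one sample (a row),
# and each position inside it is one feature (a column).
X = [
    [2104, 3, 45],  # sample 1 (hypothetical values)
    [1416, 2, 40],  # sample 2
    [1534, 3, 30],  # sample 3
    [852, 2, 36],   # sample 4
]

m = len(X)     # number of rows = number of training samples
n = len(X[0])  # number of columns = number of features

print(f"m = {m} samples, n = {n} features")
```

With 4 rows of 3 values each, this is a 4 x 3 matrix, so m is 4 and n is 3.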

You can also see m and n used on Wikipedia to describe the size of a matrix!

Hope this helps!