The intuition is clear: to diagnose high variance or bias, we train our model on a subset of the available examples and then use the remaining subset, which the model has never seen, to check whether it suffers from high variance or bias.
The training set is the subset of examples on which we train our model.
The dev set and test set are the subsets that our model never sees during training.
What's unclear is why we need more than one subset of unseen data to choose a model. Couldn't we just use a single subset of unseen data, since only one subset influences our decision of which model to pick?
In the lab, I switched the Dev Set and Test Set data, and after computing the MSEs I found that, with the sets switched, a degree-6 polynomial was estimated to be the better model, with an even lower generalization mean squared error. (The second image shows the output after the data sets were switched.)
This raises the question of how to decide which data set should be treated as the unseen data, and further, how many subsets of unseen data should be created. How do we know where the line falls between the training set, the dev set, and the test set?
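To make the three-way split concrete, here is a minimal sketch using made-up data and an assumed 60/20/20 ratio (the course lab may use different proportions): shuffle once, then carve the indices into train, dev, and test partitions so no example appears in two sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 hypothetical examples: a noisy linear relationship.
x = rng.uniform(-1, 1, 100)
y = 2 * x + rng.normal(0, 0.1, 100)

# Shuffle the indices once, then carve out a 60/20/20 split.
idx = rng.permutation(len(x))
train_idx, dev_idx, test_idx = idx[:60], idx[60:80], idx[80:]

x_train, y_train = x[train_idx], y[train_idx]
x_dev, y_dev = x[dev_idx], y[dev_idx]
x_test, y_test = x[test_idx], y[test_idx]

print(len(x_train), len(x_dev), len(x_test))  # 60 20 20
```

Which 20 examples land in the dev set versus the test set here is pure chance (the shuffle), which is exactly why swapping the two sets can flip which degree looks best.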
The performance of a model is only an estimate; you can only see how good or bad your model really is once it is in production. To ensure it works as it is supposed to, we need to replicate a real-world environment in which unseen data keeps arriving. Think of it like an exam: if your goal is a good grade and you create some demo tests to practice, those only give an estimate; your real grade is revealed only after you take the real exam. In this analogy, your lectures are the training data, some previous exams are the validation data that you revisit as you study, and you save a few tests for one day before the exam to check whether you actually learned the material. The production environment is the actual exam.
Exactly, the test set is just you setting aside some data to mimic (approximately) the production environment. It's a great question, and you should definitely spend time learning this, as it is one of the most crucial concepts in machine learning: how much you can trust your models' accuracy depends on it.
I also had this discussion when taking this course. Basically, the cross-validation data (not the test data) is used to indirectly influence the learning of the weights in the model.
Here is how I think of it (using the student-and-exam analogy):
Training Data: Examples from the book / question bank
Cross validation: The student checking their knowledge based on what they just learned from the examples. If they reuse the same examples, they might be biased by superficial factors (keywords in a question, the ordering of questions, etc.) rather than actually learning from them. These examples are just a personal check of whether they have gained enough understanding to solve the questions.
Testing Data: Mock test paper conducted by the coaching institutes
That’s a good analogy. And you’ve put it well by saying that it has an “indirect” influence on the learning of weights.
If you look at my question above: I switched the test set and dev set data and found that, after the switch, a different degree fit the model best. So the dev set certainly has an influence on the decision of choosing the right model.
The mock test can still be used as the validation. It is more about the sequence: LEARNING first, THEN demonstrating that you have properly learned. Like the student, the mock test gives insight into what they must change and work on (adjust your understanding of Newton's 2nd Law, adjust your approach to solving Taylor series, etc.) to get a better grade when the final test comes around. Those adjustments are essentially the optimization of your brain.
Production can be how you synthesize information to make decisions after getting your degree and working professionally, haha. The data is constantly changing, and you have to generalize based on the fundamental training and optimization you've accumulated over the years, i.e., years of experience in a particular field. In my case, aerospace: I have a bachelor's and a master's in mechanical and aerospace engineering, but when I am designing jet engines I combine that foundation with years of experience (years of brain optimization) to make decisions. And it is all at inference time. Notice how you adjust your approach as you get older in life: your experiences teach you new ways of problem solving, so you adjust your brain for next time.