@rmwkwok -
This is a great response, I hope others stumble across this answer like I did.
I was going to write a separate question, but I will try to fit it in here. I am hoping for some clarification regarding the K-fold cross-validation process and what happens afterwards. Basically, how do we proceed after step 4? Below is the setup/background so you can see exactly what I mean, followed by the actual questions.
Question setup:
Background of the hypothetical problem: assume we have a NN model and we want to tune only lambda. We choose 3 values to investigate and run a 5-fold CV as you explain in your steps 1 through 4, so we end up with 15 models/weights. Let’s also assume that for each split we normalize the data (as you have verified for me in other questions and in the course): we compute the mean and standard deviation (std) from the training folds, use them to normalize the training data, and then use the same mean and std to normalize the CV fold. After all of this, each of the three lambdas has 5 error scores, one per fold, which we can average per lambda; this is your step 4, and it all makes sense.
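To make sure I have the setup right, here is a rough sketch of how I picture your steps 1 through 4; the `NNModel` class, the candidate lambda values, and the arrays `X`/`y` (all the non-test data) are placeholders I made up:

```python
import numpy as np
from sklearn.model_selection import KFold

lambdas = [0.002, 0.02, 0.2]                 # the 3 candidate values (made up)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

avg_cv_error = {}
for lam in lambdas:
    fold_errors = []
    for train_idx, cv_idx in kf.split(X):    # 5 folds of the non-test data X, y
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_cv, y_cv = X[cv_idx], y[cv_idx]
        # normalize with statistics from the training folds only...
        mean, std = X_tr.mean(axis=0), X_tr.std(axis=0)
        X_tr = (X_tr - mean) / std
        X_cv = (X_cv - mean) / std           # ...and reuse the same mean/std on the CV fold
        model = NNModel(lam)                 # hypothetical sklearn-style NN wrapper
        model.fit(X_tr, y_tr)                # 3 lambdas x 5 folds = 15 fits in total
        fold_errors.append(np.mean((model.predict(X_cv) - y_cv) ** 2))
    avg_cv_error[lam] = np.mean(fold_errors) # your step 4: one average score per lambda

best_lambda = min(avg_cv_error, key=avg_cv_error.get)
```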
Let’s say a lambda of 0.02 was chosen as the lambda with the lowest average CV error. Now assume we have a separate set of data that is the true test data and was never seen during training or during the CV hyperparameter tuning. The general idea is that we use the lambda and model that gave us the lowest CV error, which sounds great and easy, but I’m having trouble with the details.
Question 1: Which data set do we use to normalize the test data?
For each of the 15 models we ran, the training/CV data was normalized with a different mean and standard deviation (since each fold has a different training set). So, for the separate test data, which mean and standard deviation do we use to normalize it? I’ve read and confirmed that we are supposed to use the training mean and standard deviation, but there are 5 of them for the final chosen lambda of 0.02. Should I average the means and standard deviations of the 5 training sets, or do something else?
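To make the question concrete, here are the two candidates I can picture; `fold_means`/`fold_stds` would be the 5 per-fold training statistics saved during the CV loop above, `X` is the full training data, `X_test` is the unseen test data, and Option B is just one “something else” I can imagine, not something I know to be right:

```python
import numpy as np

# Option A: average the 5 training means and stds from the folds for lambda = 0.02
mean_a = np.mean(fold_means, axis=0)
std_a = np.mean(fold_stds, axis=0)

# Option B: ignore the per-fold statistics and recompute them from the
# full training set X (all 5 folds combined)
mean_b, std_b = X.mean(axis=0), X.std(axis=0)

X_test_norm = (X_test - mean_b) / std_b   # or (X_test - mean_a) / std_a -- which one?
```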
Question 2: Which “model” (learned weights) do we use on the test data?
After we normalize the test set (somehow; see Q1), which model do we use on it? I know we will use a model with the chosen lambda value (0.02), but there are 5 such models from the train/CV tuning process. By “model” I mean a model whose weights were learned on one of the 5 training sets with a lambda of 0.02. A guess: maybe we keep the lambda (plus the other hyperparameters), learn new weights on the test data itself, and then use those weights in .predict(test_data_here)? Or is there some other way? Also, what happens if the separate test set is just one row, i.e., one case with the appropriate inputs? Using that to learn weights wouldn’t make any sense.
Another way to ask this: can we reuse a previously found model (from the train/CV data) and its learned weights (and just skip to the .predict part), or do we need to keep only the hyperparameters (lambda etc.) and find new weights on the test data (hoping that our test data is more than one data point)?
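To make the two alternatives concrete, here is how I picture them; `fold_models` is a hypothetical list of the 5 models already fitted with lambda = 0.02 during CV, and `NNModel` is the same made-up wrapper as above:

```python
chosen_lambda = 0.02

# Alternative 1: reuse a previously fitted model and skip straight to .predict
# -- but which of the 5 do we pick?
y_pred = fold_models[0].predict(X_test_norm)

# Alternative 2: keep only the hyperparameters and learn fresh weights on the
# test data itself (my guess) -- which can't work if the test set is one row
final_model = NNModel(chosen_lambda)
final_model.fit(X_test_norm, y_test)   # ??
y_pred = final_model.predict(X_test_norm)
```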
Question 3: What does the code look like, and is there anything beyond using .predict and finding the error on the test data?
Let’s say we have chosen a way to normalize the test data and a way to pick which learned weights/model to use on it. How do we actually proceed from there? I’m assuming it looks something like this pseudo-code:
    # Load the separate/unseen test data (hypothetical loader)
    X_test, y_test = load_test_data()
    # Normalize the data somehow (Question 1)
    X_test_norm = (X_test - train_mean) / train_std
    # model.fit(X_test_norm, y_test)?  # may not be necessary if we reuse a model from the train/CV step (Q2 related)
    y_pred = models[model_number_here].predict(X_test_norm)  # (Q2 related)
    # Calculate the error on the test data
    test_error = np.mean((y_pred - y_test) ** 2)
Hopefully these questions make sense. I can move this to a separate topic too if needed. Thanks!