K-fold cross validation

I use k-fold cross validation on my data. My problem is that when I use it with one model, I get a bad result for every fold. But when I instead just give the training set, for example, the first 80% of the data, I get better accuracy. Is this a normal problem, or have I implemented k-fold wrongly?

K-fold cross validation is not a topic that Prof Ng covers in this specialization (at least not that I can remember). I googled it and found this explanation on Jason Brownlee’s website.

Having taken a quick look at Jason’s explanation, I think I have the basic idea of what k-fold cross validation means. But even with that (perhaps sketchy) understanding, I am having trouble making sense of your question. This is purely a guess on my part, but are you sure you understood the point that you start the training completely from scratch on each “fold”?
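In case a concrete picture helps, here is a minimal sketch of what “from scratch on each fold” means, using scikit-learn’s KFold and an MLPClassifier as a stand-in for whatever model you are actually training (the toy data and layer sizes here are just placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# Toy data just for illustration; substitute your own X and y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Build a brand-new model INSIDE the loop, so every fold starts from
    # freshly initialized weights -- nothing is carried over between folds.
    model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])      # train on the k-1 training folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))  # score on the held-out fold

print("per-fold accuracy:", fold_scores)
print("mean accuracy:", np.mean(fold_scores))
```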

But I think the higher level point is that you need to give us more to go on here: please give a more detailed explanation of what you tried and what results you saw. Please include information about the nature of your dataset. Is it one of the ones from the assignments here? Or from somewhere else? How many total samples does it contain?

As the name “k-fold” suggests, each split of the data is one fold. I implemented one model; it has 3 layers. The first and second layers use ReLU and the last one is softmax. When I split my data with train_test_split from the sklearn library with a test size of 0.2, I get 99% train accuracy and 90% test accuracy. But when I use k-fold on the same data, each of the folds gives about 75% train accuracy and 65% test accuracy.
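Roughly, the non-k-fold version of my setup looks like this (a simplified sketch; the layer widths, class count, and toy data stand in for my real values):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Placeholder data standing in for the real dataset (the class count is made up).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15,
                           n_classes=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3-layer network: ReLU, ReLU, softmax (layer widths are placeholders)
model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, verbose=0)

print("train accuracy:", model.evaluate(X_train, y_train, verbose=0)[1])
print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
```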


That doesn’t sound like what I would expect. But what value of k did you use? That kind of matters, right?

To get the equivalent of 0.2 for train/test split, you would use k = 5, right?
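To make that concrete, here is a tiny sketch (with made-up data sizes) showing that test_size=0.2 and KFold with n_splits=5 both hold out 20% of the samples; k-fold just does it five times, once per fold:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Single 80/20 split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_tr), len(X_te))                  # 800 200

# Five different 80/20 splits, one per fold
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print(len(train_idx), len(test_idx))     # 800 200 for every fold
```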


I used k = 5. Both are 20/80 splits, but the results are different.

Ok, is anything else different about the various hyperparameters? E.g. number of iterations, learning rate, network architecture?

Are you using some “black box” scikit routine to do the “k-fold” logic? Or the non-k-fold?

The point is that the fundamental training operation is the same, right? It should only be a question of how you actually perform the data split.

But if you are just doing all this stuff by calling some black box library routine, then you have no control over any of it.
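For comparison, here is roughly what the two routes look like; cross_val_score is the “black box”, while an explicit KFold loop gives you control over every step. This is just a sketch with a scikit-learn classifier standing in for the real network; note also that cross_val_score’s default CV for a classifier is stratified and unshuffled, which by itself can move the numbers:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# "Black box": scikit-learn handles splitting, training, and scoring internally.
scores = cross_val_score(MLPClassifier(max_iter=500, random_state=0), X, y, cv=5)
print("cross_val_score accuracies:", scores)

# Explicit loop: you control the split, the training call, and the metric.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    model = MLPClassifier(max_iter=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    print("manual fold accuracy:", model.score(X[test_idx], y[test_idx]))
```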

Yes, every part of the architecture is the same, but I don’t understand why this happens. My main problem is that the split is 20/80, and the last fold of the data is split in exactly the same way, yet it doesn’t give the same result or even a close one.

Sorry, what do you mean “the architecture is the same”? Did you actually write and use the same code to execute the training in both cases? You didn’t really answer my question about whether you are using some sklearn API (which is a “black box”) to execute any part of this.

If the behavior is different, then there has to be a way to explain why that happens.

When you use k-fold cross validation to tune the hyperparameters, you are training your network on subsets of your training data instead of all of your training data. Thus you end up with lower accuracy in training and testing. Using different values of k also leads to different results.
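A quick arithmetic sketch of that last point: the fraction of the data each fold trains on is (k - 1) / k, so changing k changes the amount of training data and, with it, the scores.

```python
# Training and validation fractions per fold for a few values of k.
for k in (3, 5, 10):
    print(f"k={k}: train on {(k - 1) / k:.0%} of the data, validate on {1 / k:.0%}")
```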
