Hi, I've just finished the Week 1 programming exercises, the dinosaur name and jazz solo generation using an RNN/LSTM. It was really fun, actually.
But I was wondering: can this kind of training procedure lead to overfitting the training data?
I think the model could overfit the data, and if that's the case, it should predict something like the exact dinosaur names from the corpus, right?
But how can we spot that before the model reaches that state? Since the training method shown in the notebook is, if I am not mistaken, quite similar to that of an autoregressive model, I just can't imagine how to detect overfitting.
Thanks!
Your observation that the assignments train the model in a similar (autoregressive) manner is correct. During inference, when we use np.random.choice to pick the output at a timestep, the chosen character need not correspond to np.argmax. As a result, there's some level of randomness (although small) that makes the model a bit creative.
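To make that concrete, here's a minimal sketch of the two selection strategies at a single timestep (the probability vector below is made up for illustration, not taken from the notebook):

```python
import numpy as np

# Hypothetical softmax output over a 4-character vocabulary at one timestep
probs = np.array([0.05, 0.60, 0.25, 0.10])

# Greedy decoding: always returns index 1, so repeated runs produce identical names
greedy_idx = np.argmax(probs)

# Sampling: usually returns index 1, but sometimes 2 or 3, which adds variety
sampled_idx = np.random.choice(len(probs), p=probs)

print(greedy_idx, sampled_idx)
```

So even a model that has largely memorized the training names can still produce novel ones, because sampling occasionally picks a lower-probability character.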
Courses 2 and 3 do cover model analysis in detail. Could you please explain how that content doesn’t cover the question below?
Yeah! Thanks for replying, balaji.
Well, I do remember the use of np.random.choice, but I don't think there is any explanation of overfitting in any of the Week 1 material, and that's why I created this post.
So the model could overfit the training data, and maybe that's even what we want? We just make the generated text more creative using the sampling method, so the overfitting doesn't matter. Do I understand this correctly?
There are 2 points to note:
- If we were to randomly select characters without training, there would be no alignment with the patterns in the data. This is why we use np.random.choice on the trained model's output distribution.
- The model shouldn't overfit the training set, since we want it to generalize to inputs it hasn't seen during training. Once you complete this assignment, please divide the dataset into train / dev / test splits and perform the analysis offline (see the sketch below).
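In case it helps, here's a minimal sketch of that offline split (the file name dinos.txt and the 80/10/10 ratios are assumptions, not requirements):

```python
import numpy as np

# Assumed corpus file: one dinosaur name per line
with open("dinos.txt") as f:
    names = [line.strip().lower() for line in f if line.strip()]

# Shuffle, then split 80/10/10 into train / dev / test
rng = np.random.default_rng(0)
rng.shuffle(names)
n = len(names)
train = names[: int(0.8 * n)]
dev = names[int(0.8 * n) : int(0.9 * n)]
test = names[int(0.9 * n) :]

# To spot overfitting: after each training pass, compute the average per-character
# cross-entropy (or perplexity) on `dev`. If training loss keeps falling while dev
# loss rises, the model is starting to memorize the training names.
```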
Yeah, thanks, that clears up a lot.
One last question. Besides the offline analysis you just mentioned, is there any more systematic procedure or best practice to evaluate generalization ability, like how we use FID (Fréchet inception distance) for GANs and other image models?
Sorry but I’m unaware of a metric for text / audio that’ll determine how creative a model output is.
Yeah, that's fine. Thanks for your support!