Week 1 programming exercise question: how can we know if the RNN/LSTM used for sequence generation is overfitting the training data?

Hi, I have just finished the week 1 programming exercises, the dinosaur name and jazz solo generation using an RNN/LSTM. It was really fun, actually.

But I am just wondering: can this kind of training procedure lead to overfitting the training data?

I think the model could overfit the data, and if that's the case, it should generate names that are almost identical to the dinosaur names in the corpus, right?

But how can we spot that before the model reaches that state? The training method shown in the notebook is, if I am not mistaken, quite similar to that of an autoregressive model, and I just cannot imagine how to detect overfitting.

thanks

Your observation that these assignments train the model in an autoregressive manner is correct. During inference, when we use np.random.choice to pick the output at a timestep, the result need not correspond to np.argmax. As a result, there's some level of randomness (although small) that can make the model a bit creative.
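For illustration (the numbers below are made up, not taken from the assignment), here's a minimal sketch of how sampling with np.random.choice at a single timestep can pick a different character than the greedy np.argmax choice:

```python
import numpy as np

# Hypothetical softmax output over a 27-character vocabulary at one timestep
# (values are random, purely for illustration).
np.random.seed(0)
logits = np.random.randn(27)
probs = np.exp(logits) / np.sum(np.exp(logits))

greedy_idx = np.argmax(probs)                        # always the single most likely character
sampled_idx = np.random.choice(len(probs), p=probs)  # drawn in proportion to the probabilities

# The two indices can differ, which is where the "creativity" comes from.
print(greedy_idx, sampled_idx)
```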

Courses 2 and 3 do cover model analysis in detail. Could you please explain how that content doesn't cover your question?

Yeah! Thanks for replying, balaji.

Well, I do remember the use of np.random.choice, but I don't think there is any explanation regarding overfitting in any of the week 1 material, which is why I created this question post.

So the model could overfit the training data, and maybe that is what we want? We just make the generated text more creative using the sampling method, so it doesn't matter. Do I understand this correctly?

There are 2 points to note:

  1. If we were to randomly select characters without training, there would be no alignment with the patterns in the data. This is why we use np.random.choice after training.
  2. Model shouldn’t overfit the training set since we want to generalization capability towards unseen inputs in the training dataset to exist. Once you complete this assignment, please divide the dataset into train / dev / test splits and perform analysis offline.

Yeah, thanks, that clears up a lot.

One last question. Besides the offline analysis that you just mentioned, is there any more systematic procedure or best practice for evaluating generalization ability, like when we use FID for GANs and other image models?

You’re welcome.

What’s FID?

Fréchet inception distance

Sorry but I’m unaware of a metric for text / audio that’ll determine how creative a model output is.

Yeah, that’s fine. Thanks for your support