I’ve successfully completed the ‘Improvise a Jazz Solo with an LSTM Network v4’ programming exercise. However, I’ve noticed a significant shift in the model training approach this week. Unlike previous lessons/courses, this exercise doesn’t incorporate any evaluation of test or development data loss during training, as demonstrated in the code below:
import matplotlib.pyplot as plt

# Only training data is passed; with no validation_data, Keras tracks only the training loss
history = model.fit([X, a0, c0], list(Y), epochs=100, verbose=0)
print(f"loss at epoch 1: {history.history['loss'][0]}")
print(f"loss at epoch 100: {history.history['loss'][99]}")
plt.plot(history.history['loss'])
The focus seems solely on the training loss, without any consideration for test or development loss. Is this approach specific to generative models, where perhaps the emphasis on development loss is less critical, or is it challenging to evaluate the outputs in a comparative manner?
How you evaluate your results from a trained model depends on what the goal of your system is. If you are implementing something like a classifier where there is a definite correct answer, then you do the full evaluation with train, dev and test data. But here in Course 5 we are starting to see applications which have more of a “generative” flavor to them, like the music app in assignment 3 and the dinosaur names in assignment 2. If you’re trying to generate a new sequence of musical notes, the goal is different than “correctness”. It’s something more like a combination of “interesting” (meaning not just a straight copy of the training data) and “pleasing”, which I think accounts for the different style of evaluation here.
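For comparison, in a classification-style setting you would normally hold out a dev set and pass it to model.fit so that Keras tracks both losses side by side. Here is a minimal sketch, assuming hypothetical dev-set arrays (X_dev, a0_dev, c0_dev, Y_dev) that are not part of the assignment:

import matplotlib.pyplot as plt

# Hypothetical dev split, just to illustrate the classifier-style workflow
history = model.fit(
    [X, a0, c0], list(Y),
    validation_data=([X_dev, a0_dev, c0_dev], list(Y_dev)),
    epochs=100, verbose=0,
)

# With validation_data supplied, Keras also records 'val_loss',
# so a growing gap between the train and dev curves (overfitting) becomes visible
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='dev loss')
plt.legend()

But for the music generator there is no single "correct" next note to score against, which is presumably why the notebook only plots the training loss.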
Mind you, I have not worked with any of the more sophisticated generative applications based on LLMs, so I’m not aware of how the training and evaluation work there.
I appreciate your explanation, but I’m still grappling with a concept that seems counterintuitive to me. Throughout my learning journey, the primary focus has been on avoiding overfitting: ensuring that the model doesn’t just memorize the training data so thoroughly that it fails to generalize to new, unseen data.
However, it appears that when it comes to training generative models, this principle doesn’t apply in the same way. It seems almost as if allowing the model to ‘memorize’ the training data is not just acceptable but preferable.
This shift in perspective is a bit challenging for me to grasp. I’ve always been taught to train models in a way that avoids overfitting, aiming for a model that generalizes well rather than one that narrows its learning to the training set alone. Is continuing training until the training loss approaches zero actually a viable strategy in this context? (I don’t think so, since we don’t want output that is too similar to the training data. So to what extent should we train? Should we stop after some number of epochs, manually review the output, and call it a day if the model seems to be working well, i.e., generating good text/music/etc.?)
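For instance, I imagine one could stop on a schedule and inspect samples rather than a metric, perhaps with a callback along these lines (generate_sample() is a hypothetical helper wrapping the inference model, not something from the assignment):

import tensorflow as tf

class SampleEvery(tf.keras.callbacks.Callback):
    """Generate and print a sample every `interval` epochs for manual review."""
    def __init__(self, interval=20):
        super().__init__()
        self.interval = interval

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.interval == 0:
            sample = generate_sample()  # hypothetical helper that runs the inference model
            print(f"epoch {epoch + 1}: loss={logs['loss']:.4f}, sample={sample}")

model.fit([X, a0, c0], list(Y), epochs=100, verbose=0,
          callbacks=[SampleEvery(interval=20)])

Then I would listen to (or read) the samples and stop once they seem plausible but are not copies of the training data. Is that roughly the right idea?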
This feels like a departure from conventional wisdom regarding overfitting, and I’m trying to understand if I need to adjust my understanding of these concepts when it comes to generative modeling.
I think the key point is what I alluded to in my previous response: how you evaluate your results depends on what the goal of your system is.
In other words, what your goals are and how you achieve them depends on the circumstances. I should make the important disclaimer that I do not have any direct experience with implementing LLMs or generative models other than the examples in DLS Course 5 and the GANs specialization from DLAI. But even within the topic of “generative models”, they come in lots of different flavors with different goals. If you are building an LLM and then using it to implement something like a web browser extension, then the goal is to accurately reproduce the results and to avoid “hallucinations” or confabulations. But in other cases if the generative goal is to produce an artistic image or a piece of music, then you specifically don’t want the model to exactly duplicate the training data. The whole point is that you want the output to be “creative”, meaning new and interesting. E.g. we saw an example in DLS C4 where the system is trained to apply the style of an art work to a photograph or other image.
So those two cases (accurate browser extensions or creative image generators) are completely different in terms of what you want. So you have to figure out what the training criteria need to be to achieve your goal. There will not be a universal correct answer, as with almost any topic in ML. It’s complicated, and it requires that you understand what your goals are and what you need to do in order to achieve them.