Week 2, Assignment 2 | Why was the LSTM model giving much lower test accuracy than the plain word averaging model?

I thought the LSTM model, being more sophisticated, would achieve better test accuracy than the plain word-vector averaging model that was trained first. Can someone shed some light on this?

The two models are fundamentally different in their assumptions, parameters, hyperparameters, etc., so their bias-variance characteristics differ. Without specifying the experimental setup, it's impossible to say which model will perform better.

That said, we can discuss the expected behavior. Even if both models converged to their respective optima, their performance would depend, at a minimum, on training set size and model complexity. A more complex model (here, the LSTM) has far more parameters to fit, so it needs more training data, with a lot of variety, to generalize well to unseen examples. With a small training set it can overfit, while the simpler averaging model may generalize better.
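To make the complexity gap concrete, here is a rough parameter count for the two models. The sizes are assumptions for illustration, not taken from the assignment text: 50-dimensional pretrained word vectors, 5 output classes, an averaging model that is a single softmax layer on the averaged vector, and an LSTM model with two stacked LSTM layers of 128 units followed by a softmax layer. Frozen embedding weights are not counted in either model.

```python
def softmax_params(input_dim: int, num_classes: int) -> int:
    # Weight matrix (num_classes x input_dim) plus one bias per class.
    return num_classes * input_dim + num_classes


def lstm_params(input_dim: int, units: int) -> int:
    # Four gates (input, forget, cell candidate, output), each with an
    # input kernel (input_dim x units), a recurrent kernel (units x units)
    # and a bias vector of length `units`.
    return 4 * (units * (input_dim + units) + units)


# Assumed sizes, chosen only for illustration.
EMBED_DIM = 50
NUM_CLASSES = 5
UNITS = 128

# Averaging model: average the word vectors, then one softmax layer.
averaging = softmax_params(EMBED_DIM, NUM_CLASSES)

# LSTM model: two stacked LSTM layers, then one softmax layer.
lstm_model = (
    lstm_params(EMBED_DIM, UNITS)
    + lstm_params(UNITS, UNITS)
    + softmax_params(UNITS, NUM_CLASSES)
)

print(averaging)   # 255
print(lstm_model)  # 223877
```

Under these assumptions the LSTM has roughly 900 times as many trainable parameters. If the training set is only a few hundred sentences or fewer, that capacity is far more than the data can pin down, which is one plausible reason the LSTM can score lower on the test set.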
