Test Error Utility

Although I understand that the test error is used to estimate how the model would perform when deployed in production, I still do not have a clear idea of how this affects our modelling. That is, if a model finalized based on dev set performance does poorly on the test data, what actions do we take?

  1. Do we create a new dev and test set?
  2. Do we select another model showing a reasonable trade-off between dev and test error?

Option 2 seems the better choice.
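
To make the setup concrete, here is a minimal sketch of the workflow I am describing. Everything in it is an assumption made for illustration: the synthetic data, the k-nearest-neighbour regressor, the candidate values of k, the 60/20/20 split, and the helper names (`make_data`, `knn_predict`, `mse`, `select_and_evaluate`). It picks the model by dev error only and then computes the test error once.

```python
import math
import random


def make_data(n, seed):
    # Hypothetical synthetic data: y = sin(x) + Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    # Mean squared error of the k-NN regressor on `data`.
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def select_and_evaluate(seed=0, candidates=(1, 3, 5, 9, 15, 25)):
    data = make_data(300, seed)
    # Samples are drawn i.i.d., so a positional 60/20/20 split is already random.
    train, dev, test = data[:180], data[180:240], data[240:]

    # Model selection looks at the dev set only.
    dev_errors = {k: mse(train, dev, k) for k in candidates}
    best_k = min(dev_errors, key=dev_errors.get)

    # The test set is touched once, after the model has been finalized.
    test_error = mse(train, test, best_k)
    return best_k, dev_errors[best_k], test_error


if __name__ == "__main__":
    best_k, dev_error, test_error = select_and_evaluate()
    print(f"selected k={best_k}, dev MSE={dev_error:.3f}, test MSE={test_error:.3f}")
```

The case I am asking about is the one where the test error printed here turns out much larger than the dev error for the selected model.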