C3_W4_Lab_2_TFX_Evaluator Is there any systematic way to decide if a model is blessed?

Per the notebook C3_W4_Lab_2_TFX_Evaluator, the eval config and resolver are set up so that any newly trained model with binary accuracy no worse than 0.5, and no more than 0.1 below the previous model's, is considered a blessed model. I just want to know how these metrics are chosen in a real industrial environment. Is there a systematic way to decide that a new model is better than another and should be chosen for deployment?
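For reference, the thresholds described above would look roughly like this in the Evaluator's eval config (a sketch from memory, not the exact lab code; the label key and slicing are assumptions):

```python
import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],  # label key is assumed
    slicing_specs=[tfma.SlicingSpec()],               # overall (no slices)
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='BinaryAccuracy',
                threshold=tfma.MetricThreshold(
                    # Absolute floor: candidate must reach at least 0.5.
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.5}),
                    # Relative gate: candidate may not be more than 0.1
                    # below the currently blessed (baseline) model.
                    change_threshold=tfma.GenericChangeThreshold(
                        direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                        absolute={'value': -0.1}),
                ),
            )
        ])
    ],
)
```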

Acceptance criteria for a model revision are part of the project definition. They come from the client, since the client is the one paying you to improve the model.

That can be the case for the value threshold, but how do you decide that model B is better than model A when both pass the value threshold? Is the evaluation alone enough to be confident that model B, with an accuracy of 0.85, is better than model A, with an accuracy of 0.837? Shouldn't there be a more rigorous statistical test before deployment, given the overhead incurred by replacing a model?

Performance on the test set is a necessary metric for judging model performance. Some clients might say that even a tiny boost in test-set performance is good enough to update their model. You should talk to the client about whether either of you needs more statistical tests.
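If you do want something more rigorous than comparing point estimates, one simple option is a paired bootstrap of the accuracy difference on the same test set. This is a minimal sketch, not part of the lab or TFMA's built-in blessing logic, and the array names are hypothetical:

```python
import numpy as np

def bootstrap_accuracy_diff(correct_a, correct_b, n_boot=10_000, seed=0):
    """Bootstrap a 95% interval for accuracy(B) - accuracy(A), given
    paired per-example 0/1 correctness flags on the same test set."""
    rng = np.random.default_rng(seed)
    correct_a = np.asarray(correct_a, dtype=float)
    correct_b = np.asarray(correct_b, dtype=float)
    n = len(correct_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample test examples
        diffs[i] = correct_b[idx].mean() - correct_a[idx].mean()
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi

# Hypothetical usage: if the interval excludes 0, a gap like
# 0.85 vs 0.837 is unlikely to be resampling noise alone.
```

Whether this kind of check is worth the extra effort is exactly the conversation to have with the client.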