C3 W2 Model Deployment Strategies. How the MAB works

Greetings! If you take a look at the video “Model Deployment Strategies”, you might notice that the multi-armed bandit (MAB) rewards the model that predicts more accurately. However, in order to do that, we need the ground truth along with the input. How can that be done?
I mean, if our model predicts one of the labels [-1, 0, 1] (based on review text), and we don’t see the star rating, how do we estimate the model’s performance?
In most supervised learning cases we can’t get the ground truth at prediction time (otherwise it wouldn’t make any sense to build a model for prediction :slight_smile: )
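
To make the question concrete, here is a minimal sketch of how I understand the bandit, assuming an epsilon-greedy strategy (the video may well use a different one). The class and method names are my own, hypothetical ones, not from the course or any library. The `reward` passed to `update` is exactly the part I don’t understand: the code simply assumes that some feedback signal becomes available after the prediction.

```python
import random


class EpsilonGreedyRouter:
    """Hypothetical epsilon-greedy bandit that routes requests between model variants."""

    def __init__(self, n_models, epsilon=0.1, seed=0):
        self.epsilon = epsilon            # probability of exploring a random model
        self.counts = [0] * n_models      # number of rewards observed per model
        self.values = [0.0] * n_models    # running mean reward per model
        self.rng = random.Random(seed)

    def select(self):
        """Pick the model that serves the next request."""
        if self.rng.random() < self.epsilon:
            # Explore: choose any model uniformly at random.
            return self.rng.randrange(len(self.counts))
        # Exploit: choose the model with the highest mean reward so far.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, model_idx, reward):
        """Fold an observed reward into the running mean of one model.

        Assumption: `reward` is 1.0 for a correct prediction and 0.0 otherwise,
        which requires the ground truth (or a proxy for it) to arrive later.
        """
        self.counts[model_idx] += 1
        n = self.counts[model_idx]
        self.values[model_idx] += (reward - self.values[model_idx]) / n


router = EpsilonGreedyRouter(n_models=2)
chosen = router.select()           # which model handles this request
router.update(chosen, reward=1.0)  # only possible once feedback is known
```

Without a value for `reward`, `update` can never be called, so the bandit has nothing to learn from.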

Hi @PDS_Mentors,

Could one of you answer this question?

Thanks,
Mubsi