Say I have a translation model which takes in an input i and outputs o, the translated sentence. The user can then respond to the translated sentence with a good/bad rating pertaining to the quality of the translation. I can see how processing the feedback would work with a good label, by adding (i, o) as a training example. But if the user answers bad, is it still possible for the model to learn from this? Or would this kind of feedback normally also require the user to input the correct sentence?
tl;dr: can a model learn from an input paired with an output that is labelled as incorrect?
Dear Andrew,
Thank you very much for your question. A very good example of what you are looking for is a reinforcement learning (RL) model. As you may know, in RL the agent (in our case, the translator) interacts with the environment (in our case, the user) and receives a reward or punishment (here, the good/bad labels). Using the feedback gathered from the user, the RL agent tries to learn a policy that maximises its reward. So in short: yes, it is possible to learn from data points labelled as incorrect; a bad label still carries signal, because it tells the model which outputs to make less likely, as in the sketch below. You can use this link to read more about RL.
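To make this concrete, here is a minimal REINFORCE-style sketch, under the assumption that the "translator" is a toy policy choosing among a few hypothetical candidate outputs and the user is simulated as answering good (+1) or bad (-1). It is only an illustration of the idea, not the implementation your course or library uses: a bad label pushes the probability of the sampled output down, so the model still learns without ever seeing the correct sentence.

```python
import torch

# Hypothetical candidate translations; index 0 is the one the simulated user likes.
candidates = ["der Hund", "die Hund", "das Hund"]
correct = 0

# The "policy" over candidate outputs: a single learnable logit vector.
logits = torch.zeros(len(candidates), requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                               # model proposes a translation
    reward = 1.0 if action.item() == correct else -1.0   # user's good/bad feedback

    # REINFORCE update: a positive reward raises the log-probability of the
    # sampled output, a negative reward lowers it.
    loss = -reward * dist.log_prob(action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Probability mass should now concentrate on the output the user rated as good.
print(torch.softmax(logits, dim=0))
```

With a real translation model the policy would be the model's distribution over output sentences rather than a fixed candidate list, but the update rule is the same: the good/bad signal scales the gradient of the log-probability of the produced output.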
All the best,
Kiavash