Say I have a translation model which takes in an input i and outputs o, the translated sentence. The user can then respond to the translated sentence with a good/bad rating pertaining to the quality of the translation. I can see how processing the feedback would work with a good label, by adding (i, o) as a training example. But if the user answers bad, is it still possible for the model to learn from this? Or would this kind of feedback normally also require the user to input the correct sentence?
tl;dr: can a model learn from an input paired with an output that is labelled as incorrect?
Dear Andrew,
Thank you very much for your question. A very good example of what you are looking for is a reinforcement learning (RL) model. As you may know, in RL the agent (in our case, the translator) interacts with the environment (in our case, the user) and receives a reward or punishment (here, the good/bad labels). Using the feedback gathered from the user, the RL agent tries to learn a policy that maximises its reward. So in short: yes, it is possible to learn from data points labelled as incorrect; a bad label still carries signal, because it tells the model which outputs to make less likely, as in the sketch below. You can use this link to read more about RL.
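To make this concrete, here is a minimal REINFORCE-style sketch, under the assumption that the "translator" is a toy policy choosing among a few hypothetical candidate outputs and the user is simulated as answering good (+1) or bad (-1). It is only an illustration of the idea, not the implementation your course or library uses: a bad label pushes the probability of the sampled output down, so the model still learns without ever seeing the correct sentence.

```python
import torch

# Hypothetical candidate translations; index 0 is the one the simulated user likes.
candidates = ["der Hund", "die Hund", "das Hund"]
correct = 0

# The "policy" over candidate outputs: a single learnable logit vector.
logits = torch.zeros(len(candidates), requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                               # model proposes a translation
    reward = 1.0 if action.item() == correct else -1.0   # user's good/bad feedback

    # REINFORCE update: a positive reward raises the log-probability of the
    # sampled output, a negative reward lowers it.
    loss = -reward * dist.log_prob(action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Probability mass should now concentrate on the output the user rated as good.
print(torch.softmax(logits, dim=0))
```

With a real translation model the policy would be the model's distribution over output sentences rather than a fixed candidate list, but the update rule is the same: the good/bad signal scales the gradient of the log-probability of the produced output.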
All the best,
Kiavash