In your code for preparing a dataset for fine-tuning a language model with PPO (Proximal Policy Optimization) for Reinforcement Learning from Human Feedback (RLHF), you tokenize the prompt and then immediately decode it, which looks redundant. Why not just write `sample["query"] = prompt`?
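For reference, here is a minimal sketch of the pattern being asked about, assuming a TRL-style dataset-preparation step with a Hugging Face tokenizer (the model name, field names, and token budget are illustrative, not from the original code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice

def prepare_sample(sample, max_tokens=16):
    # Tokenize the raw prompt, truncating it to a fixed token budget.
    sample["input_ids"] = tokenizer.encode(
        sample["prompt"], truncation=True, max_length=max_tokens
    )
    # Decode the (possibly truncated) ids back to text and store as "query".
    # Note: if truncation kicked in, this string is shorter than the prompt.
    sample["query"] = tokenizer.decode(sample["input_ids"])
    return sample
```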
Maybe it's redundant. Try it with just the input text and see whether the PPO trainer accepts it that way, and also check whether the decoded output is the same as the original text!
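One quick way to run that round-trip check, again assuming the same GPT-2 tokenizer as above (the sample prompt is made up for illustration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # same illustrative tokenizer

prompt = "The quick brown fox jumps over the lazy dog."  # made-up sample text

# Round-trip: encode the text to token ids, then decode back to a string.
ids = tokenizer.encode(prompt)
round_trip = tokenizer.decode(ids)
print(round_trip == prompt)  # True for this prompt with GPT-2's BPE tokenizer

# With truncation, the round-trip visibly diverges from the original text.
short_ids = tokenizer.encode(prompt, truncation=True, max_length=5)
print(tokenizer.decode(short_ids))  # only the first few words survive
```

Even without truncation, the round-trip is not guaranteed to be lossless for every tokenizer: some normalize whitespace or unicode, and some add special tokens unless you pass `skip_special_tokens=True` to `decode`. So equality holding for one prompt does not prove it holds across the whole dataset.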