At the end of the “dataset for RL training” video, it’s stated that the dataset needs to come from the same distribution.
My question is: why? If it’s a summarization task, can’t we draw the data from different sources with different distributions or topics? Or did I misunderstand the statement?