Potential Review Needed for Notebook Lab 3

I think the Jupyter notebook used in Week 3 for Lab 3 needs some revisions. Here are the issues I found, grouped by the notebook's markdown sections:

Section 2.2

(1) In the markdown text, there is a passage that appears to be a leftover reviewer's comment. (By the way, the suggestion itself was incorporated into the text correctly, as can be seen by comparing the condensed text shown in the video with the text in the current notebook.) I think the review comment can be deleted.

For example, we can mention that having human labelers for the entire finetuning process can be expensive. A practical way to avoid that is to use a reward model.

use feedback generated by a model

(2) In the same section, there are also some cells that repeat code from the previous cell, such as:

print(sentiment_pipe(non_toxic_text, **reward_logits_kwargs))
print(sentiment_pipe(non_toxic_text, **reward_probabilities_kwargs))

(3) The ‘toxic_text’ and ‘non_toxic_text’ used as examples are different from those in the video. (This is related to the next item.)

(4) While the previous issues are relatively minor, I also ran into a significant inconsistency when executing the notebook. I tried the exact ‘toxic_text’ from the video (starting with “You are…”) and got the following output, which indicates that the text is NOT toxic:

logits [not hate, hate]: [4.697163105010986, -4.222471237182617]
probabilities [not hate, hate]: [0.999866247177124, 0.00013371932436712086]
reward (low): [4.697163105010986]
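
A quick sanity check on these numbers, assuming the probabilities are simply a softmax over the two logits and the reward is the "not hate" logit (which is what the output above suggests, since the reward equals the first logit). The `softmax` helper and variable names here are my own, not taken from the notebook:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output above, ordered [not hate, hate].
logits = [4.697163105010986, -4.222471237182617]
probabilities = softmax(logits)

# Assumption: the reward is the logit at the "not hate" index (0).
not_hate_index = 0
reward = logits[not_hate_index]

print("probabilities [not hate, hate]:", probabilities)
print("reward:", reward)
```

The recomputed probabilities agree with the ones printed by the notebook, so the pipeline output is internally consistent. The surprising part is only that this text gets a large positive "not hate" logit, even though the cell labels it "reward (low)".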

Section 2.3

(5) At the end of Section 2.3, the toxicity results differ from those shown in the video, presumably for the same model:

toxicity [mean, std] before detox: [0.046981223255649886, 0.05390841506846864]

(6) The training results also show discrepancies: more than half of the results in the last cell show an increase in the toxicity measure of the trained model's output. While I acknowledge that random number generation may influence the training process, I recommend double-checking.
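
One way to make the check in (6) concrete is to count how many samples got a lower reward after training. This is only a sketch: `summarize_detox` is a hypothetical helper, the numbers are placeholders rather than values from the notebook, and it assumes the reward is the "not hate" logit, so a negative difference means the output became more toxic.

```python
from statistics import mean

def summarize_detox(reward_before, reward_after):
    """Compare per-sample rewards before and after detoxification."""
    diffs = [after - before for before, after in zip(reward_before, reward_after)]
    # A negative difference means the reward dropped (more toxic output).
    worse = sum(1 for d in diffs if d < 0)
    return {"mean_diff": mean(diffs), "fraction_worse": worse / len(diffs)}

# Placeholder values for illustration only; NOT taken from the notebook.
reward_before = [1.0, 2.0, 0.5, 3.0]
reward_after = [0.5, 2.5, 0.0, 2.0]

summary = summarize_detox(reward_before, reward_after)
print(summary)
```

Running this over several seeds would show whether a "fraction_worse" above one half is an occasional artifact of randomness or a consistent result.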

Any updates or actions taken on this matter?

Thank you!

@chris.favila, are you the tech lead for this course?

Hi Pablo, and thank you for the feedback! We’ll review the notebook and make changes as necessary. At first glance, I agree that the randomness might play a significant factor in the results. But we’ll take a closer look and see what other things might be causing the discrepancies. Thanks again!
