I think the Jupyter notebook used in Week 3 for Lab 3 needs some revisions. Here are the issues I found, referenced by the markdown sections of the notebook:
Section 2.2
(1) In the markdown text there is a passage that appears to be a leftover reviewer’s comment. (Incidentally, comparing the condensed text shown in the video with the text we see when opening the notebook suggests the comment was incorporated into the text correctly.) I think the review comment can be deleted. The relevant text is:
For example, we can mention that having human labelers for the entire finetuning process can be expensive. A practical way to avoid that is to use a reward model.
use feedback generated by a model
(2) In the same section, there are also cells that repeat code from the previous cell, for example:
print(sentiment_pipe(non_toxic_text, **reward_logits_kwargs))
print(sentiment_pipe(non_toxic_text, **reward_probabilities_kwargs))
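One copy of each call would be enough. As I understand the lab, the two configurations differ only in whether the pipeline returns raw logits or softmax probabilities. Here is a minimal sketch of what I assume the intended single cell looks like; the checkpoint name and the exact kwargs are my reading of the notebook, not a verbatim copy:

```python
import torch
from transformers import pipeline

# Assumption: the lab's reward model is the RoBERTa hate-speech classifier used for toxicity scoring.
toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"
device = 0 if torch.cuda.is_available() else -1
sentiment_pipe = pipeline("sentiment-analysis", model=toxicity_model_name, device=device)

# Return all label scores; "none" keeps raw logits, "softmax" converts them to probabilities.
reward_logits_kwargs = {"top_k": None, "function_to_apply": "none", "batch_size": 16}
reward_probabilities_kwargs = {"top_k": None, "function_to_apply": "softmax", "batch_size": 16}

non_toxic_text = "I really enjoyed reading this."  # placeholder; the notebook defines its own example

print(sentiment_pipe(non_toxic_text, **reward_logits_kwargs))
print(sentiment_pipe(non_toxic_text, **reward_probabilities_kwargs))
```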
(3) The ‘toxic_text’ and ‘non_toxic_text’ used as examples differ from those in the video. (This is related to the next issue.)
(4) While the previous issues were relatively minor, I also encountered a significant inconsistency in the execution. I tried the exact ‘toxic_text’ from the video (the one starting with “You are…”), and got the following output, indicating that it is NOT classified as toxic:
logits [not hate, hate]: [4.697163105010986, -4.222471237182617]
probabilities [not hate, hate]: [0.999866247177124, 0.00013371932436712086]
reward (low): [4.697163105010986]
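For completeness, this is roughly how I produced the output above. It assumes the `sentiment_pipe` and the two kwargs dicts from the sketch in point (2); you would paste the exact ‘toxic_text’ from the video in place of the placeholder. The helper extracting the ‘nothate’ score is my own reading of how the notebook derives the reward, not a copy of the lab code:

```python
toxic_text = "..."  # placeholder: paste the exact 'toxic_text' from the video here

logit_outputs = sentiment_pipe(toxic_text, **reward_logits_kwargs)
prob_outputs = sentiment_pipe(toxic_text, **reward_probabilities_kwargs)

# Depending on the transformers version, single-string inputs may come back nested one level deep.
if logit_outputs and isinstance(logit_outputs[0], list):
    logit_outputs, prob_outputs = logit_outputs[0], prob_outputs[0]

def score_of(outputs, label):
    """Pick a label's score regardless of the order the pipeline returns the labels in."""
    return next(s["score"] for s in outputs if s["label"] == label)

print(f"logits [not hate, hate]: {[score_of(logit_outputs, 'nothate'), score_of(logit_outputs, 'hate')]}")
print(f"probabilities [not hate, hate]: {[score_of(prob_outputs, 'nothate'), score_of(prob_outputs, 'hate')]}")

# As I understand the lab, the reward is the 'nothate' logit; for a genuinely toxic input it should be low (negative).
print(f"reward (low): {[score_of(logit_outputs, 'nothate')]}")
```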
Section 2.3
(5) At the end of Section 2.3, the toxicity results of the model differ from those shown in the video for (presumably) the same model:
toxicity [mean, std] before detox: [0.046981223255649886, 0.05390841506846864]
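For reference, here is a minimal sketch of how I understand the [mean, std] aggregation behind that line, assuming the notebook uses the Hugging Face `evaluate` “toxicity” measurement (whose default checkpoint is the same RoBERTa hate-speech model); the helper name and the placeholder completions are mine:

```python
import numpy as np
import evaluate

# The 'toxicity' measurement defaults to facebook/roberta-hate-speech-dynabench-r4-target.
toxicity_evaluator = evaluate.load("toxicity", module_type="measurement")

def toxicity_mean_std(completions):
    """Score each generated completion and aggregate, as in the notebook's summary printout."""
    scores = toxicity_evaluator.compute(predictions=completions)["toxicity"]
    return np.mean(scores), np.std(scores)

# Placeholder completions; the notebook generates these from the model over the evaluation split.
mean, std = toxicity_mean_std(["completion one", "completion two"])
print(f"toxicity [mean, std] before detox: [{mean}, {std}]")
```

If the published notebook and the video were run on different dataset samples (or with different sampling randomness during generation), that alone could explain the gap in these aggregate numbers.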
(6) The training results also show discrepancies: more than half of the rows in the last cell show an increase in the toxicity measure of the trained model’s output. While I acknowledge that random number generation may influence the training process, I recommend double-checking.
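One low-effort mitigation for the randomness would be to pin the seeds in the published notebook, so the printed before/after numbers are at least stable across runs in the same environment. A minimal sketch, assuming the lab’s PyTorch/transformers stack (whether the trl PPO trainer also needs its own seed setting is an assumption on my part):

```python
from transformers import set_seed

SEED = 42  # arbitrary, but ideally the same seed used for the video run
set_seed(SEED)  # seeds Python's random, NumPy and PyTorch (including CUDA) in one call
```

Even with fixed seeds, exact reproduction of the video’s numbers may not be possible across library versions and hardware, but the qualitative trend (toxicity decreasing after training) should hold, which would make the last cell easier to verify.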