A tribute to DeepLearning.AI and the NLP specialization team & My fine-tuned GPT-3 Demo

I’d like to express my gratitude to DeepLearning.AI and the NLP specialization team, particularly Younes and Łukasz.
On January 8, 2022, I completed all the courses (in the updated version) and received my certificate for the entire specialization.
This specialization taught me a lot…

And I’d like to share a little something with you:
I’ve recently fine-tuned the GPT-3 model (the open-source version, just before the huge GPT-J).

Yes, we didn’t encounter GPT-3 in the specialization (Andrew mentioned it once in passing).
But what I learned and acquired in the NLP Specialization (not forgetting its mother, the Deep Learning Specialization, with Andrew, Kian, and Younes) gave me a foundation that allowed me to go much further. Here’s what I did (183 downloads of my fine-tuned GPT-3 model in just 9 days):

I must admit that I struggled a little (…), but it was an enjoyable experience…
My goal was to fine-tune it on a specific dataset and deploy it with a demo like this one (from Eleuther.ai): GPT-J Eleuther.ai Demo,
but with the stipulation that all tools be free (…)

Steps & Links:

1- Steps

  • I fine-tuned the open-source version of GPT-3 (125M) using the free Google Colab tier (1 GPU) and Hugging Face.
  • I fine-tuned it on a dataset called “Multi-XScience”; here is the original repository: Multi-XScience
  • I deployed it using Google “Material Design” (on Anvil): with Material Design, I built the front end with a simple drag-and-drop interface, then wrote the front-end logic in Python (100% web-based: no installation required).
  • Since Google Colab doesn’t offer persistently running notebooks, I used the Deepnote platform. Deepnote doesn’t offer a GPU, but I used the model I had saved after fine-tuning on Google Colab to run it and connect it to my UI demo (on Google Material Design). Deepnote offers persistently running notebooks for one month (after that, it’s $0.50/hr)…

2- Links to my demo and beyond

3- Algorithms shown in the UI of my demo (Greedy, TOP-P sampling, and Generic Sampling)

  • Greedy algorithm: continuously predicts the next token by selecting the most likely one given the previous ones…
  • Generic Sampling algorithm: Like the greedy algorithm, it continuously predicts the next token, but every token has a chance to be selected, and we add a Temperature parameter between 0 and 1. The Temperature indicates “how likely we are to select low-probability tokens”. When it is high (close to 1), we don’t add a penalty to low-probability tokens; when it is low (close to 0), the algorithm turns into the greedy algorithm. For my demo, I used Generic Sampling with temperature = 0.7 (and the algorithm can select from all the possible tokens…): this algorithm is more creative than the others (but sometimes gives us “gibberish”).
  • TOP-P sampling: Like Generic Sampling but more restrictive: the algorithm selects the next token from the top “P” tokens. For example, when P=3, the model selects from the 3 tokens with the highest probability, and all the other tokens aren’t considered… When P=1, the algorithm is equivalent to the greedy algorithm, and when P = the maximum number of tokens, it becomes Generic Sampling.
    For the TOP-P sampling algorithm in my demo: temperature = 0.5 and P=10
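To make the three strategies concrete, here is a minimal pure-Python sketch of the decoding algorithms as described above, operating on toy logits rather than the real GPT model. The function names and example numbers are mine, and “TOP-P” is implemented the way the post describes it (restricting to the P most likely tokens), not as nucleus sampling:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Rescale logits by temperature, then softmax.
    Low temperature sharpens the distribution (closer to greedy);
    temperature near 1 leaves it almost unchanged."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next(logits):
    """Greedy: always pick the single most likely token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def generic_sample(logits, temperature=0.7, rng=random):
    """Generic Sampling: every token can be chosen, weighted by its
    temperature-adjusted probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def top_p_sample(logits, p=10, temperature=0.5, rng=random):
    """'TOP-P' as in the post: keep only the P most likely tokens,
    then sample among them with temperature.
    p=1 reduces to greedy; p=len(logits) reduces to Generic Sampling."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:p]
    probs = softmax_with_temperature([logits[i] for i in top], temperature)
    return top[rng.choices(range(len(top)), weights=probs, k=1)[0]]

# Toy example: a "vocabulary" of 4 tokens with made-up logits.
logits = [2.0, 1.0, 0.5, -1.0]
print(greedy_next(logits))              # always token 0
print(top_p_sample(logits, p=2))        # token 0 or 1, never 2 or 3
print(generic_sample(logits))           # any token, weighted by probability
```

In a real generation loop, the chosen index would be appended to the context and the model queried again for the next set of logits; the demo repeats exactly this step until it reaches the desired output length.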

This is cool… and thanks for sharing every detail of the execution; I learned new stuff along the way :slight_smile:

Good stuff! Way to go :+1:

What format should the training data have when you fine-tuned the model? Is it necessary to form question-and-answer type datasets, or is the fine-tuning done at an earlier stage?