I’d like to express my gratitude to DeepLearning.AI and the NLP specialization team, particularly Younes and Łukasz.
On January 8, 2022, I completed all the courses (the updated version) and received my certificate for the entire specialization.
This specialization taught me a lot…
And I’d like to share a little something with you:
I recently fine-tuned the open-source version of GPT-3 (the model released just before the much larger GPT-J).
Yes, we didn’t encounter GPT-3 in the specialization (Andrew mentioned it once in passing).
But what I learned in the NLP specialization (not forgetting its mother specialization, the Deep Learning Specialization with Andrew, Kian, and Younes) gave me a foundation that allowed me to go much further. Here’s what I did (183 downloads of my fine-tuned GPT-3 model in just 9 days):
I must admit that I struggled a little (…), but it was an enjoyable experience…
My goal was to fine-tune it on a specific dataset and deploy it with a demo like this one from Eleuther.ai: GPT-J Eleuther.ai Demo
but with the stipulation that every tool be free (…)
Steps & Links:
1- Steps
- I fine-tuned the open-source version of GPT-3 (125M) using the free Google Colab tier (one GPU) and Hugging Face (see the fine-tuning sketch after this list)
- I fine-tuned it on a dataset called “Multi-XScience”; here is the original repository: Multi-XScience
- I deployed it using Anvil with its Google “Material Design” theme: I built the front end with a simple drag-and-drop interface, then wrote the front-end logic in Python (100% web-based: no installation required).
- Since Google Colab doesn’t offer persistently running notebooks, I used the Deepnote platform. Deepnote doesn’t offer a GPU, but I loaded the model I had fine-tuned and saved on Google Colab, ran it there, and connected it to my UI demo (on Anvil); Deepnote offers persistently running notebooks free for one month (after that, it’s $0.50/hr)… A sketch of that connection follows the fine-tuning sketch below.
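For anyone curious, here is a rough sketch of what the fine-tuning step can look like with Hugging Face on a free Colab GPU. The model ID (EleutherAI/gpt-neo-125M), the dataset ID on the Hub (multi_x_science_sum), the “abstract” column, and the hyperparameters below are illustrative assumptions, not my exact settings:

```python
# A minimal fine-tuning sketch (assumed model/dataset IDs, not exact settings).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/gpt-neo-125M"        # assumed ID for the 125M open-source GPT-3-style model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token     # GPT-style models have no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("multi_x_science_sum") # assumed Hub ID for Multi-XScience

def tokenize(batch):
    # Train on the paper abstracts as plain causal-LM text (assumed column name)
    return tokenizer(batch["abstract"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=dataset["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt-neo-125M-multixscience",
        per_device_train_batch_size=4,        # small batch to fit on one free Colab GPU
        num_train_epochs=1,
        fp16=True,
        save_total_limit=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("gpt-neo-125M-multixscience")  # this saved checkpoint is reused later on Deepnote
```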
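And here is a rough sketch of the Deepnote side: an Anvil Uplink script that loads the saved checkpoint and exposes a function the Anvil front end can call. The uplink key, checkpoint path, and function name are placeholders:

```python
# A minimal Anvil Uplink sketch for serving the saved model from Deepnote (placeholders used).
import anvil.server
from transformers import AutoModelForCausalLM, AutoTokenizer

anvil.server.connect("YOUR-ANVIL-UPLINK-KEY")   # key from the Anvil app's Uplink settings

# Load the checkpoint fine-tuned on Colab (CPU is enough for a 125M model)
tokenizer = AutoTokenizer.from_pretrained("gpt-neo-125M-multixscience")
model = AutoModelForCausalLM.from_pretrained("gpt-neo-125M-multixscience")

@anvil.server.callable
def generate_text(prompt, max_new_tokens=100):
    # Called from the Anvil UI; returns the generated continuation as a string
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=True, temperature=0.7,
                            pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(output[0], skip_special_tokens=True)

anvil.server.wait_forever()                      # keep the notebook session serving requests
```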
2- Links to my demo and beyond
- My Scientific-text-generator demo, like the GPT-J Eleuther.ai Demo
- A video showing my Scientific-text-generator demo (my deployed fine-tuned model) vs. the GPT-J Eleuther.ai Demo results: real-time Demonstration
- The repository of my fine-tuned model on Hugging Face, which you can clone: my fine-tuned model repository (183 downloads in just 9 days)
- The original code of the open-source version of GPT-3 that I used;
the code is provided from the smallest version (125M), which I used, to the largest one (2.7B): GPT-3 Open Source Version.
3- Algorithms shown in the UI of my demo (Greedy, TOP-P sampling, and Generic Sampling)
- Greedy algorithm: it repeatedly predicts the next token by selecting the most likely one given the previous tokens…
- Generic Sampling algorithm: like the greedy algorithm, it repeatedly predicts the next token, but every token has a chance of being selected, and it adds a Temperature parameter between 0 and 1. The Temperature indicates “how likely we are to select low-probability tokens”. When it is high (close to 1), low-probability tokens are not penalized; when it is low (close to 0), the algorithm behaves like the greedy algorithm. For my demo, I used a temperature of 0.7 for Generic Sampling (and the algorithm can select from all the possible tokens…). This algorithm is more creative than the others (but sometimes gives us “gibberish”).
- TOP-P sampling: it is like Generic Sampling but more restrictive: the algorithm selects the next token from the top “P” tokens. For example, when P = 3, the model selects from the 3 tokens with the highest probability and all other tokens are not considered… When P = 1, the algorithm is equivalent to the greedy algorithm, and when P = the maximum number of tokens, it is equivalent to Generic Sampling.
For the TOP-P sampling algorithm in my demo: temperature = 0.5 and P = 10 (see the decoding sketch below).
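To make the three modes concrete, here is a small sketch of how they can map onto Hugging Face’s generate() arguments. Note that limiting the choice to the top “P” tokens corresponds to the top_k argument in transformers (the library’s own top_p argument is nucleus sampling over cumulative probability); the model ID and prompt below are just illustrative:

```python
# A minimal decoding sketch for the three modes shown in the demo UI (illustrative model ID and prompt).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")  # assumed model ID
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
inputs = tokenizer("Recent work on graph neural networks", return_tensors="pt")

# Greedy: always pick the single most likely next token
greedy = model.generate(**inputs, max_new_tokens=50, do_sample=False)

# Generic Sampling: sample from the full distribution, softened by temperature = 0.7
generic = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                         temperature=0.7)

# "TOP-P" as used in the demo: sample only among the 10 most likely tokens,
# with temperature = 0.5 (expressed here via transformers' top_k argument)
top_p_demo = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                            temperature=0.5, top_k=10)

for name, out in [("greedy", greedy), ("generic", generic), ("top-10", top_p_demo)]:
    print(name, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```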