I took the Deep Learning Specialization in the Fall of 2017. At the time, my interest was very much in NLP, and I wanted to learn about the SOTA sequence model approaches. Turns out, the Sequence Models course hadn’t actually been released yet. Instead I ended up spending way too much time on a deep dive into CNNs, particularly in the context of YOLO. I dragged my feet digging back into NLP, especially once the newest innovations started being released in PyTorch. I’m 67 and can’t even list all of the programming languages and platforms I have learned over the years; I didn’t want to learn another.
That changed this week. I got a new, entry-level MacBook Air and tinkered around getting Python, TensorFlow, and Keras installed. I tried to run some of my old NLP materials but couldn’t get an environment stood up that could access TF datasets. So I broke down and tried PyTorch. Now I’m glad I did.
First, I found a really nice notebook on GPT on GitHub.
Maybe you know already, but the latest OpenAI GPT code isn’t open source, and to the best of my knowledge the newest models don’t even have a fully published architecture. GPT-2 is different: its architecture is well documented and pre-trained weights are available. Using the pattern from the notebook, I built my own GPT class, which implements the GPT-2 architecture.
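For anyone curious what that class looks like, here is a minimal sketch in PyTorch. It follows the common minGPT/nanoGPT pattern rather than reproducing the notebook’s code exactly, and the names (GPTConfig, CausalSelfAttention, Block, GPT) are my own placeholders:

```python
# Sketch of a GPT-2-style decoder-only transformer (minGPT/nanoGPT pattern).
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F

@dataclass
class GPTConfig:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary
    block_size: int = 1024    # maximum context length
    n_layer: int = 12         # GPT-2 "small": 12 layers, 12 heads, 768-dim embeddings
    n_head: int = 12
    n_embd: int = 768

class CausalSelfAttention(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.n_head = cfg.n_head
        self.c_attn = nn.Linear(cfg.n_embd, 3 * cfg.n_embd)   # fused q, k, v projection
        self.c_proj = nn.Linear(cfg.n_embd, cfg.n_embd)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal attention: each position attends only to earlier positions
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.c_proj(y.transpose(1, 2).contiguous().view(B, T, C))

class MLP(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.c_fc = nn.Linear(cfg.n_embd, 4 * cfg.n_embd)
        self.c_proj = nn.Linear(4 * cfg.n_embd, cfg.n_embd)

    def forward(self, x):
        return self.c_proj(F.gelu(self.c_fc(x)))

class Block(nn.Module):
    """Pre-norm transformer block: attention and MLP, each with a residual."""
    def __init__(self, cfg):
        super().__init__()
        self.ln_1 = nn.LayerNorm(cfg.n_embd)
        self.attn = CausalSelfAttention(cfg)
        self.ln_2 = nn.LayerNorm(cfg.n_embd)
        self.mlp = MLP(cfg)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x

class GPT(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.wte = nn.Embedding(cfg.vocab_size, cfg.n_embd)   # token embeddings
        self.wpe = nn.Embedding(cfg.block_size, cfg.n_embd)   # position embeddings
        self.h = nn.ModuleList([Block(cfg) for _ in range(cfg.n_layer)])
        self.ln_f = nn.LayerNorm(cfg.n_embd)
        self.lm_head = nn.Linear(cfg.n_embd, cfg.vocab_size, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.wte(idx) + self.wpe(pos)
        for block in self.h:
            x = block(x)
        return self.lm_head(self.ln_f(x))   # logits over the vocabulary
```

The real GPT-2 also ties the token embedding and output head weights; I left that out to keep the sketch short.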
Once the layers were defined and an instance created, I downloaded a pre-trained GPT-2 model from HuggingFace and copied over the weights.
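The copy itself is the fiddly part, because the original OpenAI checkpoint stores several layers as Conv1D, so their weight matrices are transposed relative to nn.Linear. Here is a rough sketch, assuming the local parameter names mirror the HuggingFace checkpoint (minus its transformer. prefix); treat it as an illustration of the idea rather than the notebook’s exact code:

```python
# Copy pre-trained GPT-2 weights from HuggingFace into the local GPT class.
import torch
from transformers import GPT2LMHeadModel

hf_sd = GPT2LMHeadModel.from_pretrained("gpt2").state_dict()   # 124M "small" model

model = GPT(GPTConfig())          # the local class sketched above
sd = model.state_dict()

# these layers are stored transposed (as Conv1D) in the original checkpoint
transposed = ("attn.c_attn.weight", "attn.c_proj.weight",
              "mlp.c_fc.weight", "mlp.c_proj.weight")

with torch.no_grad():
    for k, v in hf_sd.items():
        local_k = k.removeprefix("transformer.")
        if local_k not in sd:
            continue                       # skip buffers such as attention bias masks
        if local_k.endswith(transposed):
            sd[local_k].copy_(v.t())       # transpose Conv1D weights into Linear layout
        else:
            sd[local_k].copy_(v)

model.load_state_dict(sd)
```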
I don’t have access to OpenAI’s GPT class code, but I do have access to the locally defined class code. So I went in and made a few hacks, basically instrumenting the inference loop to dump some extra output once model.generate() is invoked. Here’s the call to generate():
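In spirit it boils down to something like this; the generate() signature here is my assumption (a minGPT-style method), and the encoding uses the standard HuggingFace GPT-2 tokenizer:

```python
# Encode the prompt, run generation, and decode the result.
import torch
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "What is Machine Learning?"
idx = torch.tensor([tokenizer.encode(prompt)])    # shape (1, T) of token ids
print(prompt)
print(idx.tolist())                               # the encoded version of the prompt

model.eval()
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=40)  # assumed minGPT-style signature

print(tokenizer.decode(out[0].tolist()))
```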
The text prompt was “What is Machine Learning?” You can see the plain text and encoded versions of it. Below is the output from the iterations as the model processes the prompt and generates the response. I added a block to print out the top 5 token candidates and their probabilities; only the top candidate is included in the response. NOTE: for people who are asking whether or not OpenAI GPT is sentient, here is the answer. NO.
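The block I added sits inside the generation loop. Sketched as a standalone function, and assuming a minGPT-style loop rather than the notebook’s exact code, the instrumented version looks roughly like this:

```python
# Greedy decoding loop, instrumented to print the top 5 candidates per step.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, tokenizer, idx, max_new_tokens=40):
    for step in range(max_new_tokens):
        logits = model(idx)                          # (1, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next token
        top_p, top_i = torch.topk(probs, k=5)        # top 5 candidates at this step
        print(f"step {step}:")
        for p, i in zip(top_p[0].tolist(), top_i[0].tolist()):
            print(f"  {tokenizer.decode([i])!r}  p={p:.4f}")
        next_id = top_i[:, :1]                       # keep only the top-ranked candidate
        idx = torch.cat([idx, next_id], dim=1)       # append it and go again
    return idx
```

Every step is just a softmax over the vocabulary followed by a top-k ranking, and the response is built by repeatedly appending the highest-probability token; there is no deliberation hiding anywhere in there.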
Finally, the last iteration and the generated response.
So now I have a pretty decent understanding of what GPT is doing under the covers, because I can see the iterations and where the candidate ranking and selection are occurring. Since I don’t have the GPT-2 training corpus, I can’t train what I have from scratch, and honestly I couldn’t do it on my toy computer anyway. But I might be able to do a little fine-tuning, which is the next thing on my list to try.
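For reference, the kind of fine-tuning I have in mind is nothing fancy: plain next-token prediction with cross-entropy on a small dataset. This is an untested sketch, and `batches` is a hypothetical stand-in for whatever data I end up using:

```python
# Minimal fine-tuning loop: shift the tokens by one and minimize cross-entropy.
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()

for input_ids in batches:                         # hypothetical iterable of (1, T) token-id tensors
    logits = model(input_ids[:, :-1])             # predict token t+1 from tokens up to t
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # (B*T, vocab_size)
        input_ids[:, 1:].reshape(-1),             # (B*T,) target token ids
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```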
Feel free to ask questions about what I did (I added a few things and also had to edit some of the GitHub code to get it to run in my environment), and check out the GitHub repository shown at the top of the notebook… lots of interesting topics there.
Cheers