Transfer Learning or Train from Scratch?


I was wondering when I should train a model from scratch versus using transfer learning. Given that GPT-3 cost around USD 5M to train, and that I’m nowhere near able to invest even 1% of that budget into training a model, is it a good idea to even try? Or should we use transfer learning all the time, given how LLMs are evolving, leaving all the Average Joes behind in the training era and making competition against them not even worth it?
Let’s scale it up even a little further. Suppose I want to focus on a specific subject, for example, giving responses the way Shakespeare would. Will a model I train on all of his books perform better than a T5 or GPT-4 in that area? Is it worth the time? Or should I use the books to fine-tune one of these LLMs and get better results instead?

Does anybody have an opinion on the topic? I’m just finishing these courses, so I still need to put my learnings into practice, but this question has been bugging me. Any idea, thought, or experience, I’m more than happy to hear!

Thanks for your time :wink:

Hi @Felipe_Rodriguez

Given the way I interpreted your sentiment about the topic, you should not be jumping into training huge models from scratch :slight_smile:

Most probably not.

It depends on how you value your time. Compared to what? Computer games? Friends and family? Etc.

For decent results you should go for that: fine-tuning open-source LLMs. Or you can use existing ones (the ChatGPT API, for example, does not cost a fortune); it depends on what you are going to do with the “Shakespeare responses” :slight_smile: (in other words, what’s your business plan?)

All in all I think there is enough space for everyone:

  • fine-tuning open source LLMs;
  • using existing LLMs through APIs;
  • searching for a new activation function (like ReLU);
  • you name it :slight_smile:
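To make the first option a bit more concrete, here is a minimal sketch of one small piece of a fine-tuning pipeline: chunking raw Shakespeare text into fixed-length training examples for a causal language model. Note the assumption: whitespace splitting stands in for a real tokenizer (a real pipeline would use the model’s own tokenizer, e.g. a BPE one), and the tiny block size is purely illustrative.

```python
# Minimal sketch: turn raw text into fixed-length training chunks for
# causal-LM fine-tuning. ASSUMPTION: whitespace "tokenization" stands in
# for a real tokenizer (e.g. the model's own BPE tokenizer).

def make_training_chunks(text, block_size=8):
    """Split text into non-overlapping blocks of `block_size` tokens."""
    tokens = text.split()  # stand-in for tokenizer.encode(text)
    chunks = []
    for start in range(0, len(tokens) - block_size + 1, block_size):
        chunks.append(tokens[start:start + block_size])
    return chunks

sonnet = (
    "Shall I compare thee to a summer's day? "
    "Thou art more lovely and more temperate:"
)
chunks = make_training_chunks(sonnet, block_size=4)
print(len(chunks))   # 3 full blocks of 4 tokens each; the remainder is dropped
print(chunks[0])     # ['Shall', 'I', 'compare', 'thee']
```

Real fine-tuning frameworks do essentially this (plus tokenization and batching) before the training loop ever sees the data, so the corpus-preparation step is the same whether you train from scratch or fine-tune.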

It seems daunting at first but if you like what you do you can succeed in any field. It’s true that the most “effective” path is not clear but I think you should concentrate not on the efficiency but on the things that interest you.

Just my thoughts :slight_smile:

Thanks for your reply @arvyzukai

I’ve been considering implementing NLP in my everyday tasks and in a new business I’m developing. That’s why I was intrigued about the best way to move forward.

What I got from your reply is that it’s better to leverage an LLM and fine-tune it rather than start from scratch, which makes sense to me too. I’d like to know if there’s a line for specific subjects that are better to train yourself, and that’s why I brought up the Shakespeare example (everybody can get Shakespeare’s literature and, hence, train a model to reply like him).

It does bother me, nevertheless, that I could get unexpected answers if I use an LLM instead of training from scratch. For example, if I want it to answer like Shakespeare, an LLM could quote Proust instead of Shakespeare, whereas if I train my model only on Shakespeare’s books and quotes, there’s no room for it to answer like any other author.

That’s where my confusion lies.

There is some truth to that, but if you fine-tune it well enough then there should be no problem. The question is how big the disadvantages of these edge cases are, and whether the advantages of LLMs are big enough. If you lose your leg when the model quotes Proust, then I think it’s better to stay safe :slight_smile: But if the occasional deviations from Shakespeare are barely noticeable and conversational capabilities are more important, then fine-tuning should not be a problem.
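One cheap guardrail, if stray quotations really matter to you: compare each generated reply against a reference vocabulary built from the fine-tuning corpus and flag replies whose out-of-vocabulary ratio is high. This is a hypothetical sketch; the function names and any threshold you would apply are made up for illustration, and a toy vocabulary stands in for the full Shakespeare corpus.

```python
# Hypothetical sketch: flag generations that drift from the fine-tuning
# corpus by measuring the share of words not seen in that corpus.
# The tiny vocabulary below stands in for the full Shakespeare corpus.

def build_vocabulary(corpus_text):
    """Collect the set of lowercased words in the reference corpus."""
    return {word.lower() for word in corpus_text.split()}

def oov_ratio(generated_text, vocabulary):
    """Fraction of generated words missing from the reference vocabulary."""
    words = generated_text.lower().split()
    if not words:
        return 0.0
    unseen = sum(1 for w in words if w not in vocabulary)
    return unseen / len(words)

vocab = build_vocabulary("shall i compare thee to a summer's day")
print(oov_ratio("compare thee to a day", vocab))        # 0.0 -> looks on-corpus
print(oov_ratio("longtemps je me suis couché", vocab))  # 1.0 -> flag it
```

It is crude (Shakespeare-like phrasing uses plenty of common English words, and punctuation handling is ignored here), but a check like this can catch the worst “Proust instead of Shakespeare” cases without retraining anything.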

Again, this is just my opinion and not career advice or anything similar :slight_smile: In other words, consult your doctor / lawyer, etc. :wink: Or, on a serious note, listen to and ponder other opinions too.