Love the course, I learned a lot. Now I am trying to put theory into practice, and I find it quite difficult…
I came up with this use case:
Let’s say I have the transformer paper (Attention Is All You Need, 2017) and I want to use an LLM to “chat” with this paper so it can answer all my questions about transformers. Using the Generative AI lifecycle and the topics discussed in this course, how could this be achieved? What would be an approach?
Some thoughts: a scientific PDF has a certain formatting and is probably too big to fit in a single prompt (so some kind of cleaning is required, and then embeddings?). Also, I assume certain questions come up very often when “consulting” a scientific paper, e.g. “In what year was this paper published?”, “Who are the authors?”, “Summarise the paper in layman’s terms”. This suggests that maybe in-context learning (ICL) is possible? But then again, PDFs are too large for a single prompt.
What type of pre-trained model could be fine-tuned for this? Or should we view this problem as an external application problem and use something like RAG?
I’m confused. Hopefully someone can give some pointers or share their thoughts; it would be much appreciated.
You’ve already seen how embeddings can solve part of your problem.
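To make the embeddings idea concrete, here is a minimal retrieval sketch: rank the paper's chunks by similarity to the question and put only the best ones into the prompt. The `embed` function is a toy bag-of-words stand-in (an assumption, not a real embedding model), and the chunk texts are illustrative paraphrases; in practice you would call an embedding model and store the vectors in a vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag of lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    # Put only the most relevant chunks into the prompt instead of the whole PDF.
    context = "\n\n".join(retrieve(question, chunks, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Illustrative chunks paraphrasing the paper; in practice these come from your PDF.
chunks = [
    "The Transformer is based solely on attention mechanisms, dispensing with recurrence.",
    "We trained on the WMT 2014 English-German dataset.",
    "Multi-head attention allows the model to jointly attend to information from different representation subspaces.",
]
print(build_prompt("What does multi-head attention do?", chunks, k=1))
```

The resulting prompt is what you would send to the LLM; this retrieve-then-prompt loop is essentially what RAG means.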
Then you have some other questions:
PDF too large: yes, you have to split it into chunks if you want to give it to an LLM to extract data or summarize it.
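A minimal chunking sketch, assuming the text has already been extracted from the PDF: it splits on whitespace into word windows that overlap, so context near a boundary shows up in both neighbouring chunks. The chunk size and overlap are arbitrary assumptions; many pipelines count tokens rather than words.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    # Split text into windows of chunk_size words; consecutive windows share `overlap` words.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```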
For this task, I think the big models are the best: think GPT or Claude. In my experience, smaller models are not very good at extracting data or creating good summaries.
There are many ways to augment LLMs by uploading documents to an application that then uses an LLM to analyze them. You can even do this out of the box with Claude (https://claude.ai/) if you don’t want to build anything; it allows you to upload documents of up to 5 MB (iirc).
Try using the new Claude 2; it has a context window of over 100K tokens (the actual capacity is 200K, but they haven’t rolled that out for free to web randos like us yet).
As for formatting, David Shapiro has his LLM summarize papers without worrying about charts etc., and says the results are pretty accurate. He has a GitHub resource for automating the process that you might find something interesting in. Video here: “Accelerating Science with AI: Quickly Read Every Paper and Get Key Insights in Bulk” (YouTube).
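Before relying on a long context window, you can roughly check whether a paper fits. This sketch uses the common rule of thumb of about 4 characters per token for English text; that ratio is only an approximation, and exact counts require the model's own tokenizer. The 100K window comes from the post above, while the 2,000 tokens reserved for the answer is an arbitrary assumption.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Rough heuristic only; exact counts depend on the model's tokenizer.
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    context_tokens: int = 100_000,
    reserved_for_answer: int = 2_000,
    chars_per_token: float = 4.0,
) -> bool:
    # The prompt and the model's answer share the same context window.
    return estimate_tokens(text, chars_per_token) + reserved_for_answer <= context_tokens


print(estimate_tokens("a" * 400_000))   # 100000
print(fits_in_context("a" * 400_000))   # False: no room left for the answer
```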
Thank you for all the responses. I will check out Claude, embeddings, and splitting into chunks, plus LangChain as well.
@Juan_Olano am I correct in saying that for high-quality summaries you want bigger models, but for basic questions such as what year the paper was written or who the authors are, smaller models may suffice? Maybe even FLAN-T5? Or do you expect those open-source models to require fine-tuning for a task like this?
Finally, any thoughts on how to deal with tables and headers/template of the document? These can potentially be very helpful.
Feel free to ignore me if I ask too many questions… The course has left me feeling both inspired and confused.
For basic questions, many times the best solution is via embeddings. As for smaller models for basic questions, I still think they may not perform very well. If your subject matter is too broad, I would try to solve it with big LLMs. Claude is a great platform to solve the basic questions at a very low cost.
For the ‘key’ output, which is the summary, I would also try Claude and compare with GPT.
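When the paper is split into chunks, a common pattern for the summary is map-reduce summarization: summarize each chunk, then summarize the summaries until one remains. In this sketch `summarize` is a hypothetical callable you would wire to Claude or GPT; it is not either vendor's API, and the batch size is an arbitrary assumption.

```python
from typing import Callable


def map_reduce_summarize(
    chunks: list[str],
    summarize: Callable[[str], str],
    batch_size: int = 5,
) -> str:
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 or the reduce step never shrinks")
    # Map: summarize each chunk independently.
    partial = [summarize(c) for c in chunks]
    # Reduce: merge summaries in batches until a single summary remains.
    while len(partial) > 1:
        partial = [
            summarize("\n".join(partial[i:i + batch_size]))
            for i in range(0, len(partial), batch_size)
        ]
    return partial[0] if partial else ""


# Fake summarizer for illustration only: keeps the first line of its input.
def first_line(text: str) -> str:
    return text.splitlines()[0]


print(map_reduce_summarize(["A. more", "B. more", "C"], first_line))  # A. more
```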
BTW, a paper was released yesterday discussing the diminishing quality of GPT; that’s something to look at.
How to deal with tables, headers, templates: again, large language models seem to be best at this. For this I would maybe try GPT with its new Code Interpreter.
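Whichever model you use, one simple preprocessing step is to pass tables as structured text rather than as flattened PDF output, for example as a Markdown table. This sketch assumes the rows have already been extracted from the PDF (extraction itself is not shown), and the sample rows are placeholders, not data from the paper.

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    # First row is treated as the header; the rest are body rows.
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for row in body:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


print(rows_to_markdown([["Model", "Score"], ["A", "1.0"]]))
# | Model | Score |
# | --- | --- |
# | A | 1.0 |
```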
Small models can be fine-tuned and produce very good results on very, very specific, almost narrow tasks. Not sure if your content qualifies for this.
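If you do try fine-tuning a small model on a narrow task, the first step is usually assembling instruction/response pairs. This minimal sketch writes them as JSON Lines; the `prompt`/`completion` field names and the instruction text are assumptions, so check the schema your fine-tuning tool actually expects.

```python
import json


def to_jsonl(examples: list[tuple[str, str]], instruction: str) -> str:
    # One JSON object per line: the instruction plus context as input, the target as output.
    lines = []
    for context, target in examples:
        record = {"prompt": f"{instruction}\n\n{context}", "completion": target}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


examples = [("Attention Is All You Need. Vaswani et al., 2017.", "2017")]
print(to_jsonl(examples, "Extract the publication year from this paper header:"))
```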