Theory into practice: Generative AI lifecycle

Love the course, learned a lot. Now I am trying to put theory into practice, and I find it quite difficult… :stuck_out_tongue:

I came up with this use case:
Let’s say I have the Transformer paper (“Attention Is All You Need”, 2017) and I want to use an LLM to “chat” with this paper so it can answer all my questions about transformers. Using the generative AI lifecycle and the topics discussed in this course, how could this be achieved? What would be a good approach?
Some thoughts: a scientific PDF has a certain formatting and is probably too big to fit in a single prompt (so some kind of cleaning is required, and then embeddings?). Also, I assume certain questions come up very often when “consulting” a scientific paper, e.g. “In what year was this paper published?”, “Who are the authors?”, “Summarise the paper in layman’s terms”. This suggests that maybe in-context learning (ICL) is possible? But then again, PDFs are too large for a single prompt.
What type of pre-trained model could be fine-tuned for this? Or should we view this as an external-application problem and use something like RAG?
I’m confused :wink:. Hopefully someone can give some pointers or share their thoughts. It would be much appreciated.

While writing this I just noticed someone posted something similar, which answers part of this question:

You have already seen how embeddings can solve part of your problem.
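A minimal sketch of that embeddings approach, assuming the `sentence-transformers` library; the model name and the example chunks are placeholders for the cleaned, chunked text of the PDF:

```python
# Semantic-search sketch: embed the paper's chunks once, then embed each
# question and retrieve the most similar chunk.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model

chunks = [
    "The dominant sequence transduction models are based on recurrent networks ...",
    "We propose the Transformer, an architecture relying entirely on attention ...",
]  # in practice: the cleaned, chunked text of the PDF

chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def top_chunk(question: str) -> str:
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec  # cosine similarity (vectors are unit-normalized)
    return chunks[int(np.argmax(scores))]

print(top_chunk("What architecture does the paper propose?"))
```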

Then you have some other questions:

PDF too large: yes, you have to split it into chunks if you want to give it to an LLM to extract data or summarize it.
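For the splitting itself, even a naive fixed-size split with overlap goes a long way (the sizes below are arbitrary, and the file path is hypothetical; libraries like LangChain ship ready-made splitters):

```python
# Naive fixed-size chunking with overlap, so a sentence cut at one
# boundary still appears intact in the neighbouring chunk.
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

paper_text = open("attention_is_all_you_need.txt").read()  # hypothetical cleaned text
print(len(chunk_text(paper_text)), "chunks")
```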

For this task, I think the big models are best. Think GPT or Claude. Smaller models, in my experience, are not very good at extraction or at producing good summaries.

Keep the questions coming!

1 Like

There are many ways to augment LLMs by uploading documents to an application that then uses an LLM to analyze them. You can even do this out of the box with Claude (https://claude.ai/) if you don’t want to build anything; it allows you to upload documents of up to 5MB (iirc).

1 Like

Try using the new Claude 2; it has over a 100K-token context window (the actual capacity is 200K, but they haven’t rolled that out for free to web randos like us yet :slight_smile:). As for formatting, David Shapiro has his LLM summarize without worrying about charts, etc., and says the results are pretty accurate. He has a GitHub resource for automating the process that you might find something interesting in. Video here: Accelerating Science with AI: Quickly Read Every Paper and Get Key Insights in Bulk - YouTube
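If you go the long-context route, the whole (cleaned) paper can go into a single prompt. A minimal sketch, assuming the official `anthropic` Python SDK and an `ANTHROPIC_API_KEY` in the environment; the model name and file path are illustrative:

```python
# Sketch: stuff the entire cleaned paper into one long-context prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
paper_text = open("attention_is_all_you_need.txt").read()  # hypothetical cleaned text

response = client.messages.create(
    model="claude-2.1",  # illustrative; any long-context model
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": f"Here is a paper:\n\n{paper_text}\n\nSummarise it in layman's terms.",
    }],
)
print(response.content[0].text)
```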

2 Likes

Thank you a lot :slight_smile:
That indeed answers part of my question.

Thank you for all the responses. I will check out Claude, embeddings, and splitting into chunks + LangChain as well.

@Juan_Olano am I correct in saying that for high-quality summaries: bigger models! But for basic questions such as “What year was the paper written?” or “Who is the author?”, smaller models may suffice? Maybe even FLAN-T5? Or do you expect those open-source models to require fine-tuning for a task like this?

Finally, any thoughts on how to deal with tables and headers/template of the document? These can potentially be very helpful.

Feel free to ignore me if I ask too many questions… The course has left me feeling both inspired and confused :stuck_out_tongue:

Hi @Anton_Kuijer,

For basic questions, the best solution is often via embeddings. As for smaller models for basic questions, I still think they may not perform very well. If your subject matter is too broad, I would try to solve it with big LLMs. Claude is a great platform for answering the basic questions at a very low cost.
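To make that concrete, embeddings for retrieval plus a big LLM for the final answer is exactly the RAG pattern. A sketch assuming the `openai` package (v1 SDK) and reusing the hypothetical `top_chunk()` retriever from the embeddings sketch earlier in the thread; the model name is illustrative:

```python
# RAG sketch: retrieve the most relevant chunk, then let a large model
# answer using only that context. Assumes OPENAI_API_KEY is set and that
# top_chunk() from the earlier embeddings sketch is in scope.
from openai import OpenAI

client = OpenAI()

def answer(question: str) -> str:
    context = top_chunk(question)  # retrieval step via embeddings
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any strong chat model
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("In what year was this paper published?"))
```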

For the ‘key’ output, which is the summary, I would also try Claude and compare it with GPT.

BTW, a paper was released yesterday discussing the diminishing quality of GPT - that’s something to look at.

How to deal with tables, headers, and templates: again, large language models seem to be best at this. Maybe for this I would try GPT with its new Code Interpreter.
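It can also help to pull the tables out of the PDF separately, before anything reaches an LLM, so they arrive as structured rows rather than garbled text. A sketch assuming the `pdfplumber` library (the file path is hypothetical):

```python
# Extract page text and tables separately from the PDF.
import pdfplumber

with pdfplumber.open("attention_is_all_you_need.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        text = page.extract_text() or ""  # running text of the page
        for table in page.extract_tables():
            print(f"Table on page {i + 1}:")
            for row in table:
                print(row)  # each row is a list of cell strings (or None)
```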

Small models can be fine-tuned to produce very good results on very specific, narrow tasks. I’m not sure if your content qualifies for this.
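One cheap way to find out is simply to try it. A sketch assuming the `transformers` library, probing FLAN-T5 with one of those basic questions plus a hand-picked context snippet:

```python
# Quick probe: can a small instruction-tuned model answer a basic
# extraction question when handed the relevant context?
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

context = "Attention Is All You Need (2017), by Ashish Vaswani, Noam Shazeer, Niki Parmar, et al."
prompt = f"Answer from the context.\nContext: {context}\nQuestion: Who are the authors?"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If it nails the basic lookups but fails on anything nuanced, that is a good signal that a small model plus retrieval can cover the cheap questions while the big models handle the summaries.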

2 Likes

Thank you so much. :pray: