PDF URL exceeding context window, leading to LLM error
In C1M3_Lab_1_deep_research_p1, the create_research_plan task for the Research Planner agent is causing issues.
ERROR:root:Context window exceeded: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 3811560 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
ERROR:root:OpenAI API call failed: LLM context length exceeded. Original error: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 3811560 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
Consider using a smaller input or implementing a text splitting strategy.
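The "text splitting strategy" the error message suggests could look something like this minimal sketch, which caps scraped text before it reaches the model. The token budget and the ~4-characters-per-token heuristic are assumptions on my part, not values from the lab; a real implementation could use an actual tokenizer instead.

```python
# Hedged sketch: cap scraped text to roughly fit a model's context window.
# Uses a rough ~4 chars/token heuristic (an assumption, not an exact count).

def truncate_for_context(text: str, max_tokens: int = 100_000,
                         chars_per_token: int = 4) -> str:
    """Return text cut down to roughly max_tokens tokens."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[... truncated to fit context window ...]"
```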
Looking at the output below, it looks like the tool that reads website content is trying to access a PDF, causing the context window to be exceeded immediately afterwards.
Is this a limitation of the tool? If so, how can I fix it? If not, what am I doing wrong?
Thanks
Did you try a different query to see if it worked? It’s possible it just exceeded the context window. If you were running it locally, you could switch to an LLM with a larger context window.
Yes other queries worked.
The LLM error arises only when trying to read PDFs behind URLs.
That's the only way I can reproduce the error: ask a question where I know the relevant information can be found in a PDF online, e.g. the input "FED policy and the effect on the market versus the economy".
It tried to read “website_url”: “https://www.bis.org/publ/arpdf/ar2025e.pdf”
Status: Executing Task…
├──
Using EXASearchTool (1)
├──
Used EXASearchTool (1)
├──
Used EXASearchTool (2)
├──
Used EXASearchTool (3)
├──
Used Read website content (1)
└──
LLM Failed
Looks like the scrape tool parsing PDFs fills up the context immediately.
Definitely looks like the context gets filled up in this case. @magdalena.bouza what do you say? Is this something that can be filed as an issue on the crewAI GitHub?
Hey @Fabio-RibeiroB and @lukmanaj. Yeah, I've seen this happen before. Sometimes the tools try to scrape more info than the gpt-4o-mini context allows, and you get this error. Usually just re-running the query helps. Unfortunately, I don't think there's much to be done, as there's no way to limit the response of the tools (I've already looked into it).
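That said, if you're running things locally and control how the tool is wired up, one possible workaround is to wrap the tool function yourself and cap its output before it enters the agent's context. This is just a sketch of that idea, not a crewAI feature; the decorator name and the character limit are illustrative:

```python
# Hedged sketch: a generic decorator that caps any tool function's string
# output. cap_output is an illustrative name, not part of the crewAI API.
from functools import wraps

def cap_output(max_chars: int = 20_000):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            out = fn(*args, **kwargs)
            if isinstance(out, str) and len(out) > max_chars:
                return out[:max_chars] + "\n[truncated]"
            return out
        return wrapper
    return decorator
```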
Is there a tool that can read the PDF behind the URL and load its contents properly? No matter how many times I re-run this crew, the relevant links found include a PDF, and rightly so, since they contain relevant info for the research.