PDF URLs breaking research task by exceeding context window

A PDF URL exceeding the context window leads to an LLM error

In C1M3_Lab_1_deep_research_p1, the create_research_plan task for the Research Planner agent is causing issues.

ERROR:root:Context window exceeded: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 3811560 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
ERROR:root:OpenAI API call failed: LLM context length exceeded. Original error: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 3811560 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
Consider using a smaller input or implementing a text splitting strategy.
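The error's own suggestion (a smaller input or a text-splitting strategy) can be sketched as a guard applied to scraped content before it is handed to the LLM. This is a generic sketch, not part of the lab code; the ~4-characters-per-token heuristic and the budget value are assumptions, and a real tokenizer (e.g. tiktoken) would be more precise:

```python
def truncate_to_token_budget(text: str, max_tokens: int = 100_000,
                             chars_per_token: int = 4) -> str:
    """Roughly cap text to a token budget before it reaches the LLM.

    Uses the common ~4 chars/token heuristic as an approximation.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Keep the head of the document, which usually carries the abstract
    # and key findings, and flag the truncation explicitly.
    return text[:max_chars] + "\n\n[content truncated to fit context window]"
```

Wrapping the scrape tool's output in a guard like this would keep a 3.8M-token PDF dump from blowing past gpt-4o-mini's 128k window, at the cost of losing the tail of the document.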

Looking at the output below, it looks like the tool to read website content is trying to access a PDF, which causes the context window to be exceeded immediately after.

Is this a limitation of this tool? If so, how to fix? If not, what am I doing wrong?

Thanks

Did you try a different query to see if it worked? It’s possible it just exceeded the context window. If you were running it locally, you could switch to an LLM with a larger context window.

Yes other queries worked.

The LLM error arises only when trying to read PDFs behind URLs.
That's the only way I can reproduce the error: ask a question where I know relevant information can be found in a PDF online, e.g. an input like "FED policy and the affect on the market versus economy".

It tried to read “website_url”: “https://www.bis.org/publ/arpdf/ar2025e.pdf”
Status: Executing Task…
├── :wrench: Using EXASearchTool (1)
├── :wrench: Used EXASearchTool (1)
├── :wrench: Used EXASearchTool (2)
├── :wrench: Used EXASearchTool (3)
├── :wrench: Used Read website content (1)
└── :cross_mark: LLM Failed

Looks like the scrape tool parsing PDFs fills up the context immediately.

Definitely looks like the context gets filled up in this case. @magdalena.bouza what do you say? Is this something that can be filed as an issue on the CrewAI GitHub?

Hey @Fabio-RibeiroB and @lukmanaj. Yeah, I've seen this happen before. Sometimes the tools try to scrape more info than the gpt-4o-mini context allows and you get this error. Usually just re-running the query helps. Unfortunately, I don't think there's much to be done, as there's no way to limit the response of the tools (I've already looked into it).

Is there a tool that can search within the PDF behind the URL and load it properly? No matter how many times I re-run this crew, the relevant links found include a PDF. And rightly so, since they contain relevant info for the research.
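One pragmatic workaround (a do-it-yourself sketch, not a built-in CrewAI tool) is to download and extract the PDF text yourself, then split it into overlapping chunks that each fit the model's window and summarize chunk by chunk. Assuming a library such as pypdf handles the extraction, the splitting step itself is plain Python; the chunk size and overlap below are illustrative values:

```python
def split_into_chunks(text: str, chunk_chars: int = 12_000,
                      overlap: int = 500) -> list[str]:
    """Split long text into overlapping chunks for chunk-by-chunk summarization.

    Tune chunk_chars so each chunk plus the prompt stays well under the
    model's context limit; the overlap preserves continuity at boundaries.
    """
    if chunk_chars <= overlap:
        raise ValueError("chunk_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        start += chunk_chars - overlap
    return chunks
```

A typical use would be a map-reduce pattern: extract the text (e.g. with `pypdf.PdfReader`), split it with a function like this, have the LLM summarize each chunk, then summarize the summaries, so no single call exceeds the context window.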