Flaw in event planning example

The course is really inspiring and clearly taught.

Still, it does not really create trust in the results when no one seems to notice that the request was to plan an event for 500 participants, but the preferred venue only has a capacity of 200.

The capacity is a numeric value, not fuzzy, ambiguous text input, so it should be easy for the model to check this aspect of the result. But that does not seem to happen: the marketing campaign happily continues trying to attract 500 attendees. Just imagine the day of the event, when 300 of them will not be able to attend…

I am not sure where this could be emphasized more strongly to prevent this kind of invalid result. The Task? Guardrails?

Maybe this is just a problem with this particular example and can be mitigated with a proper implementation. The framework seems to be very capable, but this oversight in the course creates reluctance rather than desire to use it.

I completely understand your concern. You’ve highlighted a key limitation that often comes up when using AI for planning or decision-making tasks: the model can generate plausible outputs without fully validating constraints like numeric capacities or resource limits.

In this case, the issue isn’t just about understanding the numbers — it’s about enforcing task-specific constraints. There are a few ways this can be addressed:

  1. Task definition and prompt engineering: Explicitly instruct the model to check for hard constraints (e.g., venue capacity must be ≥ number of participants). Sometimes the model will follow it better if the requirement is stated clearly and repeatedly.

  2. Post-processing/validation: After the AI generates a plan, implement a check in code or logic to verify critical constraints. This ensures that even if the model misses something, the system won't accept invalid outputs (see the sketch after this list).

  3. Guardrails or tools/frameworks: Some frameworks allow you to define rules that the AI cannot violate. For numeric or structured constraints like this, these guardrails can prevent invalid suggestions before they are presented to the user.
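To make point 2 concrete, here is a minimal sketch of such a check in plain Python. The field names (`venue_capacity`, `expected_attendees`) and the idea of parsing the crew's output into a dict are assumptions for illustration, not something the course or CrewAI provides out of the box:

```python
# Minimal post-processing check, independent of any framework.
# Assumes the generated plan has already been parsed into a dict with
# numeric fields; the field names here are made up for illustration.

def validate_event_plan(plan: dict) -> list[str]:
    """Return a list of violated hard constraints (empty list = acceptable)."""
    violations = []
    capacity = plan.get("venue_capacity")
    attendees = plan.get("expected_attendees")
    if capacity is not None and attendees is not None and attendees > capacity:
        violations.append(
            f"Venue capacity ({capacity}) is smaller than the number of "
            f"expected attendees ({attendees})."
        )
    return violations


plan = {"venue_capacity": 200, "expected_attendees": 500}
problems = validate_event_plan(plan)
if problems:
    # Reject the result, or feed the violations back to the crew for another pass.
    print("Plan rejected:", problems)
```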

I think your point about trust is crucial — seeing these oversights makes it harder to rely on AI for real-world tasks. The good news is that with proper implementation — combining strong prompts, validation logic, and guardrails — these issues can be largely mitigated.

The framework itself is capable, but the course could emphasize the importance of constraint checks more strongly to avoid giving learners a false sense of reliability.

— Steve

Thanks for the comprehensive answer, @SteveArthur.

The incorrect result came as a bit of a surprise to me, because the CrewAI framework/SDK actually has a nicely formalized way of defining the individual parts of the process (Task, Agent, …).

Specifically, the {placeholder} notation, paired with the structured input dict that is passed as an argument to the kickoff() command (and that in turn has these placeholders as keys), should make enforcing correctness a lot easier. Compare this with the case where the crew first needs to figure out from natural language what to do.
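To illustrate what I mean, here is a rough sketch of that pattern. The role, goal, and key names are my own and not taken from the course notebooks, and it assumes the usual LLM configuration (e.g. an API key in the environment) is in place:

```python
from crewai import Agent, Task, Crew

# Sketch of the {placeholder} / inputs pattern discussed above.
venue_coordinator = Agent(
    role="Venue Coordinator",
    goal="Find a venue that satisfies all hard requirements of the event",
    backstory="Experienced in matching venues to event size and budget.",
)

venue_task = Task(
    description=(
        "Find a venue in {event_city} for {expected_participants} participants. "
        "The venue capacity MUST be at least {expected_participants}."
    ),
    expected_output="A venue proposal including name, address and capacity.",
    agent=venue_coordinator,
)

crew = Crew(agents=[venue_coordinator], tasks=[venue_task])

event_details = {
    "event_city": "Berlin",        # keys match the {placeholder} names above
    "expected_participants": 500,
}

result = crew.kickoff(inputs=event_details)

# Because expected_participants is a plain number in event_details, the same
# dict could feed a programmatic capacity check on the parsed result, like the
# one sketched in the reply above, instead of trusting the LLM to notice.
print(result)
```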

When I wrote the above message, I had not yet finished the chapter with the financial strategy example. That was the real disappointment for me. Despite a very detailed description of the crew's roles and tasks, the result was no more than a generic collection of unspecific stock market advice and general company data, and not a trading strategy.

And how could it be otherwise? An agent that is told to “constantly monitor the market and use machine learning to gain insights that allows to develop a strategy” should simply have refused to produce a result from a one-off query to Yahoo, because that is quite the opposite of constant monitoring. And worse: even though the query indicated a medium risk tolerance, the answer listed options trading as a possible strategy, which is commonly perceived as one of the riskiest things you can do with your money.

But LLMs are built to please us, and if they do not have a decent answer, they will invent one. I am not sure whether the LLM, the framework, or their interaction is to blame for this poor result. I thought the value proposition of CrewAI was to make setting up and running agentic workflows smooth and reliable. At least I liked its approach best of what I have seen on this platform so far. If only the results were equally convincing…