Agent Evaluation

Hi there,

We have been implementing and learning about agents with AutoGen, LangGraph, CrewAI, or plain LangChain. We also know how to evaluate RAG, but how can we evaluate agents, for example whether they select the right tools and whether their answers are correct? Any ideas, resources, or trainings to explore? When building a GenAI solution with agents that use tools, I think we need a set of different tools or solutions to test individual components as well as the end-to-end behavior, don't you?
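
To make the question more concrete, here is a rough, framework-agnostic sketch of the two kinds of checks I have in mind: a component-level score for tool selection and an end-to-end check on the final answer. Everything in it is an assumption for illustration only: `AgentRun`, `tool_selection_score`, and `answer_matches` are hypothetical names, not part of AutoGen, LangGraph, CrewAI, or LangChain, and the substring match is just a stand-in for a real correctness metric.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of one agent run: tools called (in order) and the final answer."""
    tool_calls: list[str] = field(default_factory=list)
    final_answer: str = ""


def tool_selection_score(expected_tools: list[str], run: AgentRun) -> float:
    """Component-level check: fraction of the expected tools the agent actually called."""
    if not expected_tools:
        return 1.0
    called = set(run.tool_calls)
    hits = sum(1 for tool in expected_tools if tool in called)
    return hits / len(expected_tools)


def answer_matches(expected_answer: str, run: AgentRun) -> bool:
    """End-to-end check: case-insensitive test that the expected answer appears in the output."""
    return expected_answer.strip().lower() in run.final_answer.strip().lower()


# Example with made-up data: the agent skipped one of three expected tools.
run = AgentRun(tool_calls=["search", "calculator"], final_answer="The total is 42.")
print(tool_selection_score(["search", "calculator", "calendar"], run))
print(answer_matches("42", run))
```

In practice the run record would presumably be filled from whatever trace or log the framework exposes, and the answer check would likely be replaced by something stronger, such as a semantic similarity metric or an LLM-based grader.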

Thank you!