Planning with code execution limits

Hi deeplearners, could you share your insights about the limitations of planning with code execution?

For example, which use cases is it not suitable for?

Hi dissamakazoule,

I would say that every use case poses risks. The authors of the code-execution paper (https://arxiv.org/pdf/2402.01030) point to an article about scientific LLM agents, the risks they pose, and the safeguarding measures that can be taken (https://arxiv.org/pdf/2402.04247v4). As indicated in a previous topic, the code-execution paper focuses on use cases involving model training and other complex software (Code as plan - practical in real world scenarios? - #2 by reinoudbosch). For other use cases, the paper notes practical limitations that remain to be solved through some form of reinforcement learning. I would add that one should always keep a human in the loop with final responsibility. So, while the idea of code execution is powerful, it always requires responsible safeguarding.
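
To make the human-in-the-loop point a bit more concrete, here is a minimal, hypothetical sketch (my own illustration, not something from the paper) of an approval gate an agent could route generated code through before running it. The function name `run_with_approval`, the timeout, and the subprocess isolation are assumptions I chose for the example.

```python
# Hypothetical human-in-the-loop gate for agent-generated code (illustrative only).
import subprocess
import sys
import tempfile

def run_with_approval(generated_code: str, timeout_s: int = 10) -> str:
    """Show generated code to a human reviewer and only execute it after approval."""
    print("--- proposed code ---")
    print(generated_code)
    print("---------------------")
    answer = input("Execute this code? [y/N] ").strip().lower()
    if answer != "y":
        return "Execution rejected by reviewer."

    # Run in a separate interpreter process with a timeout, so a crash or hang
    # in the generated code cannot take down the agent process itself.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return f"Execution aborted: exceeded {timeout_s}s timeout."
```

This is only a sketch of the idea of final human responsibility before execution; in practice you would add stronger sandboxing (containers, restricted permissions) on top of a simple approval prompt.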