Can bolting LLMs onto robots improve the robots’ performance? “Can LLMs Make Robots Smarter?” by Samuel Greengard

Can LLMs Make Robots Smarter?

Large language models may be used to do a lot of the planning that robots require, from within the robot.

https://dl.acm.org/doi/10.1145/3701227

Appears in: “Communications of the ACM”, Volume 68, Issue 2

We read:

The ultimate goal is to develop “agentic computing” systems that use LLMs to power robots through complex scenarios that require numerous steps. Yet, developing these more advanced robots is fraught with obstacles. For one thing, GPT and other models lack grounding—the context required to address real-world situations. For another, AI is subject to errors and fabrications, also known as hallucinations. This could lead to unexpected and even disastrous outcomes—including unintentionally injuring or killing humans.

And also:

Singh’s research demonstrated that an LLM could nudge robots to better performance. The method—ProgPrompt—relies on a hybrid approach that involves direct interaction with an LLM along with using ChatGPT to write code. The approach led to the desired action as much as 75% of the time. Yet, problems persisted. These mostly centered on the robot’s inability to understand commands, and it would sometimes become confused and stop altogether. “The results were far from perfect, but this is better than what we can achieve with conventional programming,” she said.

About this particularly adventurous approach, see Singh et al., “ProgPrompt: Generating Situated Robot Task Plans using Large Language Models” (ICRA 2023).
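For context on what “using ChatGPT to write code” means here: ProgPrompt represents a task plan as a Pythonic program. The prompt lists the robot’s action primitives and the objects in the scene, the LLM completes the plan, and assertion-style checks give the executor recovery points. A minimal runnable sketch of what such a generated plan looks like (the primitive names and the microwave-salmon task are my illustration, loosely modeled on the paper’s household examples, not its exact API):

```python
# A loose, runnable approximation of a ProgPrompt-style generated plan.
# In the actual method, the prompt lists available action primitives and
# scene objects, and the LLM completes a Pythonic plan; assertion-style
# checks let the executor recover from failed preconditions.
# All primitive names below are illustrative stubs.

def walk(obj: str) -> None: print(f"walk to {obj}")
def grab(obj: str) -> None: print(f"grab {obj}")
def put_in(obj: str, dest: str) -> None: print(f"put {obj} in {dest}")
def open_(obj: str) -> None: print(f"open {obj}")
def close_(obj: str) -> None: print(f"close {obj}")
def switch_on(obj: str) -> None: print(f"switch on {obj}")
def is_open(obj: str) -> bool: return False  # stub state query

def microwave_salmon() -> None:
    # 1: take the salmon out of the fridge
    walk("fridge")
    if not is_open("fridge"):   # assertion-with-recovery, in spirit
        open_("fridge")
    grab("salmon")
    close_("fridge")
    # 2: heat it in the microwave
    walk("microwave")
    open_("microwave")
    put_in("salmon", "microwave")
    close_("microwave")
    switch_on("microwave")

if __name__ == "__main__":
    microwave_salmon()
```

The appeal is that the LLM never has to output low-level control; it only has to compose calls to primitives the robot already knows how to execute, which is exactly where the “75% of the time” figure comes from.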

Furthermore:

There’s also a more basic question about whether it’s wise or practical to use robotics in certain situations, Hundt said. “Just because we can use a robot for a given task—and it can do that thing effectively—doesn’t mean we should use it. Sometimes, the conventional way of doing things is better.” For example, in Japan, care facilities that experimented with robots over the course of several weeks quickly discovered that practitioners suddenly had to tend to both robots and patients. As a result, human workload increased, and within a few days workers typically stopped using the robots.

In fact, some robotics experts are skeptical LLMs will ever serve as effective robot “brains.” “LLMs are new and shiny and not deployed at scale anywhere,” said Rodney Brooks, Panasonic Professor of Robotics (emeritus) at the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (MIT CSAIL), and CTO of robotics firm Robust AI. “Language is not related to the hard problems of robotics in any way.”


I also read:

On the one hand, could the robot learn from its mistakes? On the other, was there a good environment for learning?


A good question. Medical/care environments must be among the most difficult there are, sitting at the intersection of “extremely arbitrary” and “demanding extreme safety”. Staff will also not put up with much: if the machine doesn’t “fit in” quickly, it will soon be relegated to the cellar. In fact, interactions with non-specialist staff working under overload and constant task-switching stress are probably the hardest to get right. (One reason Therac-25, which didn’t even have AI, killed people was, apart from the sophomoric and faulty system design, that staff didn’t know what to do with cryptic error codes except hit RETURN. I always get nervous when my dentist’s X-ray machine does exactly that and then refuses to explain, but I digress.)

Quite apart from task-specific machinery (surgical robots), “Moxi” seems to be the best you can get for now?

There are also the nearly-industrial “trainable” robots (I think one “shows them” what they are supposed to do) from the now-sadly-defunct Rethink Robotics (co-founded by Brooks in 2008), but they are not ready for “care environments”:

Also, none of these use LLMs.

As a follow-up, here is an article where the authors are rather confident that “LLM + Robots” is a winning formula (I detest stock art):

Embodied Artificial Intelligence (EAI) involves embedding artificial intelligence into tangible entities, such as robots, equipping them with the capacity to perceive, learn from, and engage dynamically with their surroundings. In this article we delve into the key tradeoffs of building foundation models for EAI systems.

Two approaches are compared:

  • Taking a pretrained large model, fine-tuning it, and then allowing in-context learning
  • Taking a model that has been trained to meta-learn (how?), then having it perform “general-purpose in-context learning” (GPICL); both recipes are sketched below
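To make the contrast concrete, here is a structural sketch of where the learning happens in each recipe. Everything in it (names, the stub “training”) is my illustration, not the article’s API; the point is only that approach 1 updates weights once and then adapts via the prompt, while approach 2 freezes weights after meta-training and pushes all adaptation into the context:

```python
# Illustrative stubs only: no real training happens here.
from typing import List

def finetune(model: dict, data: List[str]) -> dict:
    """Stub: one-off weight update on robot task data (approach 1)."""
    return {**model, "finetuned_on": data}

def outer_update(model: dict, task: str) -> dict:
    """Stub: one meta-training step across tasks (approach 2)."""
    return {**model, "meta_tasks": model.get("meta_tasks", []) + [task]}

def generate(model: dict, context: List[str]) -> str:
    """Stub: in both recipes, deployment-time adaptation is in-context."""
    return f"action conditioned on {len(context)} context items"

# Approach 1: pretrained weights, fine-tuned once, then in-context learning.
m1 = finetune({"weights": "pretrained"}, ["pick", "place"])
print(generate(m1, ["observation_1"]))

# Approach 2: meta-train across a task distribution so the model learns
# how to learn; weights are then frozen, and ALL deployment-time
# adaptation comes from the ever-growing context (GPICL).
m2 = {"weights": "meta_init"}
for task in ["stack", "wipe", "sort"]:
    m2 = outer_update(m2, task)
print(generate(m2, ["observation_1", "observation_2"]))
```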

In conclusion, we believe that learning from the environment is the essential feature of EAI systems, and thus the meta-training + GPICL approach is promising for building EAI foundation models, given its better long-term adaptability and generalization. Although this approach currently faces significant challenges in compute and memory usage, we believe that innovations such as Infini-attention and StreamingLLM will soon make it viable for real-time, resource-constrained environments.
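On that last point: under GPICL the context (and thus the KV cache) grows without bound, which is exactly what StreamingLLM attacks by keeping a handful of initial “attention sink” tokens plus a sliding window of recent ones. A toy sketch of that eviction policy (sizes and names are mine; a real implementation evicts per-layer key/value tensors, not token ids):

```python
# Toy sketch of a StreamingLLM-style cache policy: keep the first few
# "attention sink" tokens plus a sliding window of recent tokens, so an
# unbounded stream is served with bounded memory. Sizes are illustrative.
from collections import deque

N_SINK = 4      # initial tokens always kept ("attention sinks")
WINDOW = 1024   # recent tokens kept

sinks: list[int] = []
recent: deque[int] = deque(maxlen=WINDOW)

def observe(token_id: int) -> None:
    """Admit one more token of ever-growing context at bounded memory."""
    if len(sinks) < N_SINK:
        sinks.append(token_id)
    else:
        recent.append(token_id)  # deque drops the oldest automatically

def cache_view() -> list[int]:
    """Tokens the model would attend over at this step."""
    return sinks + list(recent)

for t in range(100_000):   # unbounded stream, bounded memory
    observe(t)
print(len(cache_view()))   # -> N_SINK + WINDOW = 1028
```

Whether a window like that preserves enough of a robot’s accumulated experience for genuine “learning from the environment” is, of course, the open question.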