Error with Memory Feature Local LLM (LM Studio: phi-4)

Hello DeepLearning.AI community,

I’m taking the course Multi AI Agent Systems with CrewAI. I’m running this course locally, and I need some help with the class L3: Multi-agent Customer Support Automation. By the way, I successfully ran the class L2: Create Agents to Research and Write an Article locally, and the AI agents wrote the article on the Artificial Intelligence topic, followed by a custom topic of my choice.
This is the chunk of code I’m using to connect to the phi-4 model running under LM Studio:

import os
os.environ["OPENAI_API_KEY"] = "dummy_key"  # Replace "dummy_key" with any value if LM Studio does not validate the key.

from crewai import LLM
phi4 = LLM(
    model="openai/phi-4@q8_0", 
    base_url="http://192.168.15.184:1234/v1",
    api_key="dummy_key"
)
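
For completeness, this LLM object is then passed to each agent so that the generation calls go to LM Studio. It looks roughly like this (role/goal text paraphrased from the course notebook, backstory abbreviated):

  from crewai import Agent

  support_agent = Agent(
      role="Senior Support Representative",
      goal="Be the most friendly and helpful support representative on the team",
      backstory="...",            # abbreviated here
      allow_delegation=False,
      verbose=True,
      llm=phi4                    # route this agent's completions to LM Studio (phi-4)
  )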

But as soon as I start running the Crew with crew.kickoff(inputs=inputs), I get the error message below:

APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'
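
For context, the kickoff call itself looks roughly like this (the input values are paraphrased from the course notebook):

  inputs = {
      "customer": "DeepLearningAI",
      "person": "Andrew Ng",
      "inquiry": "I need help with setting up a Crew and kicking it off, "
                 "specifically how can I add memory to my crew?"
  }
  result = crew.kickoff(inputs=inputs)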

I found some discussions on GitHub and in the CrewAI community, but none applied specifically to my issue. One user didn’t have enough credits on their OpenAI account, and another changed the embedder configuration. I followed this last approach using this chunk of code:

import os
os.environ["OPENAI_API_KEY"] = "dummy_key"  # I also tried the value "phi-4@q8_0"

from crewai import LLM
phi4 = LLM(
    model="openai/phi-4@q8_0", 
    base_url="http://192.168.15.184:1234/v1",
    api_key="dummy_key"
)

embedder_config = {
    "provider": "openai",  # Specify the provider
    "config": {
        "api_key": None,  # No API key needed for local setup
        "api_base": "http://192.168.15.184:1234/v1",  # Your local phi-4 endpoint
        "model_name": "phi-4@q8_0",  # Name of your local model
        "model": "openai/phi-4@q8_0",  # Model identifier
    }
}

Then, when creating the Crew, I added the embedder_config:

crew = Crew(
  agents=[support_agent, support_quality_assurance_agent],
  tasks=[inquiry_resolution, quality_assurance_review],
  verbose=True,
  memory=True,
  embedder=embedder_config
)

When I executed the code again it ran properly, and the responses came back from my local LLM (LM Studio: phi-4). But, analyzing the results, I noticed that the CrewAI memory feature didn’t work as expected. While running the Crew, I saw the following error message pop up a few times:

APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'

One of the logs from LM Studio (phi-4):

[LM STUDIO SERVER] [phi-4@q8_0] Generated prediction:  { "id": "chatcmpl-qbwcwxm1bufvpbt820tz7", "object": "chat.completion", "created": 1737236297, "model": "phi-4@q8_0", "choices": [ { "index": 0, "logprobs": null, "finish_reason": "stop", "message": { "role": "assistant", "content": "\n**Final Answer:**\n\nDear Andrew,\n\nThank you for reaching out regarding your interest in setting up a Crew with memory using --trunctated} } ], "usage": { "prompt_tokens": 1049, "completion_tokens": 874, "total_tokens": 1923 }, "system_fingerprint": "phi-4@q8_0" }

Observations:

  1. The error seems related to the rag_storage.py file, where APIStatusError is raised without required arguments (response and body).
  2. Based on the logs, it appears CrewAI might still be trying to use OpenAI embeddings for memory, despite the local LLM setup.
  3. When I check the LM Studio logs, it confirms that predictions (e.g., task responses) are working correctly. However, memory data is not being populated.

Steps Taken:

  • Verified that LM Studio (phi-4) is running and accessible at http://192.168.15.184:1234/v1 (a quick probe is shown right after this list).
  • Ensured CrewAI is installed and updated (pip install --upgrade crewai crewai-tools).
  • Searched through CrewAI documentation but couldn’t find explicit steps for integrating memory with a local LLM using LM Studio (and phi-4).
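
The probe mentioned above is just a small requests script run from the WSL side; it only assumes the OpenAI-style GET /v1/models route that LM Studio exposes:

  import requests

  # Sanity check: is LM Studio reachable from WSL?
  resp = requests.get("http://192.168.15.184:1234/v1/models", timeout=10)
  resp.raise_for_status()
  print(resp.json())  # phi-4@q8_0 should appear among the loaded models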

Questions:

  1. How can I ensure that CrewAI’s memory functionalities are fully integrated with a local LLM like phi-4?
  2. Is there a specific configuration to set phi-4 as the default memory provider for both short-term and entity memory? Not sure if this makes sense.
  3. Could the APIStatusError be related to how CrewAI handles missing OpenAI credentials, even when a local LLM is being used?

I’d greatly appreciate any guidance or best practices for configuring CrewAI memory with a local setup. Let me know if additional details or logs are needed to diagnose the issue!

Setup Details:

  • Operating System:
    • Ubuntu WSL 2 (for Jupyter Notebook)
    • Windows 10 (for LM Studio running phi-4)
  • CrewAI Version: 0.95.0
  • Phi-4 Configuration:
    • Running locally on http://192.168.15.184:1234/v1.
    • Confirmed that predictions are generated successfully from phi-4 logs on LM Studio.
    • API url: http://192.168.15.184:1234
    • Model’s API identifier: phi-4@q8_0
    • Supported endpoints (OpenAI-like):
      GET /v1/models
      POST /v1/chat/completions
      POST /v1/completions
      POST /v1/embeddings

Thanks a lot,

Lúcio

@nuneslucioliveira Hey, I just fixed that issue, but for Ollama (local LLM). For LM Studio the implementation should be the same as what I described over there. Hope it helps :slight_smile:

github/crewAIInc/crewAI/issues/1669#issuecomment-2599838824

edit: I can’t include a link in a post, so just adjust the path above accordingly

I finally managed to fix the issue with my buddy o1, and here is exactly how I got the memory feature working with a local LLM:

Observations

  • The error APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body' would show up whenever CrewAI attempted a memory operation (such as short_term or entity indexing).
  • Even though the text-generation calls to LM Studio (phi-4) worked, the memory operations needed embeddings, and the embeddings call came back with an incomplete/invalid JSON response, which led to that error.

Root Cause

CrewAI tries to call an OpenAI‐style /v1/embeddings endpoint for memory. By default, if you specify a single local LLM as both your “completions” and “embeddings” provider, and that local LLM (LM Studio in this case) doesn’t fully support embeddings or returns a partial JSON, you get that APIStatusError... exception.
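
To check whether a given endpoint can actually back the memory feature, a minimal probe like the one below helps; it assumes the standard OpenAI embeddings response shape (a "data" list whose items carry an "embedding" vector), which is what the memory storage expects:

  import requests

  # Probe the OpenAI-style embeddings route and inspect the response shape.
  resp = requests.post(
      "http://192.168.15.184:1234/v1/embeddings",
      headers={"Authorization": "Bearer dummy_key"},
      json={"model": "phi-4@q8_0", "input": "hello world"},
      timeout=30,
  )
  print(resp.status_code)
  payload = resp.json()
  # A usable endpoint returns roughly:
  # {"object": "list", "data": [{"index": 0, "embedding": [0.01, ...]}], ...}
  vector = payload["data"][0]["embedding"]
  print(len(vector))  # if this raises or the vector is empty, memory will break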


How We Fixed It

  1. Set Up a Proper Embeddings Endpoint

    • You need to ensure the model that CrewAI uses as the “embedder” actually returns embeddings via an OpenAI‐style route, for example by using another local server (e.g., Ollama) with an embeddings‐capable model.
    • In my case, I installed Ollama in the same Ubuntu WSL instance I run Jupyter Lab from and pulled nomic-embed-text (a quick sanity check is shown after this list).
  2. Use a Separate embedder_config

    • Instead of pointing the embedder to phi-4’s chat completion endpoint, I switched to a config that references my ollama nomic-embed-text model.
    • For instance:
      embedder_config = {
          "provider": "ollama",  # Use CrewAI's Ollama embedder for memory
          "config": {
              "api_key": "dummy_key",                   # Not validated locally
              "api_base": "http://localhost:11411/v1",  # Adjust to your Ollama host/port (default is 11434)
              "model": "nomic-embed-text"               # Embeddings-capable model pulled via Ollama
          }
      }
      
  3. Pass That Embedder into the Crew

    • Make sure that when you create the Crew, you do:
      crew = Crew(
          agents=[support_agent, support_quality_assurance_agent],
          tasks=[inquiry_resolution, quality_assurance_review],
          verbose=True,
          memory=True,
          embedder=embedder_config  # Here's the separate embedder config
      )
      
    • This ensures generation calls still go to phi-4 (LM Studio) while memory/embeddings calls go to the nomic-embed-text (Ollama) embeddings endpoint.
  4. Watch the Logs

    • Verify that the short_term and entities searches are no longer throwing APIStatusError.
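
As the sanity check mentioned in step 1, this is a minimal sketch of how I’d confirm nomic-embed-text is actually serving embeddings before pointing CrewAI at it; it assumes Ollama’s native /api/embeddings route on its default port 11434, so adjust the host/port to your own setup:

  import requests

  # Confirm the Ollama embedding model returns a vector before wiring it into CrewAI.
  resp = requests.post(
      "http://localhost:11434/api/embeddings",
      json={"model": "nomic-embed-text", "prompt": "test sentence"},
      timeout=30,
  )
  resp.raise_for_status()
  vector = resp.json()["embedding"]
  print(len(vector))  # should be a non-empty list of floats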

That’s what resolved the issue on my side. Hope this helps you (and anyone else) integrate CrewAI memory with a local LLM like phi-4! :slight_smile:

All the best,
Lúcio