Running Local LLM with Ollama

Is there a way to do Module 1 of the Agentic AI course (the deep research agent lab) with a locally downloaded LLM instead of OpenAI or another paid provider's models? I tried running it with Ollama, but I get the error below even though the local LLM is running:

File "/usr/local/lib/python3.11/site-packages/aisuite/providers/ollama_provider.py", line 48, in chat_completions_create
  raise LLMError(f"Connection failed: {self._CONNECT_ERROR_MESSAGE}")
aisuite.provider.LLMError: Connection failed: Ollama is likely not running. Start Ollama by running ollama serve on your host.

Please share the output of:

curl http://localhost:11434/api/ps

Have you tried this?

Instead of pointing the client at the OpenAI endpoint, point it at your localhost endpoint.
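If you want to hit that localhost endpoint without any client library, here is a minimal sketch using only the Python standard library. It assumes `ollama serve` is running on the default port 11434 and that a model such as gemma3:270m has already been pulled:

```python
# Sketch: calling a local Ollama server directly over HTTP instead of a paid
# provider. Standard library only; model name and prompt are just examples.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask_ollama(model: str, prompt: str) -> str:
    """Send one chat turn to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]

# With the server up, ask_ollama("gemma3:270m", "Why is the sky blue?")
# returns the model's reply as a plain string.
```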

As you can see in the docs, aisuite supports Hugging Face and Ollama as providers.

Based on the docs, if you want to use Ollama locally, you should start from Ollama itself rather than going through aisuite. You can then follow the Ollama docs from there.


I am getting this when I curl on localhost:
{"models":[]}

but I can see the request coming in to Ollama.

An empty models array means that no models are running (i.e. loaded into memory). Please use `ollama pull` to download a model and `ollama run` to run it.

I tried pulling and then running the model. I can chat with it in the terminal after running it, but if I then curl, I still see an empty model list.

These are the steps on Linux (jq is for pretty-printing JSON):

# Install ollama
# this also runs ollama service once install is complete
# run `ollama serve` on a separate terminal if you stop the service.
curl -fsSL https://ollama.com/install.sh | sh
# output skipped

# check version
$ ollama --version 
# output: ollama version is 0.12.5

# download model
$ ollama pull gemma3:270m
# output skipped

# check before loading model
$ curl http://localhost:11434/api/ps
# output: {"models":[]}

# load model into memory
$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m"
}'
# output: {"model":"gemma3:270m","created_at":"2025-10-15T18:15:10.920382548Z","response":"","done":true,"done_reason":"load"}
# check loaded model status
$ curl http://localhost:11434/api/ps | jq

Output:

{
  "models": [
    {
      "name": "gemma3:270m",
      "model": "gemma3:270m",
      "size": 550094976,
      "digest": "e7d36fb2c3b3293cfe56d55889867a064b3a2b22e98335f2e6e8a387e081d6be",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "gemma3",
        "families": [
          "gemma3"
        ],
        "parameter_size": "268.10M",
        "quantization_level": "Q8_0"
      },
      "expires_at": "2025-10-15T23:50:10.920793022+05:30",
      "size_vram": 550094976,
      "context_length": 4096
    }
  ]
}

# Use model
$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m",
  "prompt": "What color is the sky at different times of the day? Respond using JSON",
  "format": "json",
  "stream": false
}' | jq

Output:

{
  "model": "gemma3:270m",
  "created_at": "2025-10-15T18:17:14.08242417Z",
  "response": "{\"color\": \"blue\", \"description\": \"The sky at different times of the day is typically blue.\"\n}",
  "done": true,
  "done_reason": "stop",
  "context": [
    105,
    2364,
    107,
    3689,
    2258,
    563,
    506,
    7217,
    657,
    1607,
    2782,
    529,
    506,
    1719,
    236881,
    58025,
    1699,
    10434,
    106,
    107,
    105,
    4368,
    107,
    14937,
    3001,
    1083,
    623,
    9503,
    827,
    623,
    7777,
    1083,
    623,
    818,
    7217,
    657,
    1607,
    2782,
    529,
    506,
    1719,
    563,
    11082,
    3730,
    1781,
    107,
    236783
  ],
  "total_duration": 730446542,
  "load_duration": 139217711,
  "prompt_eval_count": 24,
  "prompt_eval_duration": 27352765,
  "eval_count": 25,
  "eval_duration": 208710680
}
# Unload model
$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m",
  "keep_alive": 0
}'
# Output: {"model":"gemma3:270m","created_at":"2025-10-15T18:19:03.251184622Z","response":"","done":true,"done_reason":"unload"}

# check model in memory
$ curl http://localhost:11434/api/ps  
# Output: {"models":[]}
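The same load / check / unload cycle can also be scripted from Python, which is handy inside a course notebook. A minimal sketch using only the standard library (the base URL and model name match the shell steps above):

```python
# Sketch of the load / check / unload cycle above, standard library only.
import json
import urllib.request

BASE = "http://localhost:11434"

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the Ollama API and return the decoded reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def model_names(ps_response: dict) -> list[str]:
    """Extract the loaded model names from an /api/ps response body."""
    return [m["name"] for m in ps_response.get("models", [])]

# With the server up:
#   post("/api/generate", {"model": "gemma3:270m"})                 # load
#   post("/api/generate", {"model": "gemma3:270m", "keep_alive": 0})  # unload
# and between the two, /api/ps should list the model:
#   with urllib.request.urlopen(BASE + "/api/ps") as resp:
#       print(model_names(json.loads(resp.read())))
```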

You have the Ollama server running; it just does not have any model loaded. If you make an inference request to a model and run `curl http://localhost:11434/api/ps` while it is loaded, you should see something there.

I managed to get Ollama working with aisuite just by setting the model to "ollama:gpt-oss" (or whatever model you prefer), i.e.:

model="ollama:gpt-oss"
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

My only problem is that I hit a timeout error, and I do not yet know how to change the waiting parameter before it times out.

This discussion about timeout happens on this thread:
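One hedged option to experiment with: aisuite builds each provider from a per-provider config dict, and in the versions I have looked at, ollama_provider.py reads a `timeout` key (in seconds, with a small default) from that dict. This is an assumption about aisuite internals, so check your installed ollama_provider.py before relying on it:

```python
import aisuite as ai

# Assumption: your aisuite version's OllamaProvider honors a "timeout" key
# (seconds) in provider_configs -- inspect ollama_provider.py to confirm.
client = ai.Client(provider_configs={
    "ollama": {
        "timeout": 300,  # allow slow local models up to 5 minutes
    }
})

response = client.chat.completions.create(
    model="ollama:gpt-oss",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0,
)
```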

Has anyone used the Ollama served model with the tool calling in the aisuite as shown in M3? It seems that it doesn’t identify the tool that gets passed somehow…

Yes, this is a known limitation when using Ollama with AISuite in the Module 3 tool-calling workflow. Ollama models don’t natively support the full OpenAI-style function-calling schema, so AISuite can send the tool definitions, but the model often won’t recognize or trigger them unless it’s specifically fine-tuned or configured to follow that format.

A few things you can test:

  1. Enable --verbose mode in Ollama and confirm whether the raw prompt actually contains the tool schema. Sometimes AISuite suppresses parts depending on provider config.

  2. Use a model that is known to work with tool calling, like llama3.1 or mistral-nemo, and include an explicit system prompt like:
    “When a matching tool is appropriate, respond strictly using the JSON function_call format provided.”

  3. Check AISuite provider settings — some versions require supports_tool_calling=True to be set manually for Ollama.

  4. Wrap the tool schema into the prompt manually as a temporary workaround. Many users report this makes the LLM correctly call the tool even when native function calling isn’t supported.

This is an ecosystem gap rather than something you’re doing wrong. If you want, share your provider config and tool schema and I can help debug the exact reason the call isn’t being triggered.