Running Local LLM with Ollama

Is there a way to do Module 1 of the Agentic AI course (the deep research agent lab) with a locally downloaded LLM instead of OpenAI or another paid provider's models? I tried running it with Ollama, but I get the error below even though the local LLM is running:

File "/usr/local/lib/python3.11/site-packages/aisuite/providers/ollama_provider.py", line 48, in chat_completions_create
  raise LLMError(f"Connection failed: {self._CONNECT_ERROR_MESSAGE}")
aisuite.provider.LLMError: Connection failed: Ollama is likely not running. Start Ollama by running ollama serve on your host.

Please share the output of:

curl http://localhost:11434/api/ps

Have you tried this?

Instead of pointing the client at the OpenAI endpoint, point it at your localhost endpoint.
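If you want to hit that localhost endpoint without any client library, here is a minimal sketch using only the Python standard library. It assumes `ollama serve` is running on the default port 11434 and that a model such as gemma3:270m has already been pulled:

```python
# Sketch: calling a local Ollama server directly over HTTP instead of a paid
# provider. Standard library only; model name and prompt are just examples.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask_ollama(model: str, prompt: str) -> str:
    """Send one chat turn to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]

# With the server up, ask_ollama("gemma3:270m", "Why is the sky blue?")
# returns the model's reply as a plain string.
```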

As you can see in the docs, aisuite supports Hugging Face and Ollama as providers.

Based on the docs, if you want to use Ollama locally, you should start from Ollama itself rather than going through aisuite. You can then follow the Ollama docs from there.


I am getting this when I curl on localhost:
{"models":[]}

but I can see the request coming in to Ollama.

An empty models array means that no models are running (i.e. loaded into memory). Please use `ollama pull` to download a model and `ollama run` to run it.

I tried pulling and then running the model. I can chat with it in the terminal after running it, but if I then curl, I still see an empty model list.

These are the steps on Linux (jq is for pretty-printing JSON):

# Install ollama
# this also runs ollama service once install is complete
# run `ollama serve` on a separate terminal if you stop the service.
curl -fsSL https://ollama.com/install.sh | sh
# output skipped

# check version
$ ollama --version 
# output: ollama version is 0.12.5

# download model
$ ollama pull gemma3:270m
# output skipped

# check before loading model
$ curl http://localhost:11434/api/ps
# output: {"models":[]}

# load model into memory
$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m"
}'
# output: {"model":"gemma3:270m","created_at":"2025-10-15T18:15:10.920382548Z","response":"","done":true,"done_reason":"load"}
# check loaded model status
$ curl http://localhost:11434/api/ps | jq

Output:

{
  "models": [
    {
      "name": "gemma3:270m",
      "model": "gemma3:270m",
      "size": 550094976,
      "digest": "e7d36fb2c3b3293cfe56d55889867a064b3a2b22e98335f2e6e8a387e081d6be",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "gemma3",
        "families": [
          "gemma3"
        ],
        "parameter_size": "268.10M",
        "quantization_level": "Q8_0"
      },
      "expires_at": "2025-10-15T23:50:10.920793022+05:30",
      "size_vram": 550094976,
      "context_length": 4096
    }
  ]
}

# Use model
$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m",
  "prompt": "What color is the sky at different times of the day? Respond using JSON",
  "format": "json",
  "stream": false
}' | jq

Output:

{
  "model": "gemma3:270m",
  "created_at": "2025-10-15T18:17:14.08242417Z",
  "response": "{\"color\": \"blue\", \"description\": \"The sky at different times of the day is typically blue.\"\n}",
  "done": true,
  "done_reason": "stop",
  "context": [
    105,
    2364,
    107,
    3689,
    2258,
    563,
    506,
    7217,
    657,
    1607,
    2782,
    529,
    506,
    1719,
    236881,
    58025,
    1699,
    10434,
    106,
    107,
    105,
    4368,
    107,
    14937,
    3001,
    1083,
    623,
    9503,
    827,
    623,
    7777,
    1083,
    623,
    818,
    7217,
    657,
    1607,
    2782,
    529,
    506,
    1719,
    563,
    11082,
    3730,
    1781,
    107,
    236783
  ],
  "total_duration": 730446542,
  "load_duration": 139217711,
  "prompt_eval_count": 24,
  "prompt_eval_duration": 27352765,
  "eval_count": 25,
  "eval_duration": 208710680
}
# Unload model
$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m",
  "keep_alive": 0
}'
# Output: {"model":"gemma3:270m","created_at":"2025-10-15T18:19:03.251184622Z","response":"","done":true,"done_reason":"unload"}

# check model in memory
$ curl http://localhost:11434/api/ps  
# Output: {"models":[]}
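The same load / check / unload cycle can also be scripted from Python, which is handy inside a course notebook. A minimal sketch using only the standard library (the base URL and model name match the shell steps above):

```python
# Sketch of the load / check / unload cycle above, standard library only.
import json
import urllib.request

BASE = "http://localhost:11434"

def post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the Ollama API and return the decoded reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def model_names(ps_response: dict) -> list[str]:
    """Extract the loaded model names from an /api/ps response body."""
    return [m["name"] for m in ps_response.get("models", [])]

# With the server up:
#   post("/api/generate", {"model": "gemma3:270m"})                 # load
#   post("/api/generate", {"model": "gemma3:270m", "keep_alive": 0})  # unload
# and between the two, /api/ps should list the model:
#   with urllib.request.urlopen(BASE + "/api/ps") as resp:
#       print(model_names(json.loads(resp.read())))
```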

You have the Ollama server running; it just does not have any model loaded. If you make an inference request to a model and run `curl http://localhost:11434/api/ps` while it is loaded, you should see something there.

I managed to get Ollama working with aisuite just by setting the model to "ollama:gpt-oss" (or whatever model you prefer), i.e.:

model="ollama:gpt-oss"
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

My only problem is that I hit a timeout error, and I do not yet know how to change the waiting parameter before it times out.

This discussion about timeout happens on this thread:
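One hedged option to experiment with: aisuite builds each provider from a per-provider config dict, and in the versions I have looked at, ollama_provider.py reads a `timeout` key (in seconds, with a small default) from that dict. This is an assumption about aisuite internals, so check your installed ollama_provider.py before relying on it:

```python
import aisuite as ai

# Assumption: your aisuite version's OllamaProvider honors a "timeout" key
# (seconds) in provider_configs -- inspect ollama_provider.py to confirm.
client = ai.Client(provider_configs={
    "ollama": {
        "timeout": 300,  # allow slow local models up to 5 minutes
    }
})

response = client.chat.completions.create(
    model="ollama:gpt-oss",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0,
)
```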

Has anyone used the Ollama served model with the tool calling in the aisuite as shown in M3? It seems that it doesn’t identify the tool that gets passed somehow…

Yes, this is a known limitation when using Ollama with AISuite in the Module 3 tool-calling workflow. Ollama models don’t natively support the full OpenAI-style function-calling schema, so AISuite can send the tool definitions, but the model often won’t recognize or trigger them unless it’s specifically fine-tuned or configured to follow that format.

A few things you can test:

  1. Enable --verbose mode in Ollama and confirm whether the raw prompt actually contains the tool schema. Sometimes AISuite suppresses parts depending on provider config.

  2. Use a model that is known to work with tool calling, like llama3.1 or mistral-nemo, and include an explicit system prompt like:
    “When a matching tool is appropriate, respond strictly using the JSON function_call format provided.”

  3. Check AISuite provider settings — some versions require supports_tool_calling=True to be set manually for Ollama.

  4. Wrap the tool schema into the prompt manually as a temporary workaround. Many users report this makes the LLM correctly call the tool even when native function calling isn’t supported.

This is an ecosystem gap rather than something you’re doing wrong. If you want, share your provider config and tool schema and I can help debug the exact reason the call isn’t being triggered.