404 in L7 code

I get the error NotFoundError: Error code: 404 - {'detail': 'Not Found'} when running the Jupyter Notebook code under the heading Llama Stack Inference, and then the same error in every code cell below it. It seems like the Stack API is not accessible, but I am not sure. Can someone help resolve this error?

It seems like there is an error in LLAMA_STACK_API_TOGETHER_URL. Has the API been updated?

Well, after waiting 24 hours for a reply and getting no help, I decided to resolve it myself. The problem is with this code:

client = LlamaStackClient(
    base_url=LLAMA_STACK_API_TOGETHER_URL,
)

which looks for the Llama Stack server at https://llama-stack.together.ai. Either that URL is dead, or the API token has a usage limit that runs out at the end of Lesson 6.

Here is how I resolved the issue: I ran the Llama Stack server locally using WSL and Docker, with my own API key from an account on https://together.ai.
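
To preview the end result: once a local server is running (Step 5 below uses port 5001), the only change the notebook code needs is the base_url. A minimal sketch:

from llama_stack_client import LlamaStackClient

# Point the client at the local server instead of the dead Together-hosted URL
client = LlamaStackClient(
    base_url="http://localhost:5001",
)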

Below are the steps, which I am sharing in case you are also curious about running the stack yourself:

Installing and Using Llama Stack

Step 1: Download the Git Repository

Open WSL (Windows Subsystem for Linux). I am using Debian Linux for this setup.

To begin setting up Llama Stack, clone the official repository using the following command:

git clone https://github.com/meta-llama/llama-stack.git

Step 2: Build Llama Stack

Navigate to the cloned repository and update your package lists:

cd llama-stack/
sudo apt update

Step 3: Check Python Version and Set Up Virtual Environment

Ensure you have Python version 3.10 or higher:

python --version

Install venv if it is not already installed:

sudo apt install python3.11-venv

My WSL setup uses Python 3.11; adjust the package name to match the Python version on your system.

Create a virtual environment:

python3 -m venv llama_stack_env

Activate the virtual environment:

source llama_stack_env/bin/activate

Step 4: Install Llama Stack

Install Llama Stack using:

pip install llama-stack

Alternatively, install it from the Git repository:

pip install -e .

I installed it from PyPI. Verify the installation with:

llama --help

Step 5: Run Llama Stack

Ensure Docker is installed and running. Then, build the Llama Stack container:

llama stack build --template together --image-type docker

If you encounter the following error:

curl: command not found

Install the required package:

sudo apt install curl

Then run the build command again (note: depending on your llama-stack version, the image type is called either docker or container):

llama stack build --template together --image-type container

This process may take some time. Then, export the necessary environment variables:

export LLAMA_STACK_PORT=5001
export TOGETHER_API_KEY=<Your_API_Key>

Obtain your TOGETHER_API_KEY by creating an account on together.ai. They provide an API key with a $1.00 credit.
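
Optionally, you can sanity-check the key before building anything. This is my own extra step, assuming Together's OpenAI-compatible endpoint at https://api.together.xyz/v1 (it is not part of the Llama Stack setup); with a valid key it should return a JSON list of models rather than an authentication error:

curl -s https://api.together.xyz/v1/models -H "Authorization: Bearer $TOGETHER_API_KEY"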

Now, run the Docker container:

docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT llamastack/distribution-together --port $LLAMA_STACK_PORT --env TOGETHER_API_KEY=$TOGETHER_API_KEY

The system will download the various Llama models, and the server will start on localhost:5001:

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)

Step 6: Check Inference on Windows

On Windows, open cmd and create a virtual environment:

mkdir llama-stack
cd llama-stack
python -m venv llama-stack-client-env
llama-stack-client-env\Scripts\activate

Install the client package:

pip install llama-stack-client

Configure the client:

llama-stack-client configure --endpoint http://localhost:5001

The above command will prompt for the TOGETHER_API_KEY; use the same key from Step 5 (the one with the $1.00 credit from together.ai).
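
Before writing any Python, you can quickly confirm that the client can reach the local server with the client CLI (assuming your llama-stack-client version includes the models subcommand); it should print the same model identifiers that verify.py lists in the next step:

llama-stack-client models list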

Step 7: Verify Your Inference Code

In the same command prompt, create a script named verify.py inside the llama-stack folder to check the models available on your local server, then run it with:

python verify.py

Contents of verify.py:

from llama_stack_client import LlamaStackClient

# Initialize the client
client = LlamaStackClient(base_url="http://localhost:5001")
models = client.models.list()
print("Available models:")
for model in models:
    print(f"- {model.identifier}")

Step 8: Run Inference Code

Create an inference script named inference.py in the llama-stack folder:

from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

# Initialize the client
client = LlamaStackClient(base_url="http://localhost:5001")

# Define the user message
user_message = UserMessage(
    content="Hello, can you write a short poem about the stars?",
    role="user",
)

# Perform chat completion
response = client.inference.chat_completion(
    messages=[user_message],
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # Ensure this model ID is available on your server
    stream=False,
)

# Print the response
if response.completion_message:
    print(response.completion_message.content)
else:
    print("No completion message received.")

Run it using:

python inference.py

Sample Output:

Here is a short poem about the stars:

Twinkling diamonds in the night,
A celestial show, a wondrous sight.
The stars up high, a twinkling sea,
A million points of light, for you and me.

Their gentle sparkle, a guiding light,
A beacon in the darkness, shining bright.
In the vast expanse, of space and time,
The stars remain, a constant rhyme.

Their beauty is, a gift to see,
A reminder of, the mystery.
Of the universe, and all its might,
The stars shine on, through the dark of night.
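
If you would rather see the tokens as they are generated, chat_completion also accepts stream=True. A minimal sketch, assuming your llama-stack-client version ships the EventLogger helper used in the Llama Stack examples:

from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage
from llama_stack_client.lib.inference.event_logger import EventLogger  # helper from the Llama Stack examples

client = LlamaStackClient(base_url="http://localhost:5001")

# Same request as inference.py, but streamed chunk by chunk
response = client.inference.chat_completion(
    messages=[UserMessage(content="Hello, can you write a short poem about the stars?", role="user")],
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # use a model listed by verify.py
    stream=True,
)

# EventLogger pretty-prints each streamed event as it arrives
for log in EventLogger().log(response):
    log.print()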

Conclusion

You have successfully installed and configured Llama Stack. To fix the original 404 in the lesson notebook, simply swap LLAMA_STACK_API_TOGETHER_URL for http://localhost:5001, as shown at the top of this post. You can now use the stack for inference and explore its capabilities further. Happy coding!