Another bug - APIError: Error code: 422 "Tokens must be <= 8193"

I’m trying to take this course but I ran into another bug, I think.

First, I got stuck on this error:

. . .
InvalidRequestError: Error code: 400 

See this other post: "Is this course broken?"

I got past this first error by replacing the client.chat.completions.create() call with the following:

output = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": world_prompt}
    ],
    tools=[],
    tool_choice="auto"
)

Note that two new parameters have been added: tools=[] and tool_choice="auto".
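For reference, the reply can then be read from the response object in the usual OpenAI-style way (the Together client mirrors this interface, so treat the field names below as an assumption rather than something from the course notebook):

# Assumed OpenAI-style response shape from the Together client;
# adjust the field names if your client version differs.
print(output.choices[0].message.content)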

But now, in lesson L2_Interactive AI Applications I get this other error:

. . .
APIError: Error code: 422 - {"message": "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8193. Given: 6376 `inputs` tokens and 2048 `max_new_tokens`", "type_": "invalid_request_error", "param": null, "code": null}

It seems to me that the limit on the number of tokens that can be passed to the model has changed.

Does anyone have an idea how to fix it?
Thanks

Hello,
You are getting the token-count error because the 70B Llama 3 chat model on Together has an 8192-token context window (prompt + reply), but our prompt alone is nearly 6376 tokens. Because the Together Python client injects a default max_tokens=2048 (reported as max_new_tokens) when we don't set it explicitly, the server sees:
6376 input + 2048 requested = 8424 > 8192
We can resolve the 422 error by explicitly setting max_tokens. I set it to 256, which is more than enough for our use.
(Note: the token counts are given in the error message.)
Hope this helps!
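If you want to double-check the arithmetic before calling the API, a rough sketch like the one below counts the prompt tokens locally with the Hugging Face tokenizer and caps max_tokens so prompt + reply stay under 8192. This is only an estimate and assumes you have access to the gated meta-llama tokenizer; Together's server-side count may differ slightly.

# Rough local estimate of the prompt size (assumes access to the gated
# meta-llama tokenizer on Hugging Face; Together's count may differ slightly).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": world_prompt},
]

# apply_chat_template returns the token ids the chat template would send as `inputs`
prompt_tokens = len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))

# Keep prompt + reply within the 8192-token context window
max_tokens = min(256, 8192 - prompt_tokens)
print(prompt_tokens, max_tokens)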


Thank you RakshaC, that explanation worked perfectly. For others' reference, I am posting the updated code that worked for me. This is for the video "Interactive AI Applications"; it is the third code box underneath "Generating an Initial Start."

model_output = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    temperature=1.0,
    max_tokens=256,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": world_info + '\nYour Start:'},
    ],
    tools=[],
    tool_choice="auto")


Thank you for reporting! Please take into account that this issue is related to updates on the Together server.

As @adam616 just shared, we need to explicitly add extra parameters when requesting the model output.

I just ran the whole Lesson 1 notebook with this code updated and everything works fine!

In the upcoming weeks, I’ll be updating the notebooks and that will be reflected in the platform!

Thank you again for reporting this!

Thank you for your support! Indeed, it is a matter of the max_tokens and the model!

I’m using this updated code:

model_output = client.chat.completions.create(
    # model="meta-llama/Llama-3-70b-chat-hf",
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    temperature=1.0,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": world_info + '\nYour Start:'}
    ],
    max_tokens=1800  # update
)

This follows the recommendation from the documentation.

Notebooks are being reviewed and will be updated with these fixes in the platform!