Is there a way to receive the rest of the answer after the max token limit?

Hello, I am currently using the Claude 3 model.
Claude 3's max token limit is 4096 tokens.

In other words, Claude 3 can produce at most 4096 tokens in a single response.

I asked Claude 3 to translate a huge set of documents, but because of the 4096-token limit, the model does not output the full translation.

How can I continue receiving the answer after the max token limit is reached?
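
To make the problem concrete, here is roughly what my call looks like (a minimal sketch using the Anthropic Python SDK; the model name and prompt are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=4096,                 # hard cap on the length of one response
    messages=[{"role": "user", "content": "Translate this document: ..."}],
)

# When the cap is hit, the translation is cut off partway through:
print(response.stop_reason)      # "max_tokens" instead of "end_turn"
print(response.content[0].text)  # only the first part of the translation
```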

I think you cannot.


Isn’t the answer RAG?

You can’t pass a large amount of data to OpenAI/Gemini/etc. in a single request, so you save the data in a vector DB and use LangChain/LlamaIndex to query it.

Isn’t this the reason RAG exists?

Thanks in advance.

But how? Is that even possible?

How would saving data in a vector DB let me get the rest of a response that was cut off at the max token limit?

“I asked Claude 3 to translate a huge set of documents, but because of the 4096-token limit, the model does not output the full translation.”

Are you saying that I can save the model’s truncated output to a vector DB and then keep receiving the rest of the answer?

You have to upload the huge set of documents as vectors to a vector DB like Pinecone. Then build a RAG pipeline (a fancy name for vector searching) on top of it, and you should be able to overcome the issue of not being able to pass all of your data to the model.
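
For example, something along these lines (a rough sketch, assuming the Pinecone and OpenAI Python SDKs; the index name, key, chunks, and question are all placeholders):

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                    # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")  # placeholder key
index = pc.Index("my-documents")            # hypothetical pre-created index

def embed(text: str) -> list[float]:
    # Turn a chunk of text into an embedding vector.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# 1. Split the documents into chunks and upsert each chunk as a vector.
chunks = ["first chunk of a document...", "second chunk..."]  # placeholder chunks
index.upsert(vectors=[
    {"id": f"chunk-{i}", "values": embed(c), "metadata": {"text": c}}
    for i, c in enumerate(chunks)
])

# 2. At question time, retrieve only the most relevant chunks.
question = "What does the document say about pricing?"  # placeholder question
results = index.query(vector=embed(question), top_k=3, include_metadata=True)
context = "\n\n".join(m.metadata["text"] for m in results.matches)

# 3. Pass just that small context to the model instead of everything,
#    staying well inside the limits.
print(context)
```

Note that this mainly helps on the input side, since you no longer stuff everything into one prompt. The 4096-token cap on a single output still applies, so for a full translation you would also work through the documents chunk by chunk.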