Question regarding max_new_tokens parameter

Hi all,

I have a question regarding the parameter max_new_tokens. I understand that when you set it too low the output will be cut short; however, when I tried it out, the output was indeed shorter, but it was never cut off mid-sentence.

I would have expected to see half-finished sentences if I set this parameter too low, but instead I am seeing shorter sentences that are correctly finished (for example, ending with a period). It looks like the model is not cutting the sentence off but shortening it in a smart way. How does this work?

For example, with max_new_tokens set too low I would have expected sentences like “he was a” or “the airplane flew from”, but instead I see “he was happy.” or “the airplane flew over the city.”
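One explanation worth considering: decoding loops typically stop either when the token cap is reached or when the model emits an end-of-sequence (EOS) token, whichever comes first. Here is a toy sketch of that stopping rule (an illustration with a hypothetical scripted model, not the actual Hugging Face implementation) showing how a short but complete output is possible whenever EOS arrives before the cap:

```python
# Toy decoding loop illustrating the usual stopping rule.
EOS = "<eos>"

def generate(next_token_fn, max_new_tokens):
    """Append tokens until EOS is produced or max_new_tokens is reached."""
    out = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(out)
        if tok == EOS:       # model chose to stop: the output ends cleanly
            break
        out.append(tok)
    return out               # if the loop ran out, output may stop mid-sentence

# A scripted "model" that wants to say one full sentence, then stop.
script = ["he", "was", "happy", ".", EOS]
model = lambda out: script[len(out)]

print(generate(model, max_new_tokens=3))   # cap hit: ['he', 'was', 'happy']
print(generate(model, max_new_tokens=10))  # EOS hit: ['he', 'was', 'happy', '.']
```

With a cap of 3 the sentence is truncated, but with any cap of 5 or more the model stops itself at EOS, which would explain seeing shorter yet complete sentences.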

Can anyone help me understand what is going on?

Thanks in advance!

Maybe I can help.

I think it has to do with the final fine-tuning of the model.

I have worked with many models, big and small, that are instruction-tuned, and some others that are base models, but all smaller than GPT or Claude, for instance.

When the model is instruction-tuned, or a base model, I usually see incomplete sentences at the end; rarely do I see a model that gives complete sentences. When I use models tuned for the ‘chat’ task, I see a mix of results: some complete their sentences, others leave them incomplete.

I’d say the only models that have consistently given me complete sentences are the GPT models and Claude.

And I asked myself the same question: why is this happening? My only answer is that there has to be an additional process at the end of the GPT and Claude pipelines that makes sure the output ends in a complete sentence. Maybe they exclude the last incomplete sentence? Or maybe they run an additional pass on the last sentence to make sure it comes out complete? Something has to be happening at the end.

I guess I’ll never know. But if I wanted to produce the same result, i.e. always end with a complete sentence, I would certainly add a post-processing step at the end of the output to make sure that is the case.
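As a sketch of that post-processing idea (a hypothetical helper, not anything GPT or Claude are confirmed to do): simply drop everything after the last sentence-ending punctuation mark.

```python
import re

def trim_incomplete_sentence(text: str) -> str:
    """Keep text up to the last '.', '!' or '?'; if none, return text unchanged."""
    matches = list(re.finditer(r"[.!?]", text))
    if not matches:
        return text
    return text[: matches[-1].end()]

print(trim_incomplete_sentence("The airplane flew over the city. Then it"))
# -> "The airplane flew over the city."
```

A real implementation would need smarter sentence splitting (abbreviations like “e.g.” break this naive regex), but it shows how cheap such a final step would be.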


I don’t know for sure either @Juan_Olano, but I think that apart from max_new_tokens there are also max_length and a few other parameters that interact with each other, and they must not cancel each other’s effects if you want the requested output.
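On that interplay: roughly, max_length caps prompt plus generated tokens, while max_new_tokens caps only the generated part (and in recent transformers versions, setting both produces a warning, with max_new_tokens taking precedence — worth checking the docs for your version). A small sketch of what each setting would allow, here combined by taking the tighter of the two caps (an illustrative choice, not the library’s exact behavior):

```python
def effective_new_token_cap(prompt_len, max_length=None, max_new_tokens=None):
    """Illustrative: how many new tokens each setting would permit.

    max_length counts prompt + generation; max_new_tokens counts generation only.
    This sketch takes the tighter of the two caps.
    """
    caps = []
    if max_length is not None:
        caps.append(max(0, max_length - prompt_len))  # room left after the prompt
    if max_new_tokens is not None:
        caps.append(max_new_tokens)
    return min(caps) if caps else None

# With a 10-token prompt, max_length=15 leaves room for only 5 new tokens,
# even though max_new_tokens=20 alone would allow 20.
print(effective_new_token_cap(10, max_length=15, max_new_tokens=20))  # -> 5
```

This is why a long prompt combined with a tight max_length can silently squeeze the generation much more than max_new_tokens suggests.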

I completely agree with your line of thought. It has to be something else, maybe something as simple as trimming the last sentence if it is incomplete.