Hello, could someone provide more information regarding the maximum input and output size of the Flan-T5 models? While reading the paper, I noticed it was trained with an input length of 1024 and an output length of 256, but I also saw conflicting information elsewhere. Can someone please clarify? Thank you.
Hi @Ali_issa , welcome to the community!
Since input/output token limits are inherent to the underlying model, I would assume this depends on the model you choose. As you can see on Hugging Face, there are several versions of Flan-T5, which may come with different input/output token limits.
Does it make sense?
Hello @carloshvp, thank you for your answer. I was doing some research about Flan-T5-large specifically, and I am not sure about its input and output lengths. If you know how to obtain them, or if you are already familiar with the input/output size of Flan-T5-large, feel free to share the numbers with me.
Hi @Ali_issa ,
as you know, FLAN-T5 is an instruction fine-tuned variant of T5. Looking at the paper behind T5 (here), it looks like they used a maximum sequence length of 512 tokens, which means anything beyond that will probably give bad results. There is some literature about extending the context size, but that is another topic.
Probably an easier way to find the window size is to look at the d_model parameter in the T5 model documentation on Hugging Face, where you can also see the same value of 512.
I am not sure, however, that d_model is the parameter to look at. In previous versions of the Hugging Face documentation, there was a very useful and clear parameter called n_positions, which is exactly what you are searching for (and it is also 512).
If you are happy with the reply, please mark it as the solution to your question; that would help me.