Hello, I am reading the original Prompt Tuning paper, which uses T5 (an encoder-decoder model) for its experiments. I was wondering whether the method works well across all kinds of Transformer models. Specifically, does it also work for autoregressive, decoder-only models such as GPT-4 and LLaMA?
Additionally, I am curious how much data is sufficient for prompt tuning.
Prompt tuning works with any Transformer architecture, including decoder-only autoregressive models: the learned soft-prompt embeddings are simply prepended to the sequence of input token embeddings, so nothing about the method is specific to encoder-decoder models. One caveat: you need access to the model's embedding layer and gradients, so you can apply it to open-weight models like LLaMA but not to closed, API-only models such as GPT-4. How well it works still depends on the model's scale and pretraining data; the original paper found that prompt tuning only becomes competitive with full fine-tuning at larger model sizes.
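To make the mechanism concrete, here is a minimal, self-contained PyTorch sketch (a toy stand-in for a real decoder-only model, not any library's actual implementation): the pretrained backbone is frozen, and the only trainable parameters are the soft-prompt embeddings prepended to the input.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Toy Transformer LM with a trainable soft prompt; base weights frozen.
    Hypothetical stand-in for a real pretrained model like GPT-2 or LLaMA."""

    def __init__(self, vocab=100, d=32, n_prompt=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, vocab)
        # Freeze the "pretrained" weights: only parameters registered after
        # this loop (the soft prompt) will receive gradients.
        for p in self.parameters():
            p.requires_grad = False
        # n_prompt learned "virtual tokens" in embedding space.
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt, d) * 0.02)

    def forward(self, input_ids):
        b = input_ids.size(0)
        tok = self.embed(input_ids)                             # (b, t, d)
        prompt = self.soft_prompt.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompt, tok], dim=1)                     # prepend prompt
        h = self.backbone(x)
        # Return logits only for the real token positions.
        return self.head(h[:, self.soft_prompt.size(0):])
```

Training then optimizes `model.soft_prompt` alone (e.g. `torch.optim.Adam([model.soft_prompt])`) with a normal language-modeling loss. In practice you would not write this by hand: the Hugging Face PEFT library provides `PromptTuningConfig` and `get_peft_model`, which do the same prepending for GPT-style and LLaMA-style checkpoints.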
How much data is sufficient? It depends on the task, the model's scale, and how close the task is to what the model already learned during pretraining. If the prompt is steering the model toward capabilities it already acquired, a few hundred to a few thousand labeled examples are often enough, since only the prompt embeddings are trained (typically thousands of parameters rather than billions). For tasks far outside the pretraining distribution, prompt tuning needs substantially more data and may still underperform other fine-tuning methods.