Costs to finetune T5

I’m just going through the Scaling Instruction-Finetuned Language Models paper, and I’m wondering whether it’s affordable for the average Joe to finetune an LLM and get a model that’s good at zero-shot general tasks.

On page 10 the authors state:
Flan-T5-XL is only 3B parameters and achieves an MMLU score of 52.4%, surpassing GPT-3 175B’s score of 43.9%. That’s awesome. GPT-3 was already impressive.

On page 4 the authors state:
For example, we only use 0.2% of the pre-training compute to instruction-finetune Flan-PaLM 540B…

On slide 144 of last week’s lecture notes I see that T5 3B took 100 petaflop/s-days to pre-train.

On Runpod, a regular 80GB NVIDIA A100 costs $1.79/hr (on demand).

Does that mean that
the cost of pre-training would be: $1.79/hr (Runpod price) × 24 hr/day × 2 (A100s needed per petaflop/s, assuming one A100 sustains ~0.5 petaflop/s) × 100 (petaflop/s-days of pre-training compute according to the slides)
times the finetuning factor: × 0.002
so we could potentially spend as little as $17.18?
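Written out as a quick Python sketch (the 2× A100s-per-petaflop/s factor and the Runpod rate are my own assumptions; the other numbers come from the paper and the slides):

```python
# Back-of-envelope cost estimate; all inputs are quoted figures or assumptions.
A100_PRICE_PER_HR = 1.79      # Runpod on-demand, USD/hr for an 80GB A100
HOURS_PER_DAY = 24
A100S_PER_PFLOPS = 2          # ASSUMPTION: one A100 sustains ~0.5 petaflop/s
PRETRAIN_PFLOPS_DAYS = 100    # T5 3B pre-training compute (lecture slides)
FINETUNE_FRACTION = 0.002     # 0.2% of pre-training compute (paper, page 4)

# A100-hours needed to match the pre-training compute, priced at the hourly rate:
pretrain_cost = A100_PRICE_PER_HR * HOURS_PER_DAY * A100S_PER_PFLOPS * PRETRAIN_PFLOPS_DAYS

# Instruction finetuning is claimed to cost 0.2% of that:
finetune_cost = pretrain_cost * FINETUNE_FRACTION
print(f"Estimated finetuning cost: ${finetune_cost:.2f}")  # -> $17.18
```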

The paper states “approximately 512 v4 TPU chips for 37 hours”.

I believe that’s for PaLM 540B, though, not for T5 3B.
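As a very rough sanity check (purely my assumption that a v4 TPU chip rents for something in the same ballpark as the A100; I don’t have actual TPU pricing):

```python
# Scale check on the quoted "512 v4 TPU chips for 37 hours".
# ASSUMPTION: a v4 TPU chip rents for roughly the same ~$1.79/hr as the A100.
chips, hours, assumed_price_per_chip_hr = 512, 37, 1.79
chip_hours = chips * hours  # 18,944 chip-hours
print(f"~${chip_hours * assumed_price_per_chip_hr:,.0f}")  # -> ~$33,910
```

That’s roughly three orders of magnitude above the $17 estimate, which only makes sense if the quoted hardware is for the 540B model rather than T5 3B.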