In the week 1 videos mentioned in the title, the explanation of the Zero Redundancy Optimizer (ZeRO) stages does not take into account the memory occupied by activations and intermediate states, even though it can be significant (up to around 8 bytes per parameter).
Could you please explain why it is omitted? Maybe my understanding is limited; please help me with an explanation.
Excellent question! The size of the forward activations depends on many factors, the key ones being sequence length, hidden size, and batch size. They include the inputs and outputs passed to and returned by the forward and backward functions, as well as the activations saved for gradient computation. In the paper discussed in this module, activations, temporary buffers, and fragmented memory are collectively called the residual states. The video focuses on ZeRO-DP, which has three main optimization stages: partitioning of the optimizer states, gradients, and parameters. Another method covered in the paper, ZeRO-R, targets residual memory consumption through activation partitioning and offloading activations to the CPU. The paper also discusses combining these methods; for details, I recommend checking the paper.
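To make the two kinds of memory concrete, here is a rough back-of-the-envelope sketch in Python (my own illustration, not code from the course or the paper). The model-state formulas follow the ZeRO paper's mixed-precision Adam accounting (2Ψ bytes of fp16 parameters, 2Ψ bytes of fp16 gradients, and KΨ bytes of optimizer states with K = 12), and the activation estimate uses the paper's rough approximation of about 12 × hidden × batch × seq_length × layers fp16 elements; the GPT-2 1.5B-like configuration is taken from the paper's activation example.

```python
def zero_dp_model_state_gb(num_params, num_gpus, stage):
    """Per-GPU model-state memory in GB for mixed-precision Adam,
    following the ZeRO paper: 2 bytes/param fp16 weights + 2 bytes/param
    fp16 gradients + K bytes/param optimizer states (K = 12 for Adam:
    fp32 weights, momentum, and variance)."""
    psi, k, nd = num_params, 12, num_gpus
    if stage == 0:                      # baseline DP: everything replicated
        total = (2 + 2 + k) * psi
    elif stage == 1:                    # P_os: partition optimizer states
        total = (2 + 2) * psi + k * psi / nd
    elif stage == 2:                    # P_os+g: also partition gradients
        total = 2 * psi + (2 + k) * psi / nd
    elif stage == 3:                    # P_os+g+p: also partition parameters
        total = (2 + 2 + k) * psi / nd
    else:
        raise ValueError("stage must be 0-3")
    return total / 1024 ** 3


def activation_gb(batch, seq_len, hidden, layers):
    """Rough fp16 activation footprint without checkpointing, using the
    ZeRO paper's estimate of ~12 * hidden * batch * seq * layers elements."""
    return 12 * hidden * batch * seq_len * layers * 2 / 1024 ** 3


# GPT-2 1.5B-like configuration from the paper's activation example
params, layers, hidden = 1.5e9, 48, 1600
batch, seq_len, gpus = 32, 1024, 64

for stage in range(4):
    print(f"ZeRO-DP stage {stage}: "
          f"{zero_dp_model_state_gb(params, gpus, stage):6.2f} GB/GPU")
print(f"Activations (residual states): "
      f"{activation_gb(batch, seq_len, hidden, layers):6.2f} GB")
```

Running this reproduces the paper's headline numbers: roughly 22 GB of model states per GPU at stage 0 shrinking to about 0.35 GB at stage 3 across 64 GPUs, while the unpartitioned activations come out near 56 GB, close to the ~60 GB the paper quotes. That gap is exactly why the paper treats residual states separately with ZeRO-R.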