Memory usage in DDP

I am confused about the memory savings at the different stages of FSDP. Training needs GPU memory for four things: weights, Adam optimizer states, gradients, and activations & temporary memory. But FSDP only accounts for three of them: weights, Adam optimizer states, and gradients. Where is the fourth one?
Besides, I don't understand why stage 1 saves memory by up to 4x and stage 2 by up to 8x.

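To anyone else who is confused, here is a brief explanation. Imagine stage 1 with mixed-precision training: the weights and gradients are in BF16 or FP16, 2 bytes each, so 4 bytes per parameter together. The Adam optimizer keeps its two states in FP32 to maintain precision, which is 4 bytes × 2 states = 8 bytes. We also need the FP32 copy of the weights that the optimizer actually updates, another 4 bytes. That is 4 + 8 + 4 = 16 bytes per parameter. Stage 1 shards the FP32 part (Adam states plus FP32 weights) across GPUs, so with enough GPUs each one is left with only the 4 bytes of BF16/FP16 weights and gradients: 4/(4+8+4) = 1/4 of the memory, i.e. up to a 4x saving. Stage 2 works the same way: it also shards the gradients, leaving only the 2-byte weights, so 2/(4+8+4) = 1/8, i.e. up to an 8x saving.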
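The same arithmetic as a small sketch, assuming the mixed-precision Adam accounting above (16 bytes of model state per parameter) and ignoring activations and temporary buffers. The helpers `bytes_per_param` and `saving` are hypothetical names for illustration, not part of FSDP or DeepSpeed; stage 3 (which also shards the weights) is included only for comparison.

```python
# Hypothetical helpers: per-GPU bytes of model state per parameter for
# mixed-precision Adam, using the accounting from the explanation above.
# Activations and temporary buffers are deliberately NOT included.

WEIGHT_BYTES = 2      # BF16/FP16 weights
GRAD_BYTES = 2        # BF16/FP16 gradients
OPTIM_BYTES = 4 + 8   # FP32 weight copy + two FP32 Adam states


def bytes_per_param(stage: int, num_gpus: int) -> float:
    """Bytes each GPU holds per parameter at a given sharding stage.

    stage 0: nothing sharded (plain data parallel)
    stage 1: optimizer state sharded
    stage 2: optimizer state + gradients sharded
    stage 3: optimizer state + gradients + weights sharded
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    weights = WEIGHT_BYTES / num_gpus if stage >= 3 else WEIGHT_BYTES
    grads = GRAD_BYTES / num_gpus if stage >= 2 else GRAD_BYTES
    optim = OPTIM_BYTES / num_gpus if stage >= 1 else OPTIM_BYTES
    return weights + grads + optim


def saving(stage: int, num_gpus: int) -> float:
    """Memory reduction factor relative to unsharded data parallel (stage 0)."""
    return bytes_per_param(0, num_gpus) / bytes_per_param(stage, num_gpus)


if __name__ == "__main__":
    # The factor approaches 4x (stage 1) and 8x (stage 2) as num_gpus grows.
    for n in (8, 64, 1024):
        print(n, [round(saving(s, n), 2) for s in (1, 2, 3)])
```

The "up to" in 4x and 8x is the limit as the number of GPUs grows: with 8 GPUs the sketch gives roughly 2.9x for stage 1 and 4.3x for stage 2, because the replicated 4 (or 2) bytes per parameter never shrink.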