I completed the M2 graded lab2 (GRPO Post Training Lab) and would like to actually run training with this code. Has anyone done this already? If so, do you have any suggestions, or pitfalls to watch out for? I would like to avoid paying for GPU time to debug things that could be sorted out before getting on a GPU. Thanks.