✨ New course! Enroll in Reinforcement Fine-Tuning LLMs with GRPO

▶️ Enroll Now!

Join Reinforcement Fine-Tuning LLMs with GRPO, built in collaboration with Predibase and taught by Travis Addair, Predibase's Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reinforcement Fine-Tuning (RFT) is a technique for adapting LLMs to complex reasoning tasks like mathematics and coding. RFT leverages reinforcement learning (RL) to help models develop their own strategies for completing a task, rather than relying on pre-existing examples as in traditional supervised fine-tuning. One RL algorithm, called Group Relative Policy Optimization (GRPO), is well suited to tasks with verifiable outcomes and can work well even when you have fewer than 100 training examples. Using RFT to adapt small, open-source models can lead to competitive performance on reasoning tasks, giving you more options for your LLM-powered applications.
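To give a feel for the "group relative" part of GRPO before you take the course: for each prompt, several completions are sampled and each one is scored relative to the others in its group, instead of against a learned critic as in PPO. The sketch below is illustrative only (the function name and the epsilon constant are my own choices, not from the course materials):

```python
import numpy as np

def group_relative_advantages(rewards):
    """Normalize rewards within a group of completions sampled for one prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation. Completions that beat
    their siblings get positive advantages; weaker ones get negative.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Small epsilon avoids division by zero when all rewards are identical.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 4 completions for the same prompt, scored by a verifiable reward.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # ~[ 1., -1.,  1., -1.]
```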

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn how to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks.
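As a rough preview of what such a reward function can look like, here is a minimal sketch of a verifiable reward for a math-style task. The `Answer:` output format and the function names are assumptions made for illustration, not the course's actual implementation:

```python
import re

def correctness_reward(completion: str, expected_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the model's final answer matches
    the expected answer, else 0.0.

    Assumes the completion ends with a line like 'Answer: 42'.
    """
    match = re.search(r"Answer:\s*(.+?)\s*$", completion.strip())
    if match and match.group(1).strip() == expected_answer.strip():
        return 1.0
    return 0.0

def format_reward(completion: str) -> float:
    """Toy shaping reward: small bonus just for following the answer format."""
    return 0.2 if "Answer:" in completion else 0.0

# Example usage
print(correctness_reward("First add 40 and 2. Answer: 42", "42"))  # 1.0
print(format_reward("I am not sure."))                             # 0.0
```

In practice you would combine several such rewards (correctness, formatting, length penalties, and so on) and pass them to the GRPO trainer to score each sampled completion.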


Hi,

May I ask where we can find the slides for this course?

Thank you!

Short courses do not provide lecture slides.