Argmax
Vahe Hagopian, Taka Hasegawa, Farrukh Rahman
15: InstructGPT
57 minutes. Posted Mar 28, 2023 at 4:00 pm.
Show notes
In this episode we discuss the paper "Training language models to follow instructions with human feedback" by Ouyang et al. (2022). We cover the RLHF (reinforcement learning from human feedback) paradigm and how important RL is to tuning GPT.