InstructGPT - Training language models to follow instructions with human feedback - short review

Namrata Shivagunde
Training language models to follow instructions with human feedback arxiv.org/abs/2203.02155 #gpt #instructgpt #rlhf #alignment #nlp #prompt #reinforcement learning #reinforcement learning from human feedback
Published 2 years ago, on 1401/12/29.
Viewed 3,491 times