ChatGPT explained: A Guide to Conversational AI w/ InstructGPT, PPO, Markov, RLHF

code_your_own_AI
7.8K views - 2 years ago
How does ChatGPT work? Based on the training details from OpenAI about InstructGPT, I explain in simple terms how ChatGPT can produce such great results from a simple prompt, and what Reinforcement Learning from Human Feedback (RLHF) means in detail, incl. PPO, Markov, ...

If you like easy explanations with visualizations, this is the video for you. Additional literature is available here:

https://spinningup.openai.com/en/late...
https://arxiv.org/pdf/1707.06347.pdf
https://towardsdatascience.com/policy...

The Images in-between | Before Diffusion: Variational Autoencoder VAE explained w/ KL Divergence:
The Images in-between | Before Diffus...

00:00 Instruct GPT, ChatGPT
01:42 ChatGPT Create Demonstration Data set
05:33 Create Ranking Data set
07:36 Proximal Policy Optimization Algo
09:37 Policy gradients and Reinforcement Learning
12:45 Supervised Policy model and Reward model interact
14:20 ChatGPT explained

All credits to OpenAI.

Thanks to Canva (canva.com) for providing a free version of Canva to the global community. @canva

#chatgpt
#theory
#ai
Published 2 years ago, on 1401/09/21 (Solar Hijri calendar).
Viewed 7,805 times