OpenAI has released ChatGPT, a conversational AI model based on their GPT-3.5 language model (LM). ChatGPT is fine-tuned using Reinforcement Learning from Human Feedback (RLHF) and includes a moderation filter to block inappropriate interactions.
The release was announced on the OpenAI blog. ChatGPT is trained using the same RLHF methods used to train InstructGPT, OpenAI’s instruction-following language model. RLHF uses two datasets: one of human-written example responses, used for supervised fine-tuning of the GPT-3.5 LM, and one of human-labeled comparisons of LM outputs, used to train a reward model for reinforcement learning. OpenAI released ChatGPT to gather user feedback and explore its limitations:
Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback… We know that many limitations remain…and we plan to make regular model updates to improve in such areas. But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of.
GPT-3.5 is the latest in OpenAI’s GPT series of large language models. Earlier this year, OpenAI released a technical paper on InstructGPT, which attempts to reduce toxicity and hallucination in the LM’s output by “aligning” it with the user’s intent. First, a baseline policy is created by fine-tuning the LM on a dataset of prompts paired with human-written desired responses. Next, a reward model is trained from a dataset of LM-generated responses to prompts, ranked by human labelers. Finally, the baseline policy is further refined via Proximal Policy Optimization (PPO) using the reward model.
Image source: https://openai.com/blog/chatgpt/
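The reward-model step above can be sketched in miniature. The toy below is a hypothetical illustration, not OpenAI’s actual implementation: each candidate response is reduced to a small hand-made feature vector, and a linear reward model is fit to human pairwise preferences with a Bradley-Terry-style logistic loss, the general form commonly used for reward modeling in RLHF. All function names, features, and data here are invented for illustration.

```python
import math
import random

def reward(w, x):
    """Linear reward model: r(x) = w . x (toy stand-in for a neural reward model)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(comparisons, dim, lr=0.1, epochs=200, seed=0):
    """Fit w from pairwise preferences.

    comparisons: list of (preferred_features, rejected_features) pairs.
    Maximizes log P(preferred beats rejected) = log sigmoid(r(pref) - r(rej)).
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for pref, rej in comparisons:
            p = sigmoid(reward(w, pref) - reward(w, rej))
            g = 1.0 - p  # gradient scale of the pairwise logistic loss
            for i in range(dim):
                w[i] += lr * g * (pref[i] - rej[i])
    return w

# Hypothetical features per response: [relevance, politeness, verbosity].
# Labelers consistently prefer relevant, polite, concise responses.
comparisons = [
    ([0.9, 0.8, 0.3], [0.2, 0.1, 0.9]),
    ([0.7, 0.9, 0.2], [0.3, 0.2, 0.8]),
    ([0.8, 0.7, 0.4], [0.1, 0.3, 0.7]),
]
w = train_reward_model(comparisons, dim=3)

good, bad = [0.85, 0.8, 0.3], [0.2, 0.2, 0.85]
assert reward(w, good) > reward(w, bad)
```

In the full pipeline, a learned model like this scores the policy’s sampled outputs, and PPO then updates the policy to increase that score while staying close to the supervised baseline.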
Using this technique, OpenAI reduced GPT-3’s hallucination rate from 41% to 21%. InstructGPT also produced “about 25% less toxic outputs than GPT-3 when asked to be respectful.” ChatGPT was trained using the same general method, but in the first step human trainers generated a dataset by writing conversations in which they played both the user and an AI chatbot. The OpenAI researchers found that this introduced a bias into their training data (“longer answers that look richer”), causing the model to sometimes give overly verbose answers.
The tech community has been actively experimenting with the model. In a Hacker News discussion of ChatGPT, several users pointed out that the model’s responses were “muffled” and “more filtered” than GPT-3’s. One user replied:
I understand why people are a little frustrated with the “safety bumpers” in this regard. But I’d say I’m really impressed by the quality of these safety checks. This is an AI that seems to know what it can do and what it can’t give a decent answer to. I don’t know whether this behavior is hard-coded or learned, but it’s really impressive compared to the hallucinations that typically show up in GPT-3.
On Twitter, linguist and NLP educator Rachael Tatman wondered whether OpenAI had published a technical paper on ChatGPT. AI entrepreneur Will Spagnoli replied:
They published a paper with the first [InstructGPT] model’s release that explains how they did it, and the new ChatGPT and text-davinci-003 are just newer versions of the same thing, only now they have a lot more flagged data from human feedback that caused the performance gains.
OpenAI hasn’t released the code or models for ChatGPT, but a free demo is available on the web.