Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 5

What is ChatGPT?

ChatGPT is an AI chatbot that uses natural language processing to create


humanlike conversational dialogue. The language model can respond to
questions and compose various written content, including articles, social
media posts, essays, code and emails.

ChatGPT is a form of generative AI -- a tool that lets users enter prompts to receive
humanlike images, text or videos that are created by AI.

ChatGPT is similar to the automated chat services found on customer service


websites, as people can ask it questions or request clarification to ChatGPT's
replies. The GPT stands for "Generative Pre-trained Transformer," which refers to
how ChatGPT processes requests and formulates responses. ChatGPT is trained
with reinforcement learning through human feedback and reward models that rank
the best responses. This feedback helps augment ChatGPT with machine learning to
improve future responses.

Who created ChatGPT?

OpenAI -- an AI research company -- created and launched ChatGPT in November


2022. It was founded by a group of entrepreneurs and researchers including Elon
Musk and Sam Altman in 2015. OpenAI is backed by several investors, with
Microsoft being the most notable. OpenAI also created Dall-E, an AI text-to-art
generator.

How does ChatGPT work?


ChatGPT works through its Generative Pre-trained Transformer, which
uses specialized algorithms to find patterns within data sequences.
ChatGPT uses the GPT-3 language model, a neural network machine
learning model and the third generation of Generative Pre-trained
Transformer. The transformer pulls from a significant amount of data to
formulate a response.

ChatGPT uses deep learning -- a subset of machine learning -- to produce


humanlike text through transformer neural networks. The transformer
predicts text, including the next word, sentence or paragraph, based on its
training data's typical sequence.

Training begins with generic data, then moves to more tailored data for a
specific task. ChatGPT was trained with online text to learn the human
language, and then it used transcripts to learn the basics of conversations.

Human trainers provide conversations and rank the responses. These


reward models help determine the best answers. To keep training the
chatbot, users can upvote or downvote its response by clicking on "thumbs
up" or "thumbs down" icons beside the answer. Users can also provide
additional written feedback to improve and fine-tune future dialogue.

Methods

We trained this model using Reinforcement Learning from Human


Feedback (RLHF), using the same methods as InstructGPT, but with slight
differences in the data collection setup. We trained an initial model using
supervised fine-tuning: human AI trainers provided conversations in which
they played both sides—the user and an AI assistant. We gave the trainers
access to model-written suggestions to help them compose their
responses. We mixed this new dialogue dataset with the InstructGPT
dataset, which we transformed into a dialogue format.

To create a reward model for reinforcement learning, we needed to collect


comparison data, which consisted of two or more model responses ranked
by quality. To collect this data, we took conversations that AI trainers had
with the chatbot. We randomly selected a model-written message, sampled
several alternative completions, and had AI trainers rank them. Using these
reward models, we can fine-tune the model using Proximal Policy
Optimization. We performed several iterations of this process.
n trainers provide conversations and rank the responses. These reward
models help determine the best answers. To keep training the chatbot,
users can upvote or downvote its response by clicking on "thumbs up" or
"thumbs down" icons beside the answer. Users can also provide additional
written feedback to improve and fine-tune future dialogue.

How are people using ChatGPT?

ChatGPT is versatile and can be used for more than human conversations. People
have used ChatGPT to do the following:

 Code computer programs.

 Compose music.

 Draft emails.

 Summarize articles, podcasts or presentations.

 Script social media posts.

 Create a title for an article.

 Solve math problems.

 Discover keywords for search engine optimization.


 Create articles, blog posts and quizzes for websites.

 Reword existing content for a different medium, such as a presentation transcript for a
blog post.

 Formulate product descriptions.

 Play games.

 Assist with job searches, including writing resumes and cover letters.

 Ask trivia questions.

 Describe complex topics more simply.

Unlike other chatbots, ChatGPT can remember various questions to continue the
conversation in a more fluid manner.

Limitations

 ChatGPT sometimes writes plausible-sounding but incorrect or


nonsensical answers. Fixing this issue is challenging, as: (1) during
RL training, there’s currently no source of truth; (2) training the model
to be more cautious causes it to decline questions that it can answer
correctly; and (3) supervised training misleads the model because
the ideal answer depends on what the model knows, rather than what
the human demonstrator knows.
 ChatGPT is sensitive to tweaks to the input phrasing or attempting
the same prompt multiple times. For example, given one phrasing of
a question, the model can claim to not know the answer, but given a
slight rephrase, can answer correctly.
 The model is often excessively verbose and overuses certain
phrases, such as restating that it’s a language model trained by
OpenAI. These issues arise from biases in the training data (trainers
prefer longer answers that look more comprehensive) and well-
known over-optimization issues.1,2
 Ideally, the model would ask clarifying questions when the user
provided an ambiguous query. Instead, our current models usually
guess what the user intended.
 While having make efforts to make the model refuse inappropriate
requests, it will sometimes respond to harmful instructions or exhibit
biased behavior. We’re using the Moderation API to warn or block
certain types of unsafe content, but we expect it to have some false
negatives and positives for now. We’re eager to collect user feedback
to aid our ongoing work to improve this system.

Source: https://openai.com/blog/chatgpt

https://www.techtarget.com/whatis/definition/ChatGPT#:~:text=ChatGPT%20is%20a
%20form%20of%20generative%20AI%20--,it%20questions%20or%20request%20clarification%20to
%20ChatGPT%27s%20replies.

You might also like