
CHATGPT & LLMS

SEPARATING FACT FROM
FICTION FOR LOCALIZATION
REGISTER FOR THE LIVE
EVENT ON LINKEDIN

CHATGPT, LLMs,
FACTS, FICTION,
& THE FUTURE OF
LOCALIZATION
LIVE STREAMED ON MAY 18 @ 10:00 PST, 18:00 CET
Laszlo Varga, Jourik Ciesielski

GLOBALIZATION RESEARCH SERIES


Foundational technologies such as transformer models and large language
models (LLMs) have made significant strides in natural language processing and
machine learning. But how do they compare to traditional machine translation?
We will examine the differences between LLMs and machine translation, and
how these technologies can complement each other in real-life applications.

Drawing from Nimdzi Insights' research, this event will also explore practical use
cases of ChatGPT in the language services industry, examining how this
technology is being utilized by organizations to solve real-world problems.
Additionally, we will address the challenges that come with implementing LLMs,
and what the future of these technologies may hold.
TODAY'S EVENT IS PART OF THE
GLOBALIZATION RESEARCH SERIES TRACK

The Globalization Research Series is
hosted by Nimdzi Insights and features
the latest published research through
discussions with industry thought leaders.
CHECK OUT ADDITIONAL TRACKS FOR THIS EVENT SERIES:

GLOBALIZATION RESEARCH
SERVICE PROVIDER SUCCESS
LESSONS IN GLOBAL LOCALIZATION
ENTERPRISE SUCCESS
UPCOMING
EVENTS FROM
NIMDZI
TODAY'S EVENT FEATURES FINDINGS AND RESEARCH PUBLISHED

...MORE AVAILABLE
ON NIMDZI.COM
TODAY'S AGENDA

TIMELINE OF INNOVATION
LARGE LANGUAGE MODELS VS
MACHINE TRANSLATION
USE CASES
CHALLENGES & FUTURE
TIMELINE
AI has been around in the language industry
for longer than most care to remember.
Google Translate on an NMT base
was released in 2016.
(NMT is AI)
"Attention Is All You Need"
dates from 2017.
... and ChatGPT brought that to
everyone’s attention.
FOUNDATIONAL TECHNOLOGIES
PRELUDE:
2017: "ATTENTION IS ALL YOU NEED" - THE TRANSFORMER (BY GOOGLE)
2018: BERT (BY GOOGLE). POWERS SEARCH.
2020 JUNE: GPT-3 (BY OPENAI). EXPERIMENTS, PROTOTYPES, NO HYPE.

THE HYPE:
2022 NOVEMBER: CHATGPT (BASED ON GPT-3.5). MASSIVE PUBLIC HYPE.
2023 FEBRUARY: MS BING WITH GPT-4 UNDER THE HOOD.
2023 MARCH: GPT-4. MASSIVE IMPROVEMENTS. MULTI-MODALITY.
2023 MARCH: OPENAI PLUGINS. THE GENIE IS OUT OF THE BOTTLE.
2023 MARCH: MS 365 COPILOT
2023 MAY: PALM 2, BARD, AND MORE FROM GOOGLE
Also: Jurassic, BLOOM, Claude, LLaMA family...
FOUNDATIONAL TECHNOLOGIES
UNDERLYING INFRASTRUCTURE AND SYSTEMS THAT SUPPORT AND
ENABLE OTHER TECHNOLOGIES TO FUNCTION AND EVOLVE

NO DIRECT USE CASES


CORE TO THE DEVELOPMENT AND DEPLOYMENT
OF MORE COMPLEX TECHNOLOGIES
DESIGNED TO PROVIDE A STABLE AND RELIABLE
PLATFORM FOR MANY OF THE AI-DRIVEN
LANGTECH TRENDS
FOUNDATIONAL TECHNOLOGIES
EXAMPLES
RULE-BASED MODELS
Hand-crafted rules and patterns to perform NLP tasks
STATISTICAL MODELS
Statistical (or probabilistic) methods (n-gram language models,
part-of-speech taggers) to perform NLP tasks
NEURAL NETWORKS
Interconnected layers of artificial neurons to learn patterns in data
and make predictions (many language-related use cases!)
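
To make the "statistical models" example concrete, here is a minimal sketch of a bigram (2-gram) language model. The toy corpus and the function names `train_bigram_model` and `next_word_probability` are illustrative assumptions, not taken from the research discussed in this event.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows another word (bigram counts)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def next_word_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][curr] / total if total else 0.0


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the translator reviews the text",
    "the translator edits the text",
    "the reviewer reads the text",
]
model = train_bigram_model(corpus)
p = next_word_probability(model, "the", "translator")
print(f"P(translator | the) = {p:.3f}")
```

The model only knows what it has counted: any word pair missing from the corpus gets probability zero, which is one reason purely statistical systems needed very large corpora and smoothing.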
FOUNDATIONAL TECHNOLOGIES
ONE TYPE OF NEURAL NETWORK, THE SO-CALLED “TRANSFORMER MODEL”,
HAS TURNED THE LANGUAGE SERVICES INDUSTRY UPSIDE DOWN.

Based on the concept of “self-attention”
Captures long-range dependencies and relationships between words in
the input sequence
Most well-known application: large language models (LLMs)
Trained on massive amounts of text data
Achieving human-like performance when tasked with content
generation
Significant breakthrough in NLP
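
As a rough illustration of "self-attention", the sketch below computes scaled dot-product attention over three toy word vectors. It is a simplification under stated assumptions: queries, keys and values are the input vectors themselves (a real transformer applies learned projections and uses multiple heads), and the two-dimensional `toy_embeddings` are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the current word against every word in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted mix of all word vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for a three-word input sequence.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy_embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Every word attends to every other word in one step, regardless of distance, which is how the architecture captures the long-range dependencies mentioned above.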
LARGE LANGUAGE
MODELS VS
MACHINE
TRANSLATION
A CENTURY OF MACHINE TRANSLATION
1933: The first machine translation patents were issued in France (Georges Artsrouni) and Russia (Petr Troyanskii).
1947: Warren Weaver introduced the idea of using electronic computers for language translation, marking the first proposal of statistical machine translation.
1954: The Georgetown-IBM experiment showcased the initial public display of machine translation, specifically translating Russian to English.
1966: The Automatic Language Processing Advisory Committee (ALPAC) published a report expressing skepticism over machine translation, resulting in cut budget for MT research.
1968: Systran was established, becoming one of the few machine translation systems to survive the ALPAC report. The US Air Force adopted Systran for Russian-English translation in 1970.
1977: The Meteo system was launched to translate meteorological forecasts from English to French, followed by the release of a French-English version in 1989.
1991: Researchers at IBM's Thomas J. Watson Research Center reintroduced statistical machine translation, leading to a surge of new research throughout the 1990s.
1997: The first paper on encoder-decoder structure for machine translation was published. We are seeing the first steps toward neural machine translation.
1997: Systran launched a free web-based commercial machine translation system. Businesses began to offer MT as a value-added service.
2003: A language model based on neural networks was developed by researchers at the University of Montreal.
2006: Google launched Google Translate, which began using SMT technology in 2007.
2014: The sequence-to-sequence model enabled neural networks to learn the mapping between source and target languages.
2014: The attention mechanism allowed the network to selectively concentrate on distinct portions of the input sequence while generating the output sequence.
2016: Google Neural Machine Translation system was introduced and has been employed in Google Translate ever since.
2017: Introduction of the transformer architecture, which revolutionized large language models and paved the way for advanced applications in natural language processing.
2022: ChatGPT, a large language model built on large-scale neural networks, was released.
2023: GPT-4, a next-generation language model built on large-scale neural networks with advanced features, was released.
ON THE MACHINE TRANSLATION BENCH

NMT                                  | LLM
predictable (known and understood)   | unpredictable (early days?)
more accurate, less fluent           | more fluent, less accurate
narrow-purpose                       | general purpose
works 90+% of the time               | hard to tell if and when it will work well
(and we know when it doesn't)        |
ROI defined                          | easy to overcommit


LLM VS MT
LLMS HAVEN’T NECESSARILY BEEN BUILT WITH TRANSLATION IN MIND:
Not exclusively trained on translation data
Trained to always provide an answer, even if it doesn’t make sense
NEVERTHELESS, LLMS OFFER A BROADER RANGE OF POTENTIAL
APPLICATIONS THAN REGULAR, TRANSLATION-FIRST SYSTEMS EVER COULD!
Detection and elimination of gender bias
Different types of QA, including quality estimation
Automated post-editing
Rewriting or paraphrasing MT output
Terminology extraction (multilingual!)
Transcreation, adaptation
Automated editing of translation memory suggestions
LLM VS MT
ON TOP OF THAT, LLMS HAVE THE ABILITY TO LEARN FROM CONTEXT.
Source text
Legacy translations
Style guides
Glossaries
Additional input provided by users
→ GPT-4 can take in roughly 50 pages of context in a single prompt!
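
One way to picture "learning from context" is as prompt assembly: the assets listed above are simply placed in the text sent to the model. The sketch below is a hypothetical helper (the name `build_translation_prompt`, the section labels and the sample data are all assumptions); it only builds the prompt string and does not call any LLM API.

```python
def build_translation_prompt(source_text, target_language, glossary=None,
                             style_guide=None, legacy_translations=None):
    """Assemble one plain-text prompt from the available context assets."""
    parts = [f"Translate the source text into {target_language}."]
    if style_guide:
        parts.append("Style guide:\n" + style_guide)
    if glossary:
        terms = "\n".join(f"- {src} => {tgt}" for src, tgt in glossary.items())
        parts.append("Glossary (use these terms):\n" + terms)
    if legacy_translations:
        pairs = "\n".join(f"- {src} => {tgt}" for src, tgt in legacy_translations)
        parts.append("Legacy translations for reference:\n" + pairs)
    # The source text goes last so the instructions and context come first.
    parts.append("Source text:\n" + source_text)
    return "\n\n".join(parts)


# Hypothetical sample assets, for illustration only.
prompt = build_translation_prompt(
    "Open the settings menu.",
    "German",
    glossary={"settings": "Einstellungen"},
    style_guide="Use the formal form of address (Sie).",
    legacy_translations=[("Open the file.", "Öffnen Sie die Datei.")],
)
print(prompt)
```

Everything in the prompt counts against the model's context limit, so in practice the glossary and legacy translations would likely be filtered down to the entries relevant to the source text.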

MORE INTRIGUINGLY, LLMS ARE CAPABLE OF RESPECTING HUMAN
INSTRUCTIONS.

Are LLMs evolving from MT enrichment tool to core technology for
automated translation?
USE CASES
USE CASES GALORE

Machine translation: being done (Intento, Custom.MT, etc.)


Source content optimization
Engineering: automation, scripting, technical support
Multilingual content creation: “post-creator” vs. post-editor?
Authoring & proofreading: documentation, style guides, templates, blogs,
social media, etc.
Linguistic context: “turbo-powered Google” (first TMS integrations are being
released – Crowdin, Lokalise, Trados)
Don’t forget about technical writing, audiovisual localization, etc.!
Weaving different assets into one next-generation product through the API
MULTILINGUAL AI CONTENT CREATION
IS HERE TO HAVE A SEAT AT THE TABLE
NO SOURCE, WHAT DOES THIS MEAN?
JUST A CONTENT BRIEF.

"Raw GPT output" + human PE
Demand will increase, taking over demand for translation for
certain content types (e.g. product descriptions).

The opportunity to go "upstream".
POST-COPYEDITING (“post-creation”).
TRAIN your language talent for this new service.
CHALLENGES FOR
THE FUTURE
THE ELEPHANT IN THE ROOM

Will LLMs eliminate
the language industry?
NO.
THE ELEPHANT IN THE ROOM

Will LLMs eliminate the language industry?

LSPs sell words, but deliver value by
removing complexity (PM expertise)
and by their supply chain (VM -
language and cultural expertise).

Buyers don’t want to (micro-) manage
the language service,
or the freelancers.

This has not changed.


HOW TO LOOK AT IT?
EMBRACE THE RISE OF CHATGPT, GPT-4 AND LLMS IN GENERAL
ANALYZE IT CAREFULLY, EXPLORE ITS STRENGTHS, WORK
AROUND ITS WEAKNESSES
TRY TO DEFINE SOLID USE CASES (JOINT EFFORT!)
BECOME EARLY ADOPTERS
THINK IN TWO DIRECTIONS:
1. HOW CAN I BUILD PRODUCTS OR SERVICES WITH IT?
2. HOW CAN I USE IT TO MAKE MY LIFE EASIER?
ADVICE
TALK TO YOUR CLIENTS ABOUT THEIR BUSINESS NEEDS
PULL YOUR CLIENTS IN, TRANSPARENTLY, WITH DISCLAIMERS
REMEMBER, IT'S EASY TO OVERCOMMIT (AND UNDERPERFORM)
USE YOUR TALENT POOL TO EVALUATE RESULTS
GET YOUR SUPPLY CHAIN READY (PROBABLY YOUR GREATEST
ASSET IN THE WHIRLWIND)

We've been through this before... The language services industry has a ~6-year
head start in figuring this out.
HOW IS CHATGPT DIFFERENT?

BUYER SIDE
Typical approaches:
SMB: immediate urgency, lack of vision.
Enterprise: experiment with caution. Is there ROI?
Governments: distrust.

LANGUAGE INDUSTRY SIDE
Typical approaches:
Startups: gold rush. Success makes heroes, mistakes are forgotten.
Established players: experiment with caution. Mistakes can be lethal.
Mid-range: uncertainty. Friend or foe?

Keyword: FOMO.

INNOVATION WITH LLMS IS NOW DEMANDED
FROM ANY AND ALL AREAS OF BUSINESS.
REAL THREATS OF LLMS TO LSPS

SECURITY AND PRIVACY


LOW BARRIER TO ENTRY FROM
OTHER INDUSTRIES
BUYERS CAN PULL TECH IN-HOUSE
REDEFINING “GOOD ENOUGH”
THE TROUGH OF DISILLUSIONMENT
ANY FINAL
COMMENTS
TO WRAP UP?
HOW DOES NIMDZI
HELP COMPANIES
EVOLVE?
(A FEW EXAMPLES...)
VISIT NIMDZI.COM/TESTIMONIALS
FOR MORE CASE STUDIES
LANG-TECH ROADMAP ASSESSMENT

Technology strategy is
a big decision. Global
organizations validate
their strategy with
Nimdzi before
selecting a TMS.
Throughout the TMS assessment project we felt we were
in good, capable hands. Not only were we supported by
the TMS consultancy expert in a commendable manner,
he also went out of his way to educate us on translation
and localization best practices that we were not aware of.
It was a pleasure to work in this manner.

BEATA KOPCZYNSKA, THE DIGITAL ASSET SYSTEMS & CONTENT
DEPLOYMENT DIRECTOR
CASE STUDY: LOCALIZATION PROGRAM AUDIT
AND SUPPLY CHAIN GOVERNANCE DEPLOYMENT

Nimdzi consultants engaged
with the client over the course
of 6 months to perform an
audit of the currently deployed
technology and processes for
localization in order to gather
information to provide a
technology and automation
scope of work, budget, and
schedule.
ENTERPRISE L10N BOOTCAMPS AND TRAINING

Nimdzi clients rely on
customized bootcamps to
support the professional
development of our
enterprise client
localization teams.
“Just a quick note to thank you for your fantastic workshop! I can
tell you, people can't stop raving about what you taught today!
Your style along with the high-quality materials are simply
awesome. It was so great to have 50+ folks in the live audience. I
can't wait to share the recording and the deck with an ever wider
audience. International is a top priority for Smartsheet this year.
I'm so glad you were able to help me set this journey up for
success. Can't wait for the next workshop.”

CARSTEN KNEIP, SMARTSHEET LOCALIZATION DIRECTOR


More resources: www.nimdzi.com
