
PUBLIC

openSAP
Generative AI at SAP

Unit 1

00:00:06 Welcome to this openSAP course on Generative AI with SAP. My name is Sean Kask, Chief
AI Strategy Officer at SAP
00:00:14 Artificial Intelligence, and I will be leading you through the course.
00:00:19 As Philip shared in the teaser, this course consists of five units. There is a course assignment
and a discussion forum where we
00:00:26 encourage you to participate. This is a nutshell course which is open to learners of all levels.
00:00:32 In the first three units, we will introduce you to the basics of artificial intelligence, how we
went from AI to generative AI,
00:00:40 and what we need to do to adapt generative AI to business processes.
00:00:44 Let's get started with Unit 1. In Unit 1, you will learn the definitions
of intelligence and, from those, what artificial intelligence is. We'll learn how humanity has
00:00:57 tried to implement artificial intelligence over the past five or six decades, and we'll learn about the relationship
between AI,
00:01:05 machine learning, and deep learning, leading us up to generative AI. So first, I would like to
take a moment to consider what we
00:01:16 mean when we talk about intelligence. If you say that a person is very intelligent, or that something looks like an
00:01:22 intelligent solution, what does that actually mean? I like the simple definition here from Max Tegmark,
00:01:29 who defined intelligence as the ability to accomplish complex goals.
00:01:36 Jeff Hawkins, a neuroscientist, defines intelligence as the ability to see patterns and predict
outcomes based on previous experiences.
00:01:44 So this incorporates some aspect of learning and prediction. And one of the founders of AI research at Stanford
00:01:54 defines intelligence as the quality that enables an entity to function appropriately and with foresight
00:02:00 in its environment. So this includes things that humans can do, like generating language,
00:02:05 creating music, or playing challenging games.
00:02:14 And quite simply then, going from intelligence to artificial intelligence: artificial intelligence is simply
00:02:20 intelligence exhibited by non-biological systems, so by machines.
00:02:26 What do we mean by this? In humans and in animals, intelligence takes place
00:02:33 in our brains. These are neurons on a substrate of biological tissue, the wetware
00:02:40 inside of your brain. In computers, we simply have digital neural networks
00:02:44 that exhibit some kind of intelligence on a substrate of, at the moment, silicon chips.
00:02:51 And when we talk about artificial intelligence in computers, for the most part we're talking
00:02:56 about something called narrow AI. Narrow AI is the ability to accomplish a narrow set of goals,
00:03:02 otherwise called weak AI. For example, we typically create a model that is very good at doing one thing,
00:03:10 such as a model that is good at classifying cat pictures.
00:03:13 That model typically cannot excel at something like playing chess, or at different kinds of goals
and different kinds
00:03:22 of intelligence. There is also a term called artificial general intelligence,
00:03:28 and that is basically human-level intelligence displayed by machines: the ability to
understand and learn any intellectual task
00:03:37 that a human can, otherwise called strong AI. I bring this up because with generative AI there's
a lot of discussion
00:03:45 about whether it is getting towards artificial general intelligence. It's not quite artificial general
intelligence yet,
00:03:51 although as we'll see later in the course, generative AI does have the ability to generalize and
address
00:03:58 all different kinds of tasks. Humanity's first attempt at scaling artificial intelligence
00:04:06 was called symbolic AI. With symbolic AI, we sit a human in front of a computer
00:04:11 and we codify knowledge. These can be rules, deterministic programming, or, for anyone
00:04:18 who's a programmer, so-called expert systems. Basically, if you're solving a problem,
00:04:24 you have if-then-else type rules, which works pretty well, and that's actually one of the
reasons why a company like SAP
00:04:33 was so successful. Back in the 80s and 90s, we'd put an expert in front of a computer and
ask: how do you match invoices, for example,
00:04:42 how do you close your books, how do you run a warehouse, and encode this into the system.
00:04:47 And that worked pretty well. It got us pretty far, and it will always serve humanity.

00:04:54 And we still see this all around us today, for example in the semantic web, ontologies, and
automated planning and scheduling systems.
00:05:03 But let's do a little thought experiment here. How many rules would you need, in a
deterministic programming
00:05:10 setting, to describe something as simple as identifying a picture of a cat versus a dog to a
computer?
00:05:18 For a human, we might say, well, a cat has whiskers, perhaps.
00:05:22 They're a little bit longer than a dog's, although that's not always true.
00:05:25 Maybe a cat's face is a little more squished, a little fluffier. This is very difficult to explain to
a computer,
00:05:31 which is looking at pixels, zeros and ones, and RGB values inside of a piece of
data.
00:05:38 And quickly, that becomes nearly impossible. It is, frankly, an impossible task to describe using rules
00:05:44 and deterministic programs. The philosopher Michael Polanyi put it very well
00:05:49 when he said: we know more than we can tell. Anyone who can ride a bike or kick a soccer ball,
00:05:58 for example, knows how to do this. But imagine trying to describe it very explicitly,
00:06:05 because computers need explicit information. How do you kick a
ball?
00:06:10 What muscle in your leg is twitching at which force and which angle, and how do you plant
your foot, and all these kinds of things:

00:06:18 it quickly just becomes impossible. And this is the nature of explicit knowledge,
00:06:23 where you can explicitly write things down and describe them, versus tacit knowledge, which is
more just knowing how to do
00:06:29 it. So very quickly, you see that we run into limits describing these
00:06:35 complex rules and the relationships between the inputs and outputs of a computer system.
00:06:42 And that's where modern approaches to artificial intelligence
00:06:47 with machine learning come along. These are probabilistic: rather than explicitly programming
00:06:53 rules into the system, we show examples to the computer.
00:06:57 We show it examples of the input data and examples of the output data, and allow
it to learn
00:07:04 the relationships between them by itself. The most popular and leading paradigm right now is called
00:07:12 supervised learning. Going back to the cat example, we'll simply take pictures of cats and
pictures of dogs with a label,
00:07:21 we'll show that to the computer, and we'll apply some kind of machine learning algorithm,
00:07:26 a learning algorithm running in the background. There are many different algorithms
00:07:31 with different pros and cons; there's not just one. We allow it to learn these
relationships and train a model,
00:07:38 simply by showing it examples. And then once we have this trained model,
00:07:44 we can put it into a production system and show it new examples that it hasn't seen
before.
00:07:49 This is called inference: the model makes some kind of prediction on the outcome.
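The train-then-infer flow just described can be sketched in a few lines of Python. This is a minimal sketch assuming scikit-learn (the course does not prescribe a library), and the whisker-length and fluffiness features are invented purely for illustration, since real image models work on pixels:

```python
# Supervised learning in miniature: show the algorithm labeled examples,
# train a model, then run inference on an example it has never seen.
from sklearn.linear_model import LogisticRegression

# Toy input data: [whisker_length_cm, fluffiness_score] per animal,
# with labels 0 = dog, 1 = cat (features invented for illustration).
X_train = [[6.0, 0.9], [7.1, 0.8], [2.0, 0.3], [2.5, 0.4]]
y_train = [1, 1, 0, 0]

model = LogisticRegression().fit(X_train, y_train)  # training step

# Inference: a new, unseen example.
new_animal = [[6.5, 0.85]]
prediction = model.predict(new_animal)[0]               # predicted class
confidence = model.predict_proba(new_animal)[0].max()   # probabilistic output
print(prediction, round(confidence, 2))
```

Note that the output is a probability rather than a hard rule, which is exactly the probabilistic nature of these systems discussed later in this unit.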
00:07:56 And if you think about it, the importance for software engineering and a company like SAP is
that this allows us to extend the reach
00:08:04 of computer systems. Because, again, deterministic programming,
00:08:09 symbolic AI, expert systems, that will always be there. That gets us pretty far.
00:08:14 We still do that. But artificial intelligence, especially machine learning,
00:08:20 allows us to work with unstructured data, like pictures, sound, and text, which is most of the data,
not just data in tables,
00:08:28 and it allows us to pick up very complex relationships that may be
simply too complicated to capture in rules.
00:08:39 So there are roughly three statistical approaches to artificial intelligence and machine
learning.
00:08:48 The one that we just showed was called supervised learning. We're simply training the
model with labeled examples.
00:08:54 We're telling it: here's the input data going into the machine, and here's the output we want,
and we've somehow labeled that.
00:09:02 The output is typically either some kind of regression, where a continuous number
comes out of it,
00:09:07 or, frankly most of the time, a classification. So does it fall into category 1, 2,
00:09:11 or 3 when we've shown it something? An example could be ticket classification in a company.
00:09:16 You get service tickets that come in, and then you classify the ticket.
00:09:20 Is this a ticket for HR, for travel and expense, an IT ticket, or a product ticket?
That's something we can scale.
00:09:31 Related is unsupervised learning. In unsupervised learning, the data set is not labeled.
00:09:37 We're simply showing all of this data to the computer, and the computer is finding structures
and patterns in that
00:09:44 data, which humans then have to interpret.
00:09:47 So, for example, something like fraud detection using clustering.

00:09:51 So we could cluster all of the transactions or data in a system based on a whole bunch of
different dimensions,
different attributes of the data. And we get clusters, and you might have one point that sticks out,
00:10:03 that doesn't fall into those clusters, that is statistically outside of the range.
00:10:07 And maybe we flag that and say: huh, let's investigate that. Could that be fraud?
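The clustering idea above can be sketched with k-means (scikit-learn is an assumption here, and the transaction attributes and values are invented for illustration). No labels are given: the algorithm groups the data by itself, and a point in an unusually small cluster gets flagged for a human to investigate:

```python
# Unsupervised anomaly flagging: cluster transactions, then treat points
# in near-empty clusters as statistical outliers worth investigating.
import numpy as np
from sklearn.cluster import KMeans

# Toy transactions: [amount, hour_of_day]. The last one is unusual.
transactions = np.array([
    [20, 12], [25, 13], [22, 11], [19, 14],
    [21, 12], [24, 13], [5000, 3],   # a huge amount at 3 a.m.
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(transactions)
labels = kmeans.labels_
sizes = np.bincount(labels)          # how many points in each cluster

# A cluster with almost no members is statistically outside the range
# of normal behaviour, so we flag its points for review.
rare_clusters = np.where(sizes <= 1)[0]
flagged = np.where(np.isin(labels, rare_clusters))[0]
print(flagged)  # indices of transactions to investigate as possible fraud
```

The human still interprets the result: the algorithm only says "this point is unlike the others", not "this is fraud".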
00:10:14 And the last major paradigm in statistical approaches to AI is something called reinforcement
learning,
00:10:21 not to be confused with retraining a supervised learning system. In this, we have an agent
that is in some environment
00:10:31 and is taking actions in that environment. So we're not starting with labeled training data or anything
00:10:36 like that. The agent observes where it is and takes some kind of action,
00:10:40 and then based on that action, it's either rewarded or punished, and it trains itself.
00:10:45 So it's learning what we call a policy. Examples of this are computers
00:10:52 that can play chess or video games. A famous example is AlphaGo, which learned to play Go, a very complex board game.
00:11:00 You put these agents into the system, and they try all sorts of different things; in the beginning
00:11:05 they can't do anything, and over time they learn how to navigate the
environment.
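To make this concrete, here is a tiny tabular Q-learning sketch. The corridor environment and all hyperparameters are invented for illustration; real systems like game-playing agents use vastly larger state spaces and neural networks instead of a table:

```python
# Tabular Q-learning on a 5-cell corridor: the agent starts in cell 0,
# reaching cell 4 yields a reward of +1, every other step yields 0.
# There is no labeled training data; the agent learns a policy purely
# from the rewards (or lack of reward) that its actions produce.
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                       # move left or move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1        # learning rate, discount, exploration

random.seed(0)
for _ in range(500):                     # training episodes
    s = 0
    while s != GOAL:
        # Explore sometimes (and on ties); otherwise act greedily.
        if random.random() < eps or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = Q[s].index(max(Q[s]))
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0   # reward only at the goal
        # The reward feedback updates the action-value estimate.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy for each non-goal cell: action index 1 = "move right".
policy = [Q[s].index(max(Q[s])) for s in range(N_STATES - 1)]
print(policy)
```

After training, the agent has learned a policy (always move right toward the goal) without ever being told the rules of the corridor.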
00:11:12 Bringing this back towards generative AI. Generative AI, and related to it, foundation
models,
00:11:21 utilize a certain kind of algorithm called a neural network, specifically an attention-based transformer
model.
00:11:28 This is a derivative of supervised learning, but the labels are not created
00:11:37 by people; the labels are somehow inherent in the structure of the data.
00:11:42 For example, a large language model will predict the next word, so the label is the next word

00:11:48 in a sentence. And that's something that we'll explore in more detail
00:11:51 in the next unit. So just before we go on, I want to make a note
00:11:57 on probabilistic thinking, uncertainty, and being comfortable with these kinds of systems.
00:12:03 Because most managers, most people, especially in enterprises, grew up with rules-based expert
00:12:09 systems and deterministic programming. And now we come to artificial intelligence,
00:12:13 which is actually probabilistic in nature. It's not saying this is a picture of a cat because that's coming
00:12:19 out of rules and a table. It's saying this is a picture of a cat with a probability,
00:12:24 or confidence score, of 0.95 or so. And some people are just uncomfortable with
that.
00:12:32 So I've left a picture up there on the screen for a few seconds for people to look at.
00:12:37 Without rewinding, please: what did you see in that picture?
00:12:44 And how certain are you about what you saw? Take a minute.
00:12:51 Was it a concert that you saw, or was it a cotton harvester? This picture has been
around for a while
00:12:58 on the internet. I've been doing this for four years with big groups of people, and
00:13:03 typically 80% of the people say this is a concert, when in fact
00:13:10 it is a cotton harvester. So as amazing as human intelligence is,

00:13:15 people are not 100% accurate, and all knowledge has some level of uncertainty and probability
in it.
00:13:22 People need to be comfortable with that in enterprise settings. When we teach a computer via machine
00:13:29 learning, it's similar, but it can scale, and it can execute tasks, not always with
100% accuracy,
00:13:37 but typically much faster and even more accurately than people can. And of course, we
always stress that in these probabilistic systems
00:13:44 we need people working together with artificial intelligence, rather than just letting the system
do everything by itself.
00:13:52 Good. So to summarize these approaches and what we've discussed today: intelligence
is just the ability to achieve complex goals.
00:14:01 And then artificial intelligence is simply intelligence exhibited by machines.
00:14:06 This includes many different approaches. Artificial intelligence is a very, very big field.
It's not just machine learning or deep learning. It includes approaches like symbolic AI, which we discussed,
00:14:17 all the way through to cutting-edge, still-evolving areas like neuromorphic computing hardware.
00:14:22 Machine learning right now is probably the dominant paradigm
00:14:28 in artificial intelligence, which is leading to many of these advances.
00:14:32 So, simply: computers learn from examples without being explicitly programmed.
00:14:37 There are many different kinds of algorithms you can do this with. Whether we're using a regression, for
example, or a logistic model,
00:14:43 all the way down to deep learning, these are still artificial intelligence.
00:14:48 And within that, though, the dominant paradigm is deep learning.
00:14:54 Deep learning is simply one class of algorithms that we use for statistical
00:15:02 modeling, and these are typically artificial neural networks. There are all different kinds of neural networks as well, like
convolutional neural networks,
00:15:08 RNNs, and the newer class, which has led to a lot of these advancements: transformers.
00:15:16 Bringing this back to generative AI: generative AI is, again, a subfield of deep learning and
machine learning,
00:15:23 based on something called foundation models and using a very specific kind of neural network
called the attention-based transformer model.
00:15:32 And as we'll see in the next unit, these models exhibit interesting properties like emergent
capabilities
00:15:38 that come out of the training. We'll continue with that in the next unit, so let me close here and
thank you very much.

Unit 2

00:00:06 Welcome to Unit 2 of this openSAP course on generative AI with SAP. In this unit, we're
going to get an introduction to generative AI.
00:00:16 So you will learn what is behind all the news about AI and its impact on the economy.
00:00:21 We will define foundation models, large language models, and generative AI. We will
understand why generative AI is a new approach
00:00:31 to artificial intelligence, and we'll finish with some examples of generative AI.
00:00:38 So it's difficult to open a newspaper or look on the internet, open YouTube right now, without
hearing something
00:00:45 about generative AI. It is all over the news.
00:00:49 We see that the US Congress, for example, is holding hearings on generative AI. We see that
entire industries like publishing, video games,
00:00:56 are all being transformed with generative AI. We see economists discussing the impact of this
on society and the economy,
00:01:05 and we see companies like SAP making announcements around generative AI and AI in our
products.
00:01:14 What is going on? I want people to understand that this hype we hear, all the news
00:01:19 around generative AI, is real. This will impact what we do and how we live, like past
technologies.
00:01:26 If we look at mobile telephones, the internet, or personal computers, these all profoundly impacted society
and how we work.
00:01:33 Generative AI will be no different.
00:01:37 There was a study that came out recently from McKinsey, among others, which said that artificial intelligence,
00:01:45 so what we covered previously, machine learning, was already set to add between $11 and $17
trillion of value to the economy
00:01:54 by 2040. With generative AI, they expect an additional $2.5 to $4.5
00:02:00 trillion added to global GDP. To put this in perspective, the gross domestic product, or GDP,
00:02:07 of Germany is about $4 trillion right now. And surprisingly, many companies have already
started their journey
00:02:15 with generative AI. With more traditional artificial intelligence,
00:02:20 I would say somewhere between a fourth and a half of companies had some artificial intelligence in
production.
00:02:27 And that's been around for a while. With generative AI, we already see around a third of companies

00:02:32 are using generative AI regularly in at least one business function. They all expect to increase
their investment in AI overall, and most
00:02:43 companies have seen some increase in revenue or some other positive impact as a
result of adopting generative AI.
00:02:52 Let's look at definitions. What is generative AI?
00:02:55 We're going to start with foundation models. So foundation models are neural networks trained
on huge volumes of data
00:03:04 using a self-supervised learning objective that can be applied to many different tasks.
00:03:11 Let's look at an example of one kind of foundation model to understand what these are:
00:03:16 large language models. Large language models are simply foundation models
00:03:21 trained on text, including computer code. What do we mean by a self-supervised learning
objective?

00:03:29 Basically, we take a large amount of data. I mean, this can be billions, even trillions of words
that we've gotten
00:03:37 from certain sources. Many of these big companies have scraped this from the internet,
00:03:41 and the label in the data is the next word. People do this too.
00:03:46 If we give an example like "London is in ___", most people will think: maybe England,
maybe the United Kingdom, maybe Europe, because remember,
00:04:00 knowledge is probabilistic, and the machine is learning this. If we now give the machine more context
00:04:07 and say "Two hours south of Toronto, London is in ___", then people who know
a little bit of geography
00:04:16 won't think England; this is London in Ontario, in Canada. And again, these models are trained on billions, in
some cases even trillions, of words.
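This self-supervised labelling can be sketched directly: every position in a sentence yields a training pair whose label is simply the word that follows, with no human annotation involved. A toy illustration (real models work on subword tokens and vastly more data):

```python
# Build (context, label) training pairs from raw text. The next word in
# the sentence already is the label, so the data labels itself.
def next_word_pairs(text):
    words = text.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

pairs = next_word_pairs("London is in England")
for context, label in pairs:
    print(f"{context!r} -> {label!r}")
# 'London' -> 'is'
# 'London is' -> 'in'
# 'London is in' -> 'England'

# More context changes the correct label, as in the Toronto example:
more_context = next_word_pairs("Two hours south of Toronto London is in Ontario")
print(more_context[-1][1])  # 'Ontario'
```

Because the labels fall out of the structure of the text itself, this objective scales to the billions or trillions of words mentioned above.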
00:04:29 Generative AI is an application of foundation models that can create new output, and this can
be text, images, sound, video, etc.,
00:04:39 based on simple user input called prompts. Not all foundation models are generative, and
not all generative AI
00:04:46 is based on foundation models, but the majority of it is. ChatGPT, which most people are
familiar with,
00:04:55 is simply an application built on top of several large language models and foundation
models.
00:05:03 This is a model that has been optimized for dialog, so conversing back and forth. I will mention
where these are headed very quickly, and probably the future
00:05:13 of foundation models in generative AI: multimodal models. These are models that can work
with several forms of data,
00:05:21 for example converting text into images, or looking at images and videos and describing them in
text.
00:05:31 So what are the capabilities of foundation models? So, as we described, we take huge
amounts of data, we train that, and we have some
00:05:40 big foundation model. And what can we do with that now?
00:05:43 What are the characteristics of foundation models? One is that they're huge.
00:05:49 They are measured typically in billions or even trillions of what are called parameters.
00:05:55 Parameters are just the numbers behind the model that it has learned, and the models are trained on billions
or trillions of what are called tokens.
00:06:07 For all intents and purposes, a token can simply be a word, for example. It's a way that we've made this
data machine-readable.
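As a toy illustration of tokens: real foundation models use subword tokenizers such as byte-pair encoding, so splitting on whitespace below is a deliberate simplification, but it shows how text becomes a machine-readable sequence of integer IDs, and what counting a corpus "in tokens" means:

```python
# Turn a text into a sequence of integer token IDs. Measuring a corpus
# "in tokens" just means counting the entries in sequences like this.
corpus = "the quick brown fox jumps over the lazy dog"

tokens = corpus.split()                  # here, 1 token = 1 word
vocab = {word: i for i, word in enumerate(sorted(set(tokens)))}
token_ids = [vocab[w] for w in tokens]   # what the model actually consumes

print(len(tokens))      # corpus size in tokens: 9
print(token_ids)        # repeated words map to the same integer ID
```

Production tokenizers differ mainly in that they split rare words into smaller pieces, which keeps the vocabulary manageable across many languages.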
00:06:16 Important to understand is that foundation models are stateless. I think computer programmers
will get this right away:
00:06:23 they respond only to the data that is sent to the model, but they remain in
their original state, so they're not remembering or learning
00:06:33 from the interactions. If you send something to a model, it sends you a
response.
00:06:38 It remembers nothing from what you've sent to it. And as we'll see on the next slide, the
performance and the capabilities
performance and the capabilities
00:06:47 seem to, in general, scale as these models increase in size. Why is this a new approach to AI?
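Coming back to statelessness for a moment, this sketch shows what it means in practice. The `call_model` function here is hypothetical, not a real vendor API; the point is that a chat application must resend the whole conversation history on every call to create the illusion of memory:

```python
# A stand-in for a stateless model endpoint: it can only use the messages
# passed in this particular call; nothing persists between calls.
def call_model(messages):
    return f"(reply based on {len(messages)} message(s))"

history = []  # the *application* keeps the state, not the model

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)          # full history resent every turn
    history.append({"role": "assistant", "content": reply})
    return reply

chat("What is a foundation model?")
print(chat("And why is it called that?"))
# The model only "remembers" the first question because the application
# sent it again as part of the second call.
```

This is why chat applications built on top of foundation models, like ChatGPT, manage conversation state themselves.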

00:06:56 In Unit 1, we learned a little bit about supervised learning and narrow AI. We train a model to
do one specific thing very, very well, and that's all it
00:07:08 can do. This typically takes a lot of labels.
We need labeled training data and high-quality data. With generative AI, we move from the
supervised learning
00:07:20 paradigm to a self-supervised learning paradigm. The label is somehow inherent in the data:
we've defined the
00:07:27 learning objective in a way that scales, so all this data we're feeding it kind of labels
itself.
00:07:35 We've moved from predefined to emerging capabilities. For example, we used to train a model to
identify cat pictures,
00:07:43 and that's all it could do; it was predefined, and that's all we wanted it to do. Now, as we'll see in a second,
models simply have capabilities
00:07:52 coming out of them that no one expected. People are still discovering capabilities that these models
have.
00:08:01 From single-purpose to multi-scenario AI: again, from one model trained to do one thing, to one
model that can do a whole bunch
00:08:08 of things and handle all different kinds of scenarios. And most models in the past, to be fair,
everything from Netflix
00:08:17 to Uber, were classifying AI, producing some class based
on supervised learning;
00:08:26 we've moved to AI that is generative and can actually create new data out of the model.
00:08:32 What this means is that users can easily interact with the model without requiring a lot of
technical knowledge.
00:08:40 Today, you can send data to a model like ChatGPT without necessarily being a data scientist and without having to understand how
00:08:52 the training lifecycle works. We get these capabilities out of, for example, large
language models
00:09:00 which are the predominant paradigm today. We can do classic natural language processing
tasks like classification,
00:09:08 like extracting entities, extracting keywords out of data. We can summarize text.
00:09:14 We can search large volumes of data and provide answers to questions. We can create all
sorts of content like music, videos.
00:09:25 I even saw a rap song that was created by artificial intelligence recently, and it can even
generate computer code.
00:09:34 So, how did we get here? As I mentioned in Unit 1, there are many different kinds
00:09:40 of algorithmic approaches to AI. All sorts of different architectures in a neural network.
00:09:47 This is how we build the neural network. There was a paper published back in 2017, which
introduced
00:09:54 the attention-based transformer architecture. At the time, it seemed interesting, and companies

00:10:01 started to experiment with this. So, one of the first models produced out of that was GPT,
00:10:06 and it was very small, so it had around 100 million of these so-called parameters, and it had
some interesting properties.
00:10:15 And over the years, they started scaling this. So they used more and more training data.
00:10:20 The models got bigger and bigger. GPT-2 was introduced in 2019 with 1.5
billion parameters,
00:10:27 again, very interesting. GPT-3, which is the model behind ChatGPT that everyone is so
familiar with,
00:10:35 was actually released in 2020, and that had 175 billion parameters. And suddenly we had
these emergent capabilities
00:10:44 coming out of the model. So it could help explain math problems, for example,
00:10:49 that it couldn't do in the past. It could do sentiment classification with just a little bit of input
00:10:56 and guidance. And that seems to be the trend.
00:11:00 So these models are getting bigger and bigger. There are families of models.

00:11:05 So all sorts of different companies now are creating foundation models. There are open source
versions available.
They tend to get bigger and bigger as well. GPT-4, which was released in the middle of this
year and is now mostly running
00:11:19 under ChatGPT, is rumored to have over one trillion parameters.
00:11:25 And as I said, it seems that the bigger the models get, with more compute, more data, and more
parameters, they tend to be
00:11:32 more accurate, you can adapt them to more things, and they simply just perform better.
00:11:38 That seems to be the trend, but there's also a trend towards smaller
models, and towards benchmarking, where we compare the performance of these smaller models
00:11:47 against some of the bigger ones on certain narrow tasks. What are some examples?

00:11:55 So, as I mentioned, generative AI processes prompts. A prompt is simply instructions or cues
or data that you're sending to the model,
00:12:07 and that's guiding it to produce the desired output. A simple example from a text-to-image
multimodal model.
00:12:17 Here I've given the model a prompt: I've simply typed in natural language, and I've done
nothing else.
00:12:23 Hasso Plattner, the founder of SAP, giving a high five to a robot. And the model has produced
several images for me based on that prompt.
00:12:35 So here we see someone who looks like Hasso giving a high five to a robot.
00:12:39 In the other picture, it's kind of gotten it wrong, so it has Hasso as the robot, giving the high five.

00:12:45 But I think you get the idea. Large language models are also good at creating text.
00:12:52 So for example, we can ask it to write an email introducing our logistics company called
Treasure Island Transports to a retail prospect.
00:13:01 So I want to sell to this person, and because we're Treasure Island Transports, I wanted to
write it in the style of a pirate.
00:13:09 So how a pirate would speak, and keep it under a certain number of words, and voila, there we
go.
00:13:16 We have an email: Ahoy there!
00:13:18 We be Treasure Island Transports, Masters of the Seven Seas, etc., introducing our company
to this prospect.
00:13:27 And they're also good at summarizing text and information and actually getting insights out of
large volumes of data.
00:13:35 So, for example, I've taken a thousand-word document here on SAP artificial intelligence, and
I've asked it to summarize something
out of that document. So I want to understand the key benefits that were listed there,
00:13:47 and I want it to do it very concisely. So I'm instructing the model to do this
00:13:51 in less than 100 words and to use bullet points for the answer. This is the prompt that I've sent
it, along with all of the text
00:13:58 from the document. And voila, there we go.
00:14:03 So it's describing the benefits of SAP Business AI. So, you know, we embed it, we can use it to
extend systems, etc.
00:14:17 They're also good at kind of classic NLP tasks. So we can use this to classify and extract text.

00:14:23 So, for example, let's say I want to do some sentiment analysis to understand the feedback
on the course, or I'm in a company where we
00:14:32 get service tickets and comments and I want to understand the sentiment in these. I'm asking it
to classify the sentence as negative, neutral,

00:14:42 or positive sentiment, and I want to extract the object of what they're talking about from that
short sentence.
00:14:50 And what it returns is, this is positive in sentiment because it said, I love your course.
00:14:57 And the object being referred to in the sentence is "your course on openSAP". Extracting the object like this is
otherwise known as named entity recognition.
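The sentiment example above boils down to wrapping the data in natural-language instructions and parsing the model's reply. A sketch (the model call itself is omitted, and the `sentiment=...; object=...` response format is an invented convention for illustration, not a vendor API):

```python
# Build the classification prompt described above. The "program" is just
# natural-language instructions plus the data to classify.
def build_sentiment_prompt(sentence):
    return (
        "Classify the sentence below as negative, neutral, or positive "
        "sentiment, and extract the object being talked about.\n"
        "Answer exactly as: sentiment=<label>; object=<object>\n\n"
        f"Sentence: {sentence}"
    )

prompt = build_sentiment_prompt("I love your course on openSAP!")
print(prompt)

# A hypothetical model response, parsed back into structured data:
response = "sentiment=positive; object=your course on openSAP"
parsed = dict(part.strip().split("=", 1) for part in response.split(";"))
print(parsed)  # {'sentiment': 'positive', 'object': 'your course on openSAP'}
```

Asking the model for a fixed answer format, as the prompt does here, is what makes the free-text reply easy to parse back into fields a business system can use.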
00:15:08 And, importantly for all the developers here, this seems to work very, very well for creating
code, either working code or code that needs
00:15:18 developers to make it ready for production, based on simple text descriptions.
00:15:23 So you can describe what you want a system to do and what language you want it written in, and it will produce code that can be used.
00:15:32 Good. So these straightforward yet impressive examples
00:15:37 come from simply prompting generic generative AI models. In the next unit, we'll learn how to adapt
these models to business contexts,
00:15:47 and also a little bit about the limitations of generative AI models and how we overcome them.
00:15:53 So thank you very much.

Unit 3

00:00:06 Thank you for joining Unit 3 of this openSAP course, which covers how we adapt generative AI
to business contexts.
00:00:14 In the last unit, we saw some impressive examples of what generative AI is capable of.
00:00:19 In this unit, we will learn about some of the limitations of generative AI models,
00:00:23 and we will learn about methods to address these limitations by grounding generative AI in
context and adapting it to business contexts,
00:00:31 and we will end with best practices for adapting generative AI. So generative AI's ability to
answer broad general-knowledge questions
00:00:41 is very remarkable. We've seen, for example, that it passed the bar exam in the United
00:00:46 States, which you need to pass to become a lawyer, often scoring better than trained lawyers.
00:00:51 We've seen it pass the SAT exam in the United States, which is used for school
entrance exams,
00:00:58 and it's even passed the medical licensing exam, which is incredibly
00:01:02 impressive for a language model. And because of that, people may have the impression that it
00:01:07 can tackle any kind of problem out of the box. However, the goal of this unit is to help users be
00:01:07 can tackle any kind of problem out of the box. However, the goal of this unit is to help users be
aware of these
00:01:15 limitations, keep their expectations in check, and understand how to make Generative AI
Enterprise ready
00:01:21 so that it can handle business problems. Let's look at a few examples of some of these
limitations.
00:01:27 The first is called hallucination. So hallucination is where large language models
00:01:34 can generate plausible sounding yet false answers. And related to that, sometimes if you ask it
a question
00:01:42 in different languages, it might even provide different facts back to you.
00:01:47 For example, there's a research paper that I recommend everyone read about the challenges
and applications of large language models,
00:01:53 where the researchers show that they ask it basically to create some citations for papers, and
the first citation that it gives
00:02:04 is correct. It's a real citation.
00:02:06 So that is a fact. It is a true answer.
00:02:08 The next citation doesn't exist at all. So the model has just completely invented an academic
citation
00:02:15 that sounds real, but if you look up, if you try to find it, the article doesn't exist.
00:02:20 And the third example is the correct article but it's somehow invented or hallucinated the
authors of the paper.
00:02:28 So this makes it difficult to trust the output of large language models. Relatedly, some lawyers
in the United States, in court,
00:02:37 used ChatGPT and large language models to prepare their case, referencing certain cases
00:02:47 that they submitted to court. When the judge looked into these, none of them actually existed; the model had just hallucinated
00:02:54 and invented these facts. They were fined, and they learned a valuable lesson.
00:03:00 Another challenge and limitation is that the knowledge of a model is basically frozen in time
from when the model was trained, so it
00:03:10 can be difficult to get up-to-date and specific knowledge into the model.

00:03:15 For example, I asked it, who won the NBA preseason game last night between the LA Lakers
and the Golden State Warriors?
00:03:23 So the model can't know this because the model was trained in 2021 and of course the time of
recording we're in 2023.
00:03:33 The first answer here is correct, so the model actually tells me as a user I can't know that.
00:03:39 I'm limited in my training data. The second example goes ahead and hallucinates a fake score.

00:03:45 So if you ask the same question to the model many times, it keeps giving you different scores which sound
00:03:51 plausible but are not real. Similarly, looking at a business
00:03:57 problem, we can ask what libraries and SDKs are available with SAP AI Core or with some of
our products.
00:04:04 And this one again correctly tells me that its training ended in 2021 in September, so it can't
know that,
00:04:13 but then it goes ahead and gives me a plausible sounding answer that is actually wrong.
00:04:18 And the ground truth is that we do offer SDKs, API clients, et cetera, in AI Core, which
was actually launched
00:04:28 on October 12, 2021, a month after the model was even trained. It couldn't have known that, yet
it tried to give me
00:04:35 a plausible sounding result. Also, users may be tempted to assume that LLMs work like
computers
00:04:43 or like calculators and can solve complex optimization problems or forecasting or things like
this.
00:04:50 You know, many of them are not so good at math, so they have inconsistent math abilities and
a limited notion
00:04:59 of time. So for example, I asked it a question here that I think most humans
00:05:04 would get right. I said Jane was elected class president, this is the mother,
00:05:09 in 1973 when she was 12 years old. Her daughter Jill was elected class president in 2012
when she was 13
00:05:17 years old. Who was older when she was elected class president?
00:05:21 Jane or Jill? And the correct answer, of course, is that when elected, Jill was 13
00:05:26 and Jane was 12, therefore Jill was older. When we ask three of the biggest leading large
language models
00:05:33 this question, they all get it wrong. And even the reasoning was strange.
00:05:39 It says Jane was 12 years old, while in 2012 Jill was 13 years old, therefore Jane was older.

00:05:45 So it just doesn't get it right. And I think astute experts following this do realize that there
00:05:53 are some methods to improve the performance of large language models when dealing with
math,
00:05:59 which can be very, very impressive. So here I've asked it to solve a quadratic equation,
00:06:04 and I've done a little bit of prompt engineering here, so I've asked it to show its work and to list
the answer.
00:06:10 And pretty amazingly, it does get the answer correct. Since we don't implicitly trust the output
of a large language model
00:06:18 without checking it, without verifying it, I do check online using a different tool, and it did
00:06:24 in fact find the correct answer to the quadratic equation. Good, so now that we are aware of
some of the limitations
00:06:32 of large language models and generative AI, let's look at some of the methods to ground and
adapt this in business context.
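As an aside, the cross-check described above, verifying an LLM's math with a second tool, can be as small as a few lines of Python. This sketch checks the roots of a quadratic; the example equation is an illustrative stand-in, since the one from the recording isn't shown here:

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0, sorted ascending."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

# Example: x^2 - 5x + 6 = 0 has roots 2 and 3.
roots = quadratic_roots(1, -5, 6)
print(roots)  # [2.0, 3.0]

# Verify each root actually satisfies the equation, just as we would
# verify a model-generated answer before trusting it.
for x in roots:
    assert abs(1 * x * x - 5 * x + 6) < 1e-9
```

The same pattern, recompute independently and compare, applies to any numeric answer a model produces.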

00:06:40 So grounding is defined as providing additional information in context that may not be available
as part of the foundation model's
00:06:47 initial training data. So grounding can help reduce hallucination, it can help provide
00:06:53 up-to-date results, and improve the performance of the model on specific tasks.
00:07:01 So remember to create a generative AI foundation model, we've essentially taken this huge
amount of data, right,
00:07:07 probably you know a lot of it scraped from the internet, we have this huge model with billions,
even trillions
00:07:12 of parameters inside of it, and we have some limitations. And let's say we want to now apply
that to a specific use case
00:07:19 or a business context. What can we do?
00:07:21 How do we ground and adapt these? What are the methods?
00:07:24 So there are two big buckets for how we do this. The first is providing task-specific instructions.

00:07:31 So this is passing information in the prompt to the model to improve its performance.
00:07:38 Because as we remember from the last unit, large language models and generative AI models
are stateless.
00:07:44 So they don't remember from one time they're called to the next. The first bucket, the first
approach is called prompt engineering.
00:07:53 So here we're providing more information to describe the task.
00:07:58 These include different approaches like zero-shot and few-shot learning. We will cover each of
these in more detail in the rest of this unit.
00:08:09 The second approach is called retrieval augmented generation or RAG.
00:08:14 So here, we allow the large language model to access external information, and we extend
00:08:21 the domain knowledge by injecting information via various techniques, such as
embeddings, knowledge graphs, and
search. And importantly, this can actually provide references back to where it
00:08:34 got the information. So it'll provide a source on why it's giving the output
00:08:39 that it's given. And the third one, which is very, very exciting,
00:08:44 a little more experimental, these are called orchestration tools. So these are basically providing
tools to a large language model
00:08:52 so that it can perform various tasks, it can access external information, it can do things like
calculations, and we'll look into that
00:08:59 in more detail. The other big bucket is actually going back and retraining
00:09:06 and fine-tuning the model itself. So remember, we have this huge model full of many, many
parameters
00:09:12 inside of it. And we're going to actually retrain the model.
00:09:15 So we're going to adjust those weights in the model. Fine tuning is the way that we do this.
00:09:21 So it's a very good approach for improving the performance on domain-specific tasks or
through various instruction
tuning methods. Another one, which we won't cover here, is called reinforcement
00:09:37 learning from human feedback. This is how companies like OpenAI have fine-tuned ChatGPT
00:09:45 to respond in a certain way. So basically all of the responses that it gives,
00:09:51 people are looking at those and then they're telling the model, yes, this was a good response
or no, this wasn't.
00:09:57 And then it's taking all of that feedback and going back to adjust the model to change how it
performs over time.

00:10:05 And these methods are useful, but they're not really enough to make generative AI
production-ready in an enterprise context.
There are additional processes and governance around the model,
00:10:16 of course, that are very important. One is AI ethics.
00:10:20 So basically don't put use cases into production that may cross certain ethical boundaries or
may be potentially harmful to humans
00:10:28 in the first place. Another is human in the loop in the design process, allowing humans to work
00:10:35 together with AI. So for example, if the output of a large language
00:10:39 model is, you know, not always consistent or it may have hallucinations, it's important to have
a human in
00:10:47 the process checking this. There are technical ways we can also do that, such as output
validation
and cross-checks. If, for example, the model is recommending
00:10:58 a certain purchase order number, we can actually check that against the master data: is
that in the right format,
does that even exist? Also various testing and red teaming, where basically humans
00:11:11 try to break the model before it goes into production. And of course having continuous
feedback
00:11:18 and monitoring it because over time sometimes the model behavior can change a little bit.
00:11:23 We want to check that you know in a month or in two months or six months from now that it's
still consistently giving
the right kind of output. Let's look at the first of these in detail,
00:11:35 which is prompt engineering, which is where you should always start when trying to build a
generative AI use case.
00:11:42 So prompting, as we learned in the last module, is giving the model detailed information
00:11:48 to reliably produce the desired output. And the easiest way to do that is what is simply called
zero-shot
learning. So we provide a very simple prompt to the model, and we get a result
00:12:01 out of it. So let's say, for example, we have a use case where we
00:12:04 want a large language model to create a job description for us. So it's a job posting, and we're
going to put it
00:12:11 on the internet. People will read that.
00:12:13 They will apply for the job. A simple way would simply be write a job description for a support
00:12:19 engineer. And you can try this if you have access to large language models.
00:12:24 It does a pretty good job at that out of the box, but probably not perfect, not exactly what you
need.
00:12:31 Now what we can do is add more information. We can try a prompt engineering technique
called instruction
following. So we're writing in the command tense, and we're telling the model
00:12:40 what it should do. So for example: write a job description, keep it less than 300
00:12:45 words, include the following skills, which we have as a list; we give that as input.
00:12:50 It then takes it and it will produce a job description that is a little bit better.
00:12:56 And we could even take that a little bit further, and we could mix this, we could have
instructions with in-context learning,
00:13:02 including examples. So in an SAP system, for example, we would have this user input,
00:13:07 this user prompt, where they're asking to create a job description.
00:13:12 And in the backend, we can inject more information into the prompt, like job title, for example.
00:13:18 We can give instructions that, for example, it should ensure diversity and fairness,

00:13:22 use subtitles for required skills, make bullet points, etc. And we can even give examples of
similar job descriptions.
We read those and say, wow, that's a great job description. That's the style I want for
my company.
00:13:36 We can provide those simply into the prompt. Keep in mind there's no free lunch.
00:13:43 So we're adding more and more information to get more and more reliable, better outputs,
00:13:52 but the context window is getting bigger. So this means we're sending more and more data,
00:13:56 more and more tokens to the system. This typically increases the cost because you're paying,
the more data
00:14:02 you're sending to these models, typically it's consuming more GPUs in the background, and it
can increase things
00:14:09 like response time. But this is how we get the system to behave better.
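The progression just described, from a bare zero-shot prompt to instructions plus in-context examples, is in the end just string assembly before the prompt is sent to a model. A minimal sketch; all wording, function names, and parameters here are illustrative, not an SAP API:

```python
def build_prompt(job_title, skills, examples=None, max_words=300):
    """Assemble a job-description prompt. Zero-shot would be just the
    first line; the instructions and optional in-context examples are
    what make the output more reliable."""
    lines = [
        f"Write a job description for a {job_title}.",
        f"Keep it under {max_words} words.",
        "Include the following skills: " + ", ".join(skills) + ".",
        "Use inclusive, fair language; use subtitles and bullet points.",
    ]
    # In-context learning: append example postings in the preferred style.
    for example in examples or []:
        lines.append("Here is an example job description in our preferred style:")
        lines.append(example)
    return "\n".join(lines)

prompt = build_prompt(
    "Support Engineer",
    ["ABAP", "troubleshooting", "customer communication"],
    examples=["<an existing posting whose style we like>"],
)
print(prompt)
```

Note how each addition lengthens the prompt, which is exactly the token-cost trade-off mentioned above.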
00:14:15 Another example of prompt engineering is with SAP business documents. So for example, we
can use optical character
00:14:23 recognition, OCR, to extract text out of a scan, for example, or a PDF.
00:14:32 We pass that on to a large language model. And here we have the text preserved with the 2D
structure.
We inject more information into the prompt. So we want to extract certain fields, for example, we
maybe give examples
00:14:47 of those fields. And what we get out of the large-language model,
00:14:52 because I've asked it using a command here to extract the fields as JSON, is those fields
00:14:58 extracted in a machine-readable way from the document that we can now post into an
ERP system or other business application.
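A minimal sketch of the receiving side of this flow: parsing the model's JSON output and validating a field before posting it. The field names and the ten-digit format rule are hypothetical; the real schema depends on the document type:

```python
import json
import re

# A hypothetical raw completion from the model, after prompting it to
# "extract the fields as JSON" from the OCR text of a business document.
model_output = (
    '{"document_number": "4500012345",'
    ' "supplier": "ACME GmbH",'
    ' "amount": "1250.00"}'
)

fields = json.loads(model_output)  # machine-readable result

# Output validation before posting to the ERP system: here, an invented
# rule that the document number must be exactly ten digits.
assert re.fullmatch(r"\d{10}", fields["document_number"]), "unexpected ID format"

print(fields["supplier"])
```

In practice such checks (format, existence in master data) run in the back-end before anything is posted.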
00:15:08 And of course, you know, it doesn't always get it right.
00:15:12 We learned that machine learning and generative AI is always probabilistic, so we have some
kind of output validation.
00:15:17 Maybe the ID number is in the wrong format, or it's done something that we didn't expect,
and we can actually
00:15:23 verify that in the back-end system. The next adaptation approach that we'll look at is
00:15:30 retrieval augmented generation, or RAG, and we'll use the example of embeddings.
00:15:36 So first, what are embeddings? So embeddings are simply numerical
representations of data,
00:15:45 of information, that retain the semantic and contextual meaning.
00:15:49 What does that mean? Let's take an example.
00:15:51 So let's say we have product documentation and we want to make that product documentation
accessible by large language models
00:15:59 and generative AI so that users can search it and they can interact with it and get insights and
information out of that.
00:16:05 So we take the product documentation and we first encode it as embeddings using some kind
of machine learning algorithm.
00:16:14 And then we store this in a vector database. What is a vector?
00:16:18 So we have a sentence like "configure business processes by opening such-and-such", and we turn
that into a vector.
00:16:25 It's simply a big row of numbers, okay, that is understood by a machine but not really by
people.
00:16:33 And this vector, these numbers, capture the meaning in the text.
00:16:37 So for example, the word Apple has a different vector, different numbers in the row,
00:16:43 in the sentence "Apple makes phones", where it's the subject of the sentence, versus "I have an
Apple iPhone 15."

00:16:50 So those two vectors are probably similar but slightly different. Versus a completely different
vector, Apple is an ingredient in pies.
00:16:58 So this will be completely different. And the beauty of this is once we represent all of this
business data
00:17:04 as embeddings, it can be easily searched and retrieved using techniques like vector similarity
scoring.
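Vector similarity scoring itself is simple arithmetic; a common choice is cosine similarity. The three-dimensional "embeddings" below are made-up toy values to keep the sketch self-contained (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity score between two vectors: 1.0 means the same
    direction (same meaning), values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors for the Apple examples above.
apple_company = [0.9, 0.1, 0.2]  # "Apple makes phones"
apple_iphone  = [0.8, 0.2, 0.3]  # "I have an Apple iPhone 15"
apple_fruit   = [0.1, 0.9, 0.1]  # "Apple is an ingredient in pies"

print(cosine_similarity(apple_company, apple_iphone))  # high, about 0.98
print(cosine_similarity(apple_company, apple_fruit))   # much lower, about 0.24
```

This is the mathematical comparison used to retrieve the stored items whose vectors lie closest to the query's vector.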
00:17:10 So we can simply mathematically compare what the user is asking for versus what
information we are looking to pull
out of the system. So how would we apply that?
00:17:22 So we've already said that we have these business documents that should be searchable by the
users,
00:17:28 they should be able to ask questions to it, and what we've done is we've embedded this
knowledge and then we
00:17:35 can retrieve the relevant items given a question using large language models and other
foundation models, and we can then generate an answer
00:17:43 from the best results again using large language models. So for example a user might ask,
how can I adapt my business processes
00:17:51 for VAT calculation. So that sentence is first turned into an embedding.
00:17:57 We're then using that, using these techniques to retrieve the relevant results out of the
backend,
00:18:03 which are also vectorized as an embedding. And we take all this information and we inject it
into the prompt
00:18:11 and give that back to the large language model, which is then going to generate an answer for us.

00:18:16 So you know, you need to configure your business process and add a step, or whatever that is.

00:18:22 And the beauty of this is that it grounds the prompt with relevant information, so it's telling the
prompt,
00:18:29 create the answer, not from what you know, what you've been trained on, but create the
answer from this text
00:18:35 here, and it can even provide references back to the source material that was used to
generate this.
00:18:41 So you can actually look into the documentation and see where did it get this answer from so
that I know I can trust it.
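The retrieve-then-generate flow described above can be sketched end to end. To stay self-contained, this toy uses word overlap as a stand-in for embedding similarity, and the documents, query, and prompt wording are all invented:

```python
import re

def similarity(a, b):
    """Stand-in for vector similarity scoring: word overlap between two
    texts. A real system would compare embedding vectors instead."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    return len(words_a & words_b) / len(words_a | words_b)

docs = [
    "To adapt VAT calculation, configure the tax step in your business process.",
    "Release notes for the 2023 user interface update.",
]

query = "How can I adapt my business processes for VAT calculation?"

# Retrieve the most relevant document for the question ...
best = max(docs, key=lambda d: similarity(query, d))

# ... and inject it into the prompt, so the model answers from this text
# rather than from its frozen training data, and can cite its source.
prompt = (
    "Answer the question using only the context below, and cite the context.\n"
    f"Context: {best}\n"
    f"Question: {query}"
)
print(prompt)
```

The grounded prompt is what finally goes to the large language model; the retrieved passage doubles as the reference shown back to the user.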
00:18:49 The other way to provide these task-specific instructions is an emerging technique called
agents
00:18:59 or orchestration tools. So these basically give large language models access
00:19:05 to tools, like the ability to access an API, or various libraries that provide a reasoning structure
for the model.
00:19:15 And we allow these so-called agents to work together and find an answer. For example, the
user can input some kind of task, so asking it to do
00:19:26 something. And the agent will basically build a plan.
00:19:34 So it's going to retrieve, for example, in this case, API specifications based on the user's input.
00:19:43 It will then generate a plan for how it's going to extract the information out of these.
00:19:47 And then it goes ahead and calls the API, collects the information, and presents the
information back to the user.
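That plan-then-act loop can be illustrated with a toy agent. Here the model's tool choice is simulated with a keyword lookup so the sketch runs offline; in a real orchestration framework an LLM would select the tool from the API descriptions and generate the call itself, and the tools and task format below are invented:

```python
def get_weather(city):
    """Stand-in for a real external API call."""
    return f"Sunny in {city}"

def calculator(expression):
    """Toy arithmetic tool; eval is restricted and for illustration only."""
    return str(eval(expression, {"__builtins__": {}}))

# Tool registry: name -> (callable, description the model would read).
TOOLS = {
    "weather": (get_weather, "Returns the current weather for a city"),
    "calculate": (calculator, "Evaluates an arithmetic expression"),
}

def agent(task):
    # Step 1: "plan" -- decide which tool the task needs. A real agent
    # would have an LLM pick this from the tool descriptions.
    name = "weather" if "weather" in task else "calculate"
    tool, _description = TOOLS[name]
    # Step 2: act -- call the tool and return the observation to the user.
    argument = task.split(":", 1)[1].strip()
    return tool(argument)

print(agent("weather: Walldorf"))  # Sunny in Walldorf
print(agent("calculate: 6 * 7"))   # 42
```

Frameworks in this space add looping (plan, act, observe, re-plan) and let the model generate the tool arguments, for example a JSON payload for an API call.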
00:19:54 So again, this is pretty advanced. So you're basically giving the model a description of an API.

00:20:03 it knows which one to use from that, it can write its own JSON file, it can call the API, collect the information,
and pass the information
00:20:09 back to the users. And the last one we'll cover is fine-tuning.
00:20:16 So as I said, this is adjusting the existing foundation model's parameters, the weights, to
perform better at a specific task
00:20:24 by retraining it on a new data set. So we have our big pre-trained foundation model as a
starting point.
00:20:32 And now we're going to retrain that using curated domain or task-specific data, typically
labeled with examples
00:20:42 of the desired input and output. So we've now fine-tuned this model.
00:20:47 And what we find is that this can improve the performance on certain tasks. Most people, when
they look at large language
models, tend immediately to think, I need to fine-tune a model.
00:21:02 I think there are considerations about whether that's actually the first thing you should do or
not.
00:21:05 I think that's the last thing. So the opportunity of fine-tuning is, again, it can improve the
accuracy
00:21:12 and performance on certain kinds of tasks. So for example, you know, extracting information
00:21:17 from business documents, we can fine-tune that on labeled examples of information we've
extracted
00:21:23 from business documents and it will improve it at that specific task.
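The curated, labeled data set such fine-tuning needs is typically a file of input/output pairs. The sketch below writes a small JSON Lines file; the field names and examples are invented, and each provider documents its own expected training-file schema:

```python
import json

# Labeled examples of the desired input and output -- here, pairs of raw
# document text and the structured extraction we want the model to learn.
examples = [
    {
        "input": "Invoice No. INV-001 from ACME GmbH, total 1250.00 EUR",
        "output": '{"invoice_id": "INV-001", "supplier": "ACME GmbH", "total": "1250.00"}',
    },
    {
        "input": "Rechnung Nr. INV-002 von Beispiel AG, Summe 99.90 EUR",
        "output": '{"invoice_id": "INV-002", "supplier": "Beispiel AG", "total": "99.90"}',
    },
]

# Write one JSON object per line (JSONL), a common fine-tuning file format.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

In a real project this file would contain many curated examples, which is exactly the data-collection cost discussed below.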
00:21:27 And even these smaller and medium-sized models, you know, even 5 to like 50, 60, 70 billion
parameter models,
00:21:35 can approach the performance of the more costly massive models using this technique.
00:21:43 However, this may reduce but not eliminate problems like hallucination.
00:21:47 The model may not necessarily forget the existing biases and information that it already had.
00:21:54 It does not reliably learn new factual information, so it's not going to learn new facts by doing
this.
00:21:59 For facts, you should turn to techniques like RAG or orchestration tools.
00:22:05 You need a lot of curated training data to fine-tune. It can be quite expensive and costly to
retrain them,
00:22:12 and currently it's difficult to generalize the volume of required training data that you need to
achieve
00:22:18 meaningful results, meaning this requires a lot of experimentation.
00:22:23 So just as a point here, consider and test task-specific instructions,
00:22:30 like prompt engineering, before jumping ahead and investing in fine-tuning models.
00:22:37 So to wrap up, some recommendations on adapting foundation models. Do not trust generic AI
models to answer factual questions,
00:22:47 especially in a business context. Instead, be aware of those limitations
00:22:56 and apply these grounding and adaptation techniques to ensure that the output is relevant and
reliable.
00:23:04 Do not start immediately with fine-tuning, as we just discussed.
00:23:07 That's a very special use case. Instead, start
00:23:10 with task-specific instructions, like prompt engineering and retrieval augmented generation.
00:23:16 Do not use the biggest model by default. So these huge models, like GPT-4 for example,
currently are very, very
00:23:23 impressive; they work very well. They can also be costly, and they may not actually be the
best one
00:23:31 to use for that use case. So instead, test and adapt different generative AI

00:23:35 models and optimize based on price and performance. And do not productize a technically
functional generative AI use case
00:23:43 without these established processes like design with human in the loop, like having AI ethics in
place.
00:23:50 So ensure that you have proper governance and design for generative AI.
00:23:55 So this concludes the part of the course which introduces the fundamentals of generative AI.
00:24:00 Thank you.

© 2023 SAP SE or an SAP affiliate company. All rights reserved.
See Legal Notice on www.sap.com/legal-notice for use terms,
disclaimers, disclosures, or restrictions related to SAP Materials
for general audiences.
