Professional Documents
Culture Documents
Generative AI Lifecycle Patterns. Part 2: Maturing GenAI: Patterns… | by Ali Arsanjani | Sep, 2023 | Medium
We will explore a non-exhaustive list of techniques that are often combined into composite patterns for dealing with the typical problems and challenges encountered as you seek to adopt Gen AI at the enterprise level.
You can use this as a checklist of patterns to adopt when using Gen AI at production scale in an enterprise or industrial environment. You can also use it to prepare your enterprise for Gen AI by raising awareness of some of the many skills you may need to overcome common challenges on the adoption journey.
https://dr-arsanjani.medium.com/the-generative-ai-lifecycle-1b0c7d9463ec 1/24
Iterations and Cycles. LangChain has been a highly popular and useful library for creating chains of tasks for Gen AI. But these chains are not one-and-done; they are fundamentally experiments, so we must prepare ourselves, our teams, and our enterprises for repeated cycles through these chains of tasks, iterating on our experiments as we go.
Below is a diagram that includes almost all of the cycles and iterations we discuss in this article. Please use it as an indicative reference for the art of the possible, and adapt or extend it to accommodate the nuances of your specific enterprise needs.
In-context learning (ICL) has goals very similar to few-shot learning: to enable models to learn from contextual data without extensive tuning. However, fine-tuning a model involves a supervised learning setup on a target dataset, whereas in ICL the model is prompted with a series of input–label pairs without updating the model's parameters.
Experience has shown that LLMs can perform quite an array of complex tasks through ICL, even ones as complex as mathematical reasoning problems [1].
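As a minimal sketch (the sentiment task, labels, and prompt layout here are invented for illustration), an ICL prompt simply concatenates input–label pairs ahead of the new input; no parameters are updated:

```python
# Build an in-context learning prompt: the model sees labeled examples
# inline and infers the task, with no gradient updates.
def build_icl_prompt(examples, query):
    """examples: list of (input, label) pairs; query: the new input."""
    lines = [f"Input: {x}\nLabel: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

demos = [
    ("The movie was delightful", "positive"),
    ("Service was slow and rude", "negative"),
]
prompt = build_icl_prompt(demos, "A genuinely fun experience")
print(prompt)
```

The resulting string is what you would send to the foundation model; the trailing `Label:` invites the completion.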
Chain it.
Beyond the basic Prompt → FM → Adapt → Completion pattern, we typically need to extract data from somewhere, perhaps run a predictive AI algorithm, and then send the results to a generative AI foundation model. This Chain of Tasks (CoTA, to be distinguished from Chain of Thought, CoT) pattern is exemplified as:
Chain: Extract data/analytics → Run a predictive [set of] ML model[s] → Send the result to an LLM → Generate an output
You can use a library like LangChain to orchestrate such a chain of tasks.
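A minimal sketch of the CoTA pattern in plain Python, with every stage stubbed out (the extraction step, the predictive model, and the LLM call are all hypothetical placeholders you would replace with your warehouse query, your trained model, and an LLM client such as a LangChain chain):

```python
# Sketch of the CoTA pattern: extract -> predict -> prompt an LLM.
def extract_data():
    # Placeholder for a real data/analytics extraction step.
    return {"customer_id": 42, "monthly_spend": [310, 295, 420]}

def run_predictive_model(features):
    # Placeholder predictive model: flag churn risk from a spend drop.
    spend = features["monthly_spend"]
    churn_risk = "high" if spend[-1] < 0.9 * spend[0] else "low"
    return {"churn_risk": churn_risk, **features}

def send_to_llm(result):
    # Placeholder for an LLM call: we build the prompt the model
    # would receive rather than invoking a real API.
    return (f"Customer {result['customer_id']} has {result['churn_risk']} "
            f"churn risk. Draft a short retention email.")

output = send_to_llm(run_predictive_model(extract_data()))
```

Each stage only needs to agree on the dictionary it passes along, which is what makes the chain easy to iterate on.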
LangChain includes Models, Chains and Agents.
Chains. Chains are sequences of operations that LangChain can perform on text
or other data. Chains can be used to perform tasks such as text analysis,
summarization, and translation.
Agents. Agents are programs that use LLMs to make decisions and take actions.
Agents can be used to build applications such as chatbots and code analysis
tools.
LangChain also provides integrations with other tools and APIs, as well as end-to-end chains of the tasks needed to complete a workflow. For example:
Integrations with other tools: LangChain can be integrated with other tools, such as Google Search and a Python REPL, to extend its capabilities.
LangChain agents are particularly powerful because they can use LLMs to make
decisions and take actions in a dynamic and data-driven way. For example, a
LangChain agent could be used to build a chatbot that can learn from its
interactions with users and improve its performance over time.
Chatbots: LangChain can be used to build chatbots that can interact with users
in a natural and informative way.
Code analysis: LangChain can be used to analyze code and identify potential
bugs or security vulnerabilities.
Overall, LangChain is a powerful framework that can be used to build a wide variety
of applications using LLMs. It is particularly well-suited for building dynamic and
data-responsive applications.
LangChain agents use an LLM to decide what actions to take and the order to take
them in. They make future decisions by observing the outcome of prior actions.
This allows LangChain agents to learn and adapt over time, becoming more
effective at completing tasks.
Maturity Level 2. The above section is a very typical set of patterns used in an iterative
cycle to leverage Generative AI. Now let’s explore a more mature level that augments the
above.
Tune it.
As you evaluate the model's responses, you may find them wanting even after substantial effort in prompt engineering and in-context learning. Here you may need to tune the foundation model: adapt it to a domain, an industry, a type of output format, or a preference for brevity over rambling output (e.g., as in classification of a set of symptoms).
Adapter tuning involves adding a new layer to the LLM that is specific to the task at hand. The new layer is trained on a small dataset of labeled examples. This allows the LLM to learn the specific features of the task without having to fine-tune all of its parameters.
LoRA (Low-Rank Adaptation) involves approximating the update to the LLM's parameters with a low-rank matrix, using a form of matrix factorization. The low-rank factors are then fine-tuned on a small dataset of labeled examples. This allows the LLM to learn the specific features of the task without having to fine-tune all of its parameters.
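The arithmetic behind LoRA's savings can be sketched in a few lines of NumPy; the shapes, rank, and alpha/r scaling below follow the common convention but are chosen arbitrarily for illustration, and no training loop is shown:

```python
import numpy as np

# Toy illustration of the LoRA idea: instead of updating a full
# d_out x d_in weight matrix W, train a low-rank update B @ A and
# add it (scaled by alpha/r) to the frozen W.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, rank r
B = np.zeros((d_out, r))                    # trainable, starts at zero

W_adapted = W + (alpha / r) * (B @ A)       # effective weights

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

Because B starts at zero, the adapted weights initially equal the frozen ones, so tuning begins from the pretrained behavior; only A and B (here about 3% of the parameters) are trained.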
This enables the very important introduction of domain-specific LLMs. For example, see how Vertex AI can do this for you at a nominal cost.
Maturity Level 3. Now let's retrieve data before we send the prompt and contextualize the input even more, decreasing the likelihood of hallucination by the LLM.
RAG it.
Access similar documents using semantic search. How is this done? A set of documents you supply is chunked (read: split) up (sentence by sentence, by paragraph, by page, etc.), converted into embeddings with an embedding model such as textembedding-gecko@latest, and then stored in a vector database such as Google's Vertex AI Vector Search. Retrieval is done via an Approximate Nearest Neighbor (ANN), aka semantic search, algorithm. This input may significantly decrease the possibility of the model hallucinating and provide the model with enough relevant context to be more knowledgeable about the topic and return more 'sensible' and relevant completions. This process is known as Retrieval Augmented Generation, or RAG. So RAG it.
The second step is augmenting the prompt with the context retrieved from the vector store.
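A toy version of this retrieve-then-augment step might look like the following; a real system would use an embedding model (such as textembedding-gecko) and an ANN index (such as Vertex AI Vector Search), for which a bag-of-words vector and an exact nearest-neighbor scan stand in here:

```python
import math
from collections import Counter

# Toy RAG retrieval: embed chunks, find the nearest one to the query,
# and prepend it to the prompt as context.
def embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over 50 dollars.",
]
index = [(c, embed(c)) for c in chunks]  # the "vector store"

query = "How long do refunds take?"
best = max(index, key=lambda item: cosine(embed(query), item[1]))[0]

# Augment the prompt with the retrieved context before calling the LLM.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
```

The shape of the flow (chunk, embed, store, retrieve by similarity, augment) is the same one a production RAG stack follows at scale.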
Ground it.
Use an expanded search capability to increase factual grounding by allowing/requesting the model to return a reference to where it found the response it just gave. RAG provides grounding prior to submission to the LLM; grounding proper happens after the model emits its output tokens: find a citation and send it back. Many vendors, such as Google Cloud AI, provide multiple ways of factual grounding.
RAG is a framework for augmenting LLMs with access to external knowledge bases.
This allows LLMs to generate more accurate and informative text, even on complex
and challenging tasks. RAG works by first retrieving relevant passages from the
knowledge base. The LLM then uses these passages to generate its response.
The main difference between factual grounding and RAG is that factual grounding
focuses on ensuring that the LLM’s generated text is consistent with factual
knowledge, while RAG focuses on generating more accurate and informative text.
Also, factual grounding typically uses a knowledge base of factual statements, while RAG can draw on any type of external knowledge base, including text documents, code repositories, and databases.
Factual grounding is typically used as a pre-training step, while RAG can be used as
a post-training step. This means that factual grounding is typically used to improve
the accuracy and relevance of LLMs on a variety of tasks, while RAG is typically
used to improve the accuracy and relevance of LLMs on specific tasks.
FLARE it.
Forward-Looking Active Retrieval Augmented Generation. FLARE is a variation of RAG in which you actively decide when and what to retrieve: the model predicts the upcoming sentence to anticipate future content, and whenever that draft contains low-confidence tokens, it is used as the query to retrieve relevant documents.
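The FLARE control flow can be sketched as follows; the generator and retriever are stubs with invented outputs, standing in for an LLM that reports per-token confidences and for a vector store:

```python
# Sketch of the FLARE loop: draft the next sentence, and if any token
# falls below a confidence threshold, use the draft itself as a
# retrieval query and regenerate with the retrieved context.
THRESHOLD = 0.6

def draft_next_sentence(context):
    # Stub: a real LLM would return the sentence and its token
    # confidences; the date here is deliberately a shaky guess.
    return "The treaty was signed in 1887.", [0.9, 0.95, 0.9, 0.9, 0.4, 0.3]

def retrieve(query):
    # Stub for a vector-store lookup.
    return ["The treaty was signed in 1889 in Paris."]

def flare_step(context):
    sentence, confs = draft_next_sentence(context)
    if min(confs) < THRESHOLD:
        docs = retrieve(sentence)   # the draft IS the retrieval query
        return ("regenerate", docs)
    return ("accept", sentence)

action, payload = flare_step("History of the treaty:")
```

The key difference from plain RAG is visible in the branch: retrieval is triggered by the model's own uncertainty about what it is about to say, not done once up front.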
Maturity Level 4. We are getting into a very interesting domain here, where you can start to ask your LLM how it is reasoning and what steps it takes to accomplish its task.
In a CoT diagram, each sentence is a direct continuation of the previous one. This
forms a linear chain.
OK you get the idea for the Graph of Thought. GoT it?
Graph of Thought (GoT) is a framework that models the reasoning process of large
language models (LLMs) as a graph. In a Chain of Thought, each sentence is a direct
continuation of the previous one, forming a linear chain. In a Tree of Thought, the
main idea branches off into several related ideas.
GoT allows for dynamic data flow without a fixed sequence. This flexibility is
important in AI, where data can come from multiple sources and may need to be
processed non-linearly.
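One way to sketch a Graph of Thought is as a DAG of thoughts evaluated in dependency order, with a stubbed "LLM" merging parent thoughts at join nodes (the trip-planning content and the merge function are invented for illustration):

```python
from graphlib import TopologicalSorter

# Thoughts are nodes in a DAG; edges name which earlier thoughts feed a
# later one. Unlike a chain, a node may merge several branches.
thoughts = {
    "problem":  "Plan a 3-city trip under budget.",
    "flights":  "Cheapest flight order: A -> C -> B.",
    "hotels":   "Hotels cheapest midweek in C.",
    "plan":     None,  # to be produced by merging the branches
}
edges = {"flights": {"problem"}, "hotels": {"problem"},
         "plan": {"flights", "hotels"}}

def combine(parents):
    # Stand-in for an LLM call that merges parent thoughts.
    return " / ".join(sorted(parents))

# Evaluate nodes in dependency order, filling in merged thoughts.
for node in TopologicalSorter(edges).static_order():
    if thoughts.get(node) is None:
        thoughts[node] = combine(thoughts[p] for p in edges[node])
```

The "plan" node only exists because two independent branches flow into it, which is exactly the non-linear data flow a chain cannot express.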
Chain it.
The Chain of Thought model does very well in its simplicity, intuitiveness, and ease of training. It follows a linear, step-by-step process well suited to tasks naturally aligned with sequential logic. But this imposes limitations on the model's ability to handle complex reasoning tasks that may require considering multiple variables, alternative options, or outcomes. Once it sets "its mind" on a particular 'chain,' the model may find it challenging to backtrack or explore other avenues, which can lead to less-than-optimal outcomes.
The Tree of Thought model, by branching into alternatives, carries a risk of possible overfitting, and its branching can make it harder to trace the model's exact reasoning path, which does not help its interpretability.
The Graph of Thought model stands out for its ability to handle high-complexity
tasks involving multiple interconnected variables. Its flexibility allows it to model
non-linear and interconnected relationships, making it highly suitable for real-
world problems with complex, interrelated variables. However, this complexity
demands significant computational resources and sophisticated algorithms for
effective training. The Graph of Thought model is also the most challenging to
interpret; its non-linear interconnected structure doesn’t lend itself to
straightforward explanations, making it difficult to understand the reasoning
behind its decisions and use it for explainability.
ReAct is designed for tasks in which the LLM is allowed to perform certain actions. For example, an LLM may be able to interact with external APIs to retrieve information. ReAct addresses issues that LLMs sometimes face, such as producing incorrect facts and compounding errors.
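A ReAct loop can be sketched with a scripted stand-in for the LLM; the Thought/Action/Observation transcript format follows the ReAct pattern, while the tool, the script, and the bracketed action syntax here are invented for illustration:

```python
# Sketch of a ReAct loop: the "LLM" (a fixed script here) alternates
# Thought and Action steps; each Action runs a tool and its Observation
# is appended to the transcript, grounding the next step.
TOOLS = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown")}

SCRIPT = [  # stand-in for successive LLM completions
    "Thought: I need the capital of France.",
    "Action: lookup[capital of France]",
    "Finish[Paris]",
]

def react(script):
    transcript = []
    for step in script:
        transcript.append(step)
        if step.startswith("Action:"):
            tool, arg = step[len("Action: "):].rstrip("]").split("[", 1)
            transcript.append(f"Observation: {TOOLS[tool](arg)}")
        elif step.startswith("Finish["):
            return step[len("Finish["):-1], transcript
    return None, transcript

answer, trace = react(SCRIPT)
```

Feeding each Observation back before the next completion is what lets a real ReAct agent correct course instead of compounding an early mistake.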
Conclusion
This article reviewed a set of patterns, including combinations of techniques commonly encountered as you cycle through deriving business value from Gen AI: as you seek to make Gen AI enterprise-ready, and conversely as you seek to mature the enterprise from prototype to product for and with Gen AI.
Few-shot learning
Supervised learning
Factual grounding
Here are some specific references that you may find helpful.
[1] LLMs can perform complex tasks through ICL, even as complex as solving
mathematical reasoning problems:
This paper shows that LLMs can perform complex tasks given only a few examples, using a technique called chain-of-thought prompting. The authors demonstrate that LLMs can solve mathematical reasoning problems, translate languages, and perform other complex tasks with high accuracy.
A Survey on In-context Learning, Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng,
Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, Zhifang Sui.
This survey paper provides a comprehensive overview of ICL for LLMs. The authors
discuss the different ways in which ICL can be used to train LLMs to perform new
tasks, and they provide examples of ICL being used to solve complex tasks such as
mathematical reasoning and code generation.
Jason Wei, et al. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” arXiv preprint arXiv:2201.11903 (2022).
This paper introduces the chain-of-thought prompting technique for getting LLMs to perform complex tasks. The authors demonstrate that LLMs prompted with chains of thought can solve mathematical reasoning problems, even when the problems are presented in a new format. They “explore how generating a chain of thought — a series of intermediate reasoning steps — significantly improves the ability of large language models to perform complex reasoning.” In particular, they show how such reasoning capabilities are an emergent behavior that surfaces “naturally in sufficiently large language models …, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.”
Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint
arXiv:2005.14165 (2020).
Raffel, Colin, et al. “Exploring the limits of transfer learning with a unified text-
to-text transformer.” arXiv preprint arXiv:1910.10683 (2019).
Prototypical networks are a type of prototype classifier that is used for few-shot
learning. Few-shot learning is a classification technique that uses a small dataset to
adapt to a specific task. Prototypical networks are based on the idea that each class
can be represented by the mean of its examples in a representation space learned
by a neural network.
Snell, Jake, Kevin Swersky, and Samy Bengio. “Prototypical networks for few-
shot learning.” arXiv preprint arXiv:1703.05175 (2017).
They demonstrate that their approach achieves state-of-the-art results on two few-
shot image classification benchmarks, performs well on few-shot regression, and
accelerates fine-tuning for policy gradient reinforcement learning with neural
network policies.
The paper on retrieval augmented generation (RAG), “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks” (arXiv preprint arXiv:2005.11401 (2020)), was written by Patrick Lewis, et al. RAG is a framework for augmenting large language models (LLMs) with access to external knowledge bases. This allows LLMs to generate more accurate and informative text, even on complex and challenging tasks.
RAG has been shown to be effective for a variety of tasks, including question
answering, summarization, and translation. It is a promising new approach to
generative AI, and it has the potential to revolutionize the way we interact with
computers.
Baolin Peng, et al. “Check Your Facts and Try Again: Improving Large Language Models
with External Knowledge and Automated Feedback”. This paper proposes a
method for improving the factual accuracy of LLMs by providing them with
feedback on their generated text. The feedback is based on a knowledge base of
factual statements.
Shunyu Yao, et al. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” arXiv preprint arXiv:2305.10601 (2023).
Maciej Besta, et al. “Graph of Thoughts: Solving Elaborate Problems with Large
Language Models”
Terence L. van Zyl, et al. “Machine Learning for Socially Responsible Portfolio Optimisation”
Ali Arsanjani — Director, Google AI/ML & GenAI | Ex: WW Tech Leader, Chief Principal AI/ML Solution Architect, AWS | IBM Distinguished Engineer and CTO, Analytics & ML