
The Definitive Guide to Generative AI for Industry

How Digital Mavericks are Redefining the Rules of Digital Transformation
“The seismic new AI frontier offers an unprecedented opportunity to harness technology to transform industry at a scale that impacts national GDPs–but, to be successful, AI needs context. AI needs a strong data foundation. Now is the time to place bold bets in digitalization and usher in tremendous simplicity, scale, and a clear path to autonomous operations. The Definitive Guide to Generative AI for Industry is a step-by-step guide for taking advantage of this enormous opportunity within your organization.”

MOE TANABIAN
CHIEF PRODUCT OFFICER, COGNITE

“Safe, secure, hallucination-free generative AI is critical to paving the road to sustainable and profitable global energy supply and manufacturing excellence. This is the moment Cognite was built for. A culmination of our extensive Industrial AI experience, The Definitive Guide to Generative AI for Industry provides a comprehensive how-to for transformation leaders looking to drive actual bottom-line impact.”

GIRISH RISHI
CEO AND CHAIRMAN, COGNITE

“Generative AI is fundamentally reshaping operational processes and it will be the digital mavericks who will be early adopters of this technology. Celanese has set the standard in industrial innovation with its Digital Plant of the Future; and by working with Cognite to accelerate AI implementation into our digital transformation strategy, it will allow us to raise the stakes once again and solidify Celanese as a leader in advanced manufacturing.”

BRENDA STOUT
VICE PRESIDENT MANUFACTURING, CELANESE

“Cognite’s work so far with LLMs, grounded by its Industrial DataOps platform, is remarkable–with even greater value on the horizon in the form of not just data management, but information orchestrated by real, thoughtful AI”

JOE LAMMING
INDUSTRY ANALYST, VERDANTIX

©Copyright, Cognite, 2023 – www.cognite.ai


Introduction: The Time for Industrial DataOps is Now

Contents

06 Foreword: Digital Mavericks: The Individuals Leading the Industrial Transformation
18 Executive Summary: The Four Things You Need to Know about Generative AI for Industry
24 Chapter 1: Short History of AI in Industry
34 Chapter 2: Demystifying LLMs
38 Chapter 3: Defining Industrial Knowledge Graphs
44 Chapter 4: Making Generative AI Work for Industry
58 Chapter 5: The Industrial Data and AI Problem
68 Chapter 6: A Robust Contextualization Engine
78 Chapter 7: Contextualization, Data Models, and Digital Twins
92 Chapter 8: AI Is the Driving Force for Industrial Transformation
104 Chapter 9: Use Cases for Industrial Generative AI
116 Chapter 10: Impact of Gen AI on Robotics and Autonomous Industry
122 Chapter 11: Impact of Gen AI on Asset Performance Management
136 Chapter 12: The Next Phase of Digital Industrial Platforms
154 Chapter 13: Tools for the Digital Maverick
Foreword
Digital Mavericks: The Individuals Leading the Industrial Transformation

The Rise of the ‘Digital Maverick’

“No one ever got fired for choosing IBM.”

Once considered a safe bet for capital projects, purchasing services from IBM offered large enterprises a certain guarantee of results and peace of mind for complex, multi-year initiatives. This thinking was acceptable when business reinvention was measured in years and decades. However, in today’s new era of hyper-digitalization and generative artificial intelligence (AI), maintaining this long-standing business cliché carries increasing, compounding risk. Buying ‘safe’ often comes at the cost of speedy innovation and long-term differentiation, where asset-heavy industry has much to learn and where a new class of business leaders recalibrates the risk vs. reward of these strategic technology investments.

As Gartner puts it in their Maverick predictions series: “Best practices don’t last forever—you need to look toward next practices, too. Challenge your thinking by considering less-obvious developments.”1 If leading industrial companies prioritize their long-standing business playbooks in this new era of AI, they risk missing their ‘iPhone’ moment, where making certain bets in digitization (at the right time) pay off in an outsized way, ushering in tremendous simplicity, scale, and a clear, valuable path to autonomous operations.

There is no more apparent opportunity to implement this strategy than by applying emerging innovations in generative AI technology such as ChatGPT and GPT-4. Generative AI possesses the potential to fundamentally reshape how knowledge workers and subject matter experts interact with data and operational processes. In fact, Gartner estimates that “by 2030, 75% of operational decisions will be made within an AI-enabled application or process.”2 Connecting the dots between the business and the technology, creating the opportunity, and driving the required change management takes a clear vision and a strong immunity and resilience to failure.

Industrial organizations can no longer afford not to take risks. But which heroes are central to business transformation and are staking their careers (and driving meaningful change) based on the truth that digitalization across industry remains deficient and that a smarter, more autonomous industrial future requires a new playbook?

Characterized by strategic, big-picture thinking, relentless determination, strong business understanding, technical prowess, and sometimes a chip on their shoulder, the digital maverick is a new breed of industrial leader who is critical for taking an industrial digital transformation program and making it truly meaningful and valuable with the advent of industrial AI.

While other members of digital operations teams think about deploying operational use cases and analytics, the digital maverick is thinking about building new long-term business capabilities. Instead of searching for keywords such as “what is an industrial chatbot?,” “good data management platforms,” or “digital operations dashboards,” the digital maverick craves and drives clarity on new ideas such as “ChatGPT for operations,” “retrieval augmented generation,” “AI and contextualized data,” and “autonomous digital factory.”

They also know from experience what has not worked well enough to deliver meaningful value from previous attempts at industrial AI—either from seeing first-hand more noise than value (unreliable models, too many false positives, constant retraining), or outright failure of AI deployments to create meaningful change in operator workflows. The digital maverick knows there’s a solution to the age-old industrial challenge of operating complex environments with limited insights and sub-optimal decision-making.
1. www.gartner.com/en/podcasts/thinkcast/behind-the-research-maverick-research

2. Gartner Report: Maverick Research: Data and Analytics Roles Will No Longer Be a Priority


Q&A with a Digital Maverick:
Ibrahim Al-Syed, Director of Digital Manufacturing

To truly understand this digital maverick mindset and perspective, we spoke with one of these innovative thinkers, Ibrahim Al-Syed, the Director of Digital Manufacturing at Celanese Corporation, a global chemical and specialty materials company.

Q: You’ve been working in the area of digital transformation for most of your career; can you tell us a bit about your journey?

Ibrahim: For me, it was a very natural transition. Earlier in my career, I worked at an integrated refinery and chemical plant. I was involved in a program to improve operational processes. This meant we were looking at design issues, hardware issues, and the operating model to improve our asset integrity program. What became very apparent while working in that role was that we needed to have easier access to more data in order to quickly get insights for smarter troubleshooting and reliability solutions.

I still remember the conversation I was having in my boss’s office where we drew this concept of a ‘digital twin’ on a whiteboard. I said, “What if we can have assets at the center and, in just one click, know everything about that asset, what has happened in the past, what is happening now, and what could happen in the future?”

Then we started looking at what technologies we would use to reach that goal. That was the first time we started looking at a concept that would leverage data as a step change in performance for integrity and reliability.

Q: What key challenges have you experienced when adopting new operational technology?

Ibrahim: While working both in Singapore and the U.S., I’ve seen the same pain repeatedly. There’s so much siloed technology out there, with individual solutions for almost every operational problem … AI, digital twins, too many UIs. Something needs to be fixed if it is still easier to pass paper around than trying to go from tool to tool to find data and prepare it to fit in another tool or work process. We have to move away from situations where a single event at a plant generates activity that has limited actual value on the asset.

Here’s where my idea of human-centered digital transformation comes into play. How can we better architect the interactions between solutions and technology with the data and processes we need to empower our people? It’s how you transform their ways of working so that they are optimized, inspired, and work at their maximum potential. So you have to design backwards from the people, starting with a clear vision of where you want to go and what it’s worth.


Q: When you look at your progress so far, what approach are you taking to help you get to this human-centered future?

Ibrahim: We want to reach a point at which we operate facilities by exception. But that level of maturity requires a new level of human-centered use cases and an advanced way of using and propagating data.

So when you build any new product, you do market research and understand customer feedback to create something people want to use. You have to do the same—what I call user research—in digital transformation to give these new business capabilities the best chance to get adopted at the plant level.

When our people are involved in the design process, their experiences allow us to model the data to propagate it seamlessly. What I learned about change management is that people will accept some of the ‘broken stuff’ as long as they own, lead, and drive the journey.

Q: Looking ahead, where do you see generative AI for industry fitting with the need for more intuitive operator experiences around data and analytics? What use case/business capability is most compelling?

Ibrahim: Generative AI is a game changer for our industry. We are finally to the point where we can get ‘steel and assets to talk to us,’ which can help companies run and operate facilities by exception. You can have insights and generated actions before you even have time to click a button. It is important to note that highly experienced workers are retiring, and their experience and knowledge can often be lost. These are the workers who can hear when a machine bearing needs grease, or feel when a machine is vibrating excessively and not running correctly. With talent shortages and retiring employees, generative AI promises to maximize overall equipment effectiveness in production environments.


Digital Mavericks: This is Your Moment

Everything these digital mavericks have been implementing with incremental gains to date (Industrial DataOps infrastructure, skillset investments, and other foundational capabilities) has become 10X more valuable almost overnight as industry adopts high-trust, secure, hallucination-free generative AI to harvest more business value from digital operations.

The inflection point could not be more real, obvious, or urgent from both a technology and strategic perspective. True digital mavericks know that the game has been changed forever and that they carry a new responsibility to drive and govern a new playbook that includes a new set of guidance:

1. Realign digital KPIs and skill sets to key business drivers

Digital mavericks know that their organization’s charter and KPIs must be ever more tied to business value and operational gains. Instead of deploying proofs of concept, their KPIs must reflect business impact, successful scaling, and other product-like metrics:

■ How can business value be measured, quantified, and directly attributed to the digital initiative? What ‘portfolio’ or ‘menu’ of value is being developed?

■ How is adoption measured and user feedback implemented in the feedback loop? How many daily active operations users are actually in the tools delivered? How is the workflow changing as a result?

■ How much do these business applications and solutions cost to deploy and maintain? What do they cost to scale to the following asset, site, or plant?

■ What new business capabilities for multiple stakeholders are gained due to deploying a solution? Are these capabilities short term, or will they persist (and drive value) over the long term?

Additionally, the portfolio of skills needed to develop and run analytics departments is also positioned to change. Gartner estimates that “by 2030, the number of traditional descriptive analytics dashboards will decrease by more than 50% in most modern digital businesses.”3 Generative AI will have successfully abstracted away the traditional complexity of creating and managing specialized analytics in favor of more business-ready insights.

The clear message to legacy CIOs and want-to-be mavericks? Stop investing in IT and data and analytics skills and focus more on connecting AI-powered use cases to business impact.

3. Gartner Report: Maverick Research: Data and Analytics Roles Will No Longer Be a Priority


2. Land and expand with business value faster than the competition

Technology can change in an instant, but operational KPIs are evergreen. Tackling digital transformation holistically across the enterprise has yet to prove to move the needle fast enough to deliver a meaningful competitive edge in dynamic markets. Instead, digital mavericks are pursuing ‘land and expand’ approaches that integrate new technology in the context of high-value use cases, with clear pathways from initial test concepts into scaled (and 10x more valuable) deployments.

Whether the team is focused on driving down OPEX, increasing production capacity from shorter turnarounds (TARs), or eliminating waste from a production process, think beyond siloed processes. Can this new use case be deployed across sites? How valuable is it compared to other business requests? What is the cost and effort to scale?

Here, the digital maverick puts a premium on flexible technology, approaches, and teams so that they can be prepared to shift strategy—either to take advantage of market opportunities or new technology such as generative AI—at a moment’s notice. For example, GPT appeared in a decent state of maturity within just a few quarterly business cycles, disrupting product roadmaps and, just like the pre-internet days, sending the world to a new level that will never come back. Organizations equipped to mobilize and execute on these trends can capitalize on market opportunities much faster than their competitors.

3. Challenge traditional thinking around DIY (do-it-yourself) projects and technology

Digital mavericks have always been tempted to go down the DIY path. Still, more have started to realize the significant opportunity costs that come with this approach’s strategy, implementation, and change management—especially when it comes to time. Is your largest competitor looking for off-the-shelf generative AI components for their digital transformation while you invest resources in more cloud-based building blocks?

Instead of making a name based on ‘completeness and sophistication of tech stack,’ consider building a reputation based on ‘time to scaled value,’ a far more critical metric. Here are a few other long-standing myths that digital mavericks are starting to challenge:

■ They recognize that DIY is not usually the least expensive option when considering headcount and long-term total cost of ownership of infrastructure and deployed solutions.

■ They are wary of going all-in on one cloud provider’s capabilities, knowing that the only way to avoid vendor lock-in is by investing in multi-cloud ecosystems that include DIY and SaaS.

■ They understand that differentiation in technical and go-to-market abilities is a function of speed and agility, not having a potentially cumbersome, custom home-grown data platform.

■ They know that SaaS is also a valid enterprise-grade path and appreciate that mass-market software spreads development risk and comes with self-service documentation, support SLAs, and other entitlements that minimize costs.

■ They have seen that job security and department prestige is a product of delivering actual value in operational workflows, not being seen as a high-headcount Proof of Concept (PoC) factory with little quantifiable value to show.

■ They acknowledge that their industrial company will never be in the software business and that, to be true innovators, they can’t repeat pre-cloud, pre-SaaS-era bespoke software development practices.


Executive Summary
The Four Things You Need to Know about Generative AI for Industry

You’re Receiving AI Messages from Everyone.

There is a lot of information out there (and, yes, we realize we’ve added a 184-page book to the onslaught). It can be a lot to take in. To help, we’ve boiled down generative AI for Industry to four key points. We will dive into more details on each of these points in the following chapters, but if you only read one part of this book, let it be this:

Cognite addresses:

1. Data leakage: Keep industrial data proprietary and resident within the security of your corporate tenant
2. Trust & Access Control: Control what data is accessible through graph and API
3. Hallucinations: A deterministic industrial knowledge graph to minimize hallucinations
4. Real-time data: Real-time data beyond training cycles of LLMs from the graph store

[Diagram: generic-purpose LLMs without industrial context vs. Industrial Canvas backed by Retrieval Augmented Generation (RAG), an ML Assistant for Industrial Context, data sources, and an Industrial Knowledge Graph built using Cognite Data Fusion®]

1. LLMs + knowledge graph = Trusted, explainable generative AI for industry

This is the simple formula to apply generative AI for industry. Your asset performance management is made intelligent and efficient by combining large language models (LLMs) with a deterministic industrial knowledge graph containing your operations data.
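As a concrete (and deliberately simplified) sketch of this formula, the snippet below grounds an LLM prompt in facts retrieved from a knowledge graph, the core of retrieval augmented generation. The dict-based graph, the asset tag, and the `llm` callable are illustrative stand-ins, not Cognite Data Fusion APIs:

```python
# Minimal RAG sketch: retrieve facts from a (toy) knowledge graph,
# then ground the LLM prompt in those facts instead of letting it guess.

def retrieve_context(graph: dict, question: str) -> list:
    """Pull facts whose asset tag appears in the question text."""
    facts = []
    for tag, props in graph.items():
        if tag.lower() in question.lower():
            for key, value in props.items():
                facts.append(f"{tag} {key}: {value}")
    return facts


def answer(question: str, graph: dict, llm) -> str:
    """Build a grounded prompt and hand it to any LLM callable."""
    context = retrieve_context(graph, question)
    prompt = (
        "Answer using ONLY these facts:\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}"
    )
    return llm(prompt)


# Usage with a toy knowledge graph and an "LLM" that echoes its prompt:
graph = {"21PT1019": {"type": "pressure transmitter", "unit": "barg"}}
echo_llm = lambda prompt: prompt
result = answer("What is 21PT1019?", graph, echo_llm)
```

Because the retrieved facts come from a deterministic store, the model's answer can be checked against them; this is the mechanism behind the "trusted, explainable" claim.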

2. Generative AI for industry needs to be safe, secure, and hallucination-free

And with the previous formula, it is. You need a complete, trustworthy digital representation of your industrial reality (industrial knowledge graph) for LLMs to understand your operations, and provide deterministic responses to even the most complex questions.

[Diagram: Large Language Models + Industrial Knowledge Graph]


[Diagram: contextualization examples.
Contextualize events to assets: e.g. connect shop orders, work orders, and alarms to the correct site, production line, and equipment.
Contextualized documentation: e.g. connect tags in P&IDs to associated assets, time series, work orders, and documents.
Contextualized 3D models: e.g. filter for and visualize where work orders are located on the plant.]

3. To apply generative AI in industrial environments, the ability to prompt LLMs with your operational context is everything

This means having a deterministic industrial knowledge graph of your operations, including real-time data. You need a solution that delivers contextualized data-as-a-service with data contextualization pipelines designed for fast, continuous knowledge graph population.

4. While generative AI itself is undeniably transformative, its business value is in its application to the real-world needs of field engineers

Generative AI can already be applied today across field productivity, maintenance planning, and robotic automation, but only with a platform that delivers essential AI features that enable simple access to complex industrial data for engineers, subject matter experts, data scientists, and more.
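To make the contextualization idea concrete, here is a toy pipeline that links work-order events to assets by spotting tag references in free text, emitting edges that could populate a knowledge graph. The tag format, field names, and sample data are illustrative assumptions; production contextualization engines use far richer entity matching and P&ID parsing:

```python
# Toy contextualization pipeline: scan work-order descriptions for asset
# tags and emit (work_order, asset) edges for a knowledge graph.
import re

# Assumed tag convention for illustration, e.g. "21PT1019".
TAG_PATTERN = re.compile(r"\b\d{2}[A-Z]{2}\d{4}\b")


def contextualize(work_orders: list, known_assets: set) -> list:
    """Return (work_order_id, asset_tag) edges for tags we can resolve."""
    edges = []
    for wo in work_orders:
        for tag in TAG_PATTERN.findall(wo["description"]):
            if tag in known_assets:  # only link tags that exist in the asset registry
                edges.append((wo["id"], tag))
    return edges


orders = [{"id": "WO-1001", "description": "Replace seal on pump near 21PT1019"}]
edges = contextualize(orders, known_assets={"21PT1019"})
# edges == [("WO-1001", "21PT1019")]
```

Each emitted edge is one contextualized relationship of the "events to assets" kind described above; run continuously over incoming work orders, such a pipeline keeps the graph current.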


Chapter 1
Short History of AI in Industry

Introduction

As we delve into the state of industrial AI, we must acknowledge that AI’s impact radius has grown substantially, expanding its influence beyond individual machines to encompass entire production systems, reshaping the way industries operate.

For most of industrial history, technological progress primarily affected the physical aspects of business (better machines, more efficient factories). Now, we are witnessing remarkable advancements in machine learning techniques and computing power. These advancements have enabled AI to transcend its previous limitations, propelling it into organizational decision-making in a way not thought possible even a year ago. Today, AI is not merely an add-on or a supplement to industrial processes; it is quickly becoming an integral part of modern industrial operations.

From sophisticated robotics to intelligent sensors, machines are now equipped with AI-powered capabilities that enable them to adapt, learn, and optimize their performance. AI algorithms can now analyze vast amounts of data collected from various sources to optimize operations holistically, identifying patterns, predicting failures, and making intelligent decisions.

AI is optimizing production schedules, minimizing waste, streamlining resource allocation, and providing a better understanding of market dynamics and customer behavior. And AI-enabled predictive modeling and simulation techniques enable organizations to mitigate risks and maximize returns on investment.

AI-driven insights are quickly becoming the foundation for business decisions that continually improve operational efficiencies, drive innovation, enhance customer experiences, and identify new business opportunities.

Integrating AI into machines, production systems, and organizations has brought about a paradigm shift in how industry operates. AI is soon to be the backbone of the modern industrial landscape. In the following sections, we will explore the latest advancements, trends, and case studies that highlight the profound implications of AI in reshaping industry and propelling it toward a smarter, more sustainable future.

Predictive Maintenance: Two Decades in the Making and the Role of AI

Predictive maintenance, the practice of utilizing data-driven analytics to predict and prevent equipment failures before they occur, has been a goal pursued by industry since the early 2000s. Its potential to optimize maintenance operations, reduce downtime, and improve overall equipment effectiveness has long been recognized.

Initial machine learning methods, which relied on statistical analysis and rule-based approaches to detect anomalies and trigger maintenance actions, were often limited by their reliance on pre-defined thresholds and a lack of flexibility to adapt to dynamic operational conditions. The introduction of deep learning (vintage 2012) is what helped to overcome these initial limitations, because deep neural networks made it possible to use AI to identify and determine which data is most valuable in predicting future outcomes. In contrast, early machine learning relied upon the data scientist to identify and determine the best data features with predictive value. This ability to automatically learn from data and adapt to changing conditions has significantly enhanced the accuracy and effectiveness of predictive maintenance systems.

However, as discussed in detail in a review published in the CIRP Journal of Manufacturing Science and Technology4, a lingering challenge of predictive maintenance is the availability and accessibility of data. Traditional maintenance practices often relied on scheduled maintenance routines or reactive approaches, where maintenance was performed after a failure occurred. As a result, there is a lack of historical data on equipment performance and failure patterns.

To solve the challenge of data availability, we saw the rise of industrial IoT and sensor technologies that enabled real-time data collection from machines, making it possible to monitor their condition and performance continuously. Unfortunately, this feeds into the accessibility of data problem: too much information.

Modern AI, when paired with a consolidated and contextualized data foundation, can overcome many of the challenges predictive maintenance has faced in the past. AI-powered algorithms can now analyze complex data sets, including sensor readings, environmental factors, operational parameters, and even unstructured data such as maintenance logs or operator feedback. AI systems can identify early warning signs of equipment degradation or failure by employing advanced techniques such as anomaly detection, pattern recognition, and predictive modeling.

The use of AI in predictive maintenance has also paved the way for new approaches, such as prescriptive maintenance. By combining predictive analytics with optimization techniques, prescriptive maintenance systems can recommend the most efficient maintenance actions, considering factors such as equipment criticality, resource availability, and cost considerations. AI enables organizations to predict and prevent failures and optimize maintenance strategies for improved overall operational efficiency.

Integrating AI and machine learning has played a pivotal role in unlocking the true potential of predictive maintenance, enabling industries to transition from reactive practices to proactive, data-driven approaches. As AI continues to advance and industries embrace its potential, the future of predictive maintenance holds even more promise, with the ability to transform maintenance operations, enhance asset performance, and drive the growth of intelligent, resilient industrial ecosystems.

4. www.sciencedirect.com/science/article/pii/S1755581722001742
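The threshold-based anomaly detection that early predictive-maintenance systems relied on can be illustrated with a minimal rolling-statistics detector: flag any reading that deviates strongly from the mean of a trailing window. The window size, threshold, and sample data are arbitrary illustrative choices, not a recommended configuration:

```python
# Flag sensor readings more than k standard deviations away from the
# mean of the previous `window` readings (a classic statistical baseline).
from statistics import mean, stdev


def rolling_anomalies(readings: list, window: int = 5, k: float = 3.0) -> list:
    """Return indices of readings that deviate strongly from recent history."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 4.8]  # sudden spike at the end
print(rolling_anomalies(vibration))  # → [6]
```

The brittleness of this approach (a fixed window, a fixed k, one signal at a time) is exactly the limitation that learned models and contextualized data were later brought in to address.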


Hybrid AI: The Convergence of Human Expertise and Machine Intelligence

The significance of hybrid AI in industry lies in its ability to bridge the gap between data-driven insights and human understanding. It acknowledges that while machines excel at processing large volumes of data, identifying patterns, and performing repetitive tasks with precision, they still need human experts’ contextual understanding, intuition, and creativity. Combining AI algorithms’ analytical power with human domain knowledge and decision-making skills, hybrid AI aims to create a symbiotic relationship that harnesses the strengths of both humans and machines.

One key aspect of hybrid AI is its role in addressing the ‘explainability’ of AI systems. Traditional black-box deep learning models often lack transparency, making it challenging for humans to understand and trust their outputs. This more informed reality of AI in industry is driving the future of hybrid machine learning, a blend of physics and AI analytics that combines the ‘glass box’ interpretability and robust mathematical foundation of physics-based modeling with AI’s scalability and pattern recognition capabilities. By combining physics and AI analytics, hybrid AI systems provide a more comprehensive and understandable rationale behind their recommendations, increasing trust and facilitating human acceptance of AI-driven decisions.

Moreover, hybrid AI enables continuous learning and improvement. Human feedback and interventions can be integrated into AI models, enhancing accuracy, adaptability, and generalizability. Human experts can refine AI algorithms by providing domain-specific knowledge, validating results, and correcting biases or inaccuracies. This iterative collaboration between humans and machines strengthens the AI models over time, resulting in more robust and effective solutions.

Hybrid AI is best suited for complex industrial process problems where a mathematical theory framework exists that can be used to teach a machine learning model that is then used on real-time data for predictions. The result is a high-confidence, tailored hybrid model combining strong domain knowledge (physics) with machine learning for cost efficiency and scalability.

By empowering humans with AI-driven insights and automation, hybrid AI enables industry professionals to make more informed decisions, identify hidden patterns or anomalies, and optimize complex processes. All of which will enhance decision-making and operational efficiency.
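A minimal sketch of this hybrid pattern: a physics formula provides the interpretable baseline, and a simple least-squares correction learns the residual between physics and measurement. The quadratic pressure-drop "physics" and the linear correction below are illustrative assumptions standing in for a real first-principles model and a real ML model:

```python
# Hybrid "physics + ML" sketch: physics gives the glass-box baseline,
# a data-driven term learns only the gap between physics and reality.

def physics_baseline(flow: float) -> float:
    """Idealized pressure drop, proportional to flow squared (toy physics)."""
    return 0.5 * flow ** 2


def fit_residual_correction(flows, measured):
    """Least-squares fit of residual = a * flow + b; returns a hybrid predictor."""
    residuals = [m - physics_baseline(f) for f, m in zip(flows, measured)]
    n = len(flows)
    mean_f = sum(flows) / n
    mean_r = sum(residuals) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(flows, residuals))
    var = sum((f - mean_f) ** 2 for f in flows)
    a = cov / var if var else 0.0
    b = mean_r - a * mean_f
    return lambda flow: physics_baseline(flow) + a * flow + b


# Measurements deviate from ideal physics by a linear bias (fouling, drift):
flows = [1.0, 2.0, 3.0, 4.0]
measured = [physics_baseline(f) + 0.3 * f + 0.1 for f in flows]
hybrid = fit_residual_correction(flows, measured)
```

Because the learned term only corrects the residual, the bulk of each prediction remains traceable to the physics equation, which is the source of the 'glass box' interpretability described above.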


Generative AI: With the advent of natural language processing But how do we make
(NLP) techniques and conversational AI, human
The Accelerator of Everything
…generative AI work for highly risk-averse industries?

Generative AI is a class of AI techniques that can generate new data, content, or solutions based on patterns and insights derived from existing data. Unlike traditional AI models that rely on pre-defined rules or training on large labeled data sets, generative AI models can learn from context-enriched data without explicit guidance, enabling them to create novel outputs that mimic the characteristics of the training data.

Generative AI models can ingest diverse data sets, including historical maintenance records, sensor data, work orders, and even unstructured data such as maintenance reports or equipment manuals. If this data can be augmented to include semantic, meaningful relationships (i.e., context), generative AI can recognize patterns and correlations to create detailed and optimized turnaround plans, for example, considering factors such as resource availability, dependencies between tasks, budget constraints, and safety considerations.

However, simply ingesting the diverse data sets without context will prevent generative AI from being deterministic (more of which will be covered in coming chapters).

Generative AI has emerged as a powerful catalyst, propelling industry towards new frontiers of efficiency, innovation, and operational excellence. Key to the wildfire that is generative AI is the accessibility of data it provides through human language interfaces and AI copilots. Operators can interact with generative AI models using everyday language, making the technology more accessible to a broader range of users.

Through copilots, operators can easily communicate their requirements, constraints, and preferences to the generative AI models. For example, an operator can specify the desired turnaround duration, prioritize specific maintenance tasks, or account for particular safety protocols. Generative AI models can then process this information and generate turnaround plans that align with the operator's objectives while optimizing various factors for efficiency and cost-effectiveness.

Copilots seamlessly integrate with human operators' workflows, providing real-time suggestions, explanations, and feedback during any industrial process. The operator fine-tunes the outputs based on domain expertise, and the AI empowers operators to make faster, more informed decisions. This collaborative interaction between human operators and generative AI models enhances the quality and accuracy of the AI's results.

By automating and optimizing complex processes traditionally reliant on manual efforts, generative AI accelerates all industrial digitalization efforts we have embarked on up to this point and spurs new frontiers for digitalization we have yet to imagine. Generative AI stands to be the driving force behind industrial digitalization, revolutionizing operations and processes across various sectors.

The next chapter dives into how to enable a hallucination-free AI experience with your enterprise-wide data.
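The copilot interaction described above can be sketched as a simple prompt builder. This is only an illustration of how operator constraints might be turned into a structured prompt; the function name, fields, and example values are hypothetical, not any product's API.

```python
# Sketch: turning an operator's turnaround constraints into a structured
# prompt for a generative model. All names and values here are invented.
def build_turnaround_prompt(duration_days: int,
                            priority_tasks: list[str],
                            safety_protocols: list[str]) -> str:
    lines = [
        "Generate a turnaround plan with these constraints:",
        f"- Maximum duration: {duration_days} days",
        "- Priority maintenance tasks: " + ", ".join(priority_tasks),
        "- Safety protocols to enforce: " + ", ".join(safety_protocols),
        "Optimize for resource availability, task dependencies, and cost.",
    ]
    return "\n".join(lines)

prompt = build_turnaround_prompt(
    duration_days=14,
    priority_tasks=["replace heat exchanger tubes", "inspect pressure relief valves"],
    safety_protocols=["LOTO", "confined-space entry permits"],
)
print(prompt)
```

In practice, the copilot would send this prompt, along with retrieved context, to the underlying model and present the generated plan back to the operator for refinement.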

Introduction: The Time for Industrial DataOps is Now

Chapter 2

Demystifying LLMs

Understanding Large Language Models and their Application in Operations

Large Language Models, also known as LLMs, are perhaps the biggest buzzword since blockchain. Unlike its complex and challenging-to-implement predecessor, business professionals can leverage LLMs without extensive prerequisites.

Whereas blockchain often requires specialized knowledge, technical expertise, and substantial investments in infrastructure, LLMs provide a much more accessible and straightforward solution. They are easily integrated into existing workflows and systems, enabling seamless adoption and integration with business operations.

At its core, an LLM is an AI model that utilizes deep learning techniques to understand, interpret, and generate human language, enabling businesses to leverage their capabilities for a wide range of applications.

LLMs are trained on vast amounts of textual data from diverse sources, such as books, articles, and the internet, allowing them to develop an understanding of grammar, context, and semantic relationships. These models can then generate coherent and contextually appropriate text based on user prompts or queries.

The history of LLMs can be traced back to the development of neural networks and deep learning algorithms. Early models, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, paved the way for language understanding and text generation tasks. However, it was the advent of transformer models, with the introduction of architectures such as the GPT (Generative Pre-trained Transformer) series, that brought about significant breakthroughs in LLM technology.

With GPT models, the pre-training phase involves exposing the model to a large corpus of text, allowing it to learn language patterns and build a contextual understanding of words and phrases. The fine-tuning phase then refines the model's capabilities for specific tasks or domains, making it adaptable and versatile for various applications.

The practical applications of LLMs in business operations are vast. LLMs can assist in automating repetitive tasks, generating personalized content, and analyzing customer feedback. Additionally—and of most interest to the audience of this book—LLMs can process vast amounts of text documents, extract relevant information, and summarize key findings, helping to extract insights from large volumes of unstructured data. This capability can assist in research and development, data analysis, and decision-making processes, enabling businesses to derive insights from diverse sources of information more effectively.

For example, an LLM-based system can analyze maintenance reports, sensor logs, and operator notes to help operators efficiently navigate and discover relevant data, leading to better decision-making and improved operational efficiency.

LLMs can also play a vital role in industrial data analysis by assisting in critical activities such as anomaly detection and quality control. By ingesting historical data, sensor readings, and operational parameters, LLMs can learn to identify early signs of equipment failure, detect deviations from normal operating conditions, or pinpoint potential quality issues, supporting proactive maintenance strategies.

LLMs are a powerful tool for industry, improving operations in various ways that minimize downtime, reduce costs, and achieve higher overall efficiencies. With their ease of use, adaptability, and practical applications, LLMs offer a user-friendly solution that can streamline operations, automate tasks, gain valuable insights, and drive innovation in their respective industries.
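Summarizing large volumes of unstructured text typically starts with splitting documents into pieces small enough for a model to process. A minimal sketch of that preprocessing step, with arbitrary chunk sizes, might look like this:

```python
# Sketch: split a long maintenance report into overlapping word-based chunks
# before sending each chunk to an LLM for summarization. The size and overlap
# values are arbitrary choices for illustration.
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks

report = "word " * 500  # stand-in for a long unstructured report
chunks = chunk_text(report.strip())
print(len(chunks))
```

The overlap keeps sentences that straddle a chunk boundary visible in both chunks, so a summary pass over each chunk loses less context.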


Chapter 3

Defining Industrial
Knowledge Graphs

Extract Data Relationships, Capture Interconnections, and Trace Data Lifecycles

Knowledge graphs use machine learning to construct a holistic representation of nodes, edges, and labels through a process known as semantic enrichment. By applying this process during data ingestion, knowledge graphs can discern individual objects and comprehend the relationships between them. This accumulated knowledge is then compared and fused with other data sets that share relevance and similarity.

A knowledge graph enables question answering and search systems to retrieve comprehensive responses to specific queries. Knowledge graphs are powerful time-saving tools, streamlining manual data collection and integration efforts to bolster decision-making processes.

An industrial knowledge graph is an open, flexible, and labeled property graph data model that represents your operations. It liberates the data that's been locked in different systems and applications (high-frequency time-series sensor data, knowledge hidden in documents, visual data streams, and even 3D and engineering data) and makes it meaningful and manageable.

An industrial knowledge graph turns raw data into valuable operational insights. With a knowledge graph, you can:

Go from search to discovery: When there are hundreds of data sources and countless naming conventions, searching can be tedious. Discovering related data instantaneously will help you leap from agonizing searches to discovering insights.

Understand relationships: And not just tables. The relationships often matter more to the answers we seek than the data entries they connect. Understanding our processes and systems requires understanding our data real estate in its context.

Knowledge graphs are constructed by combining data sets from diverse sources, each varying in structure. The harmony between schemas, identities, and context contributes to the coherence of this comprehensive data repository.

Schemas establish the fundamental framework upon which the knowledge graph is built, while identities efficiently categorize the underlying nodes. Context, on the other hand, plays a pivotal role in determining the specific setting in which each piece of knowledge thrives within the graph.
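The interplay of schema, identity, and context can be sketched with plain Python structures. This is a toy illustration only; the node types, required fields, and IDs are invented, not any vendor's data model.

```python
# Sketch: schema defines what a node type must carry; identity is the node's
# type plus ID; context is the properties that situate it. All names invented.
schema = {
    "Asset": {"required": ["name", "site"]},
    "WorkOrder": {"required": ["status", "asset_id"]},
}

def validate(node_type: str, properties: dict) -> bool:
    """Check that a node satisfies its schema before it joins the graph."""
    return all(key in properties for key in schema[node_type]["required"])

node = {"type": "Asset", "id": "P-101", "name": "Feed pump", "site": "Plant A"}
print(validate(node["type"], node))
```

Validating nodes against a schema at ingestion time is one simple way a graph stays coherent as data sets of varying structure are combined.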


Cognite Data Fusion's Flexible Data Modeling

Execute industrial artificial intelligence (AI) initiatives: Knowledge graphs help operationalize your data science, artificial intelligence, machine learning (AI/ML), and Internet of Things (IoT) initiatives.

However, a knowledge graph is only as powerful as the data it can access. To be effective in complex industrial settings, a knowledge graph must include:

■ Automated population with contextualization, cross-source-system IT, OT, and ET data

■ A robust, well-documented API integration

■ Highly performant, real-time, flexible data modeling to enable a variety of uses, including queries and search, natural language processing, machine learning algorithms, and visualization

Cognite Data Fusion's Industrial Knowledge Graph

In this example, the left diagram above illustrates a simplified version of an industrial knowledge graph of a centrifugal pump. Depending on the persona, users may explore a problem with the pump from multiple entry points. Maintenance may start with the latest maintenance report, while an operator may use the time series, and a remote SME may begin with the engineering diagram (e.g., P&ID). The maintenance report, the work order, the time series values, and the engineering diagrams are each in separate systems. Having all this data connected in the industrial knowledge graph creates a seamless experience, regardless of the starting point.

This simple example illustrates the importance of data contextualization across different systems. With generative AI, Cognite's data contextualization capabilities power the industrial knowledge graph (as seen on the right side) to provide access to the maintenance report, work order, time series, and more in a single location.

With the industrial knowledge graph as the foundation, data is understood and structured to meet the specific needs of users or use cases. The topic of industrial knowledge graphs, data models, and digital twins is discussed in more detail in chapter seven.

Like the principle of compounding interest, data in the industrial knowledge graph becomes increasingly more valuable as people use, leverage, and enrich that data. More useful and high-quality data leads to more trusted insights. More trusted insights lead to higher levels of adoption by subject matter experts, operations and maintenance, and data science teams. A user-friendly, AI-powered experience ensures adoption and use will grow, and this cycle repeats exponentially.
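The centrifugal pump example can be sketched as a tiny property graph in plain Python: nodes for the asset and its related records, edges for the relationships. The IDs, labels, and relationship names below are invented for illustration.

```python
# Sketch of the pump example: whichever record a user starts from, the graph
# connects them to the same surrounding context. All identifiers are invented.
nodes = {
    "pump:P-101": {"label": "Asset", "type": "centrifugal pump"},
    "wo:553": {"label": "WorkOrder", "status": "open"},
    "doc:report-17": {"label": "MaintenanceReport"},
    "ts:P-101-vibration": {"label": "TimeSeries", "unit": "mm/s"},
    "doc:pid-042": {"label": "EngineeringDiagram", "kind": "P&ID"},
}
edges = [
    ("wo:553", "maintains", "pump:P-101"),
    ("doc:report-17", "describes", "pump:P-101"),
    ("ts:P-101-vibration", "measures", "pump:P-101"),
    ("doc:pid-042", "depicts", "pump:P-101"),
]

def related(node_id: str) -> list[str]:
    """Everything directly connected to a node, in either direction."""
    out = [dst for src, _, dst in edges if src == node_id]
    out += [src for src, _, dst in edges if dst == node_id]
    return out

# Starting from the pump (or from any of its records) reaches the same context.
print(related("pump:P-101"))
```

A maintenance engineer entering via the work order, an operator entering via the time series, and a remote SME entering via the P&ID all traverse to the same connected set of records.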


Chapter 4

Making Generative
AI Work for Industry

Liberating Industrial Data

Data has no value unless the business trusts and uses it.

To truly solve the industrial data and AI problem, data must be liberated from siloed source systems.

Data users still spend up to 80% of their time searching, gathering, and cleaning data, costing businesses millions of dollars in working hours every year. This bottleneck toward productivity will only become worse in legacy architectures, as IDC predicts data generation in asset-heavy organizations to increase by 3X in the next two to three years.

Industrial organizations must first liberate all data across numerous siloed data sources, and then get the right data to the right subject matter experts (SMEs), with the right context, and at the right time.

Data contextualization also raises the question of trust. Industrial enterprises must trust the data they put into solutions, from dashboards to digital twins to generative AI-powered solutions. Ultimately, data has value only if the business trusts and uses it.

Organizations can increase trust and avoid other disadvantages of data lakes by adopting an Industrial DataOps mindset. This approach makes data more valuable to a growing audience of data consumers, both inside the enterprise and across its partner ecosystem.

DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.

Industrial DataOps is about breaking down silos and optimizing the broad availability and usability of industrial data generated in asset-heavy industries, including oil and gas and manufacturing.

SMEs, for example, must be empowered to access and harness data effectively. With simple access to complex industrial data, industrial organizations can bring together formerly isolated SMEs, departments, platforms, and data deployed by OT and IT teams to improve operational performance through unified goals and KPIs across the enterprise.

Why Historians are Not Enough

The source for most operational data is a data historian, which holds the data deemed valuable long term. But there is more to a plant than what is available in the historian. Because operators instrument the plant for safe operation and control, most of the data required for efficient maintenance must be picked up elsewhere, i.e., via inspections, maintenance campaigns, industry reports, vendor data, and more.

Additionally, what is available in the historian is limited by the data types supported. Modern instruments and protocols allow for more data types than numbers, which are mainly ignored or stored in separate, often local, databases.

A modern pressure transmitter can provide functions, such as self-calibration, and housekeeping data, such as circuit board temperature, in addition to a pressure reading. If not planned for, these features go unused.

Also not in the historian are fast-sampled data stored as signatures, such as vibration or operational data, which are handled in specialized systems when the control system's input/output does not support it.
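The gap between what a transmitter reports and what a historian stores can be sketched with two small data structures. The field names and values below are illustrative only, not a real historian schema or instrument protocol.

```python
from dataclasses import dataclass

# Sketch: a historian typically keeps one numeric value per tag, while a
# modern transmitter also reports housekeeping and diagnostic data.
# All field names and values are invented for illustration.
@dataclass
class HistorianSample:
    tag: str
    value: float  # the process reading only

@dataclass
class TransmitterPayload:
    tag: str
    pressure_bar: float
    board_temp_c: float        # housekeeping data
    self_calibration_ok: bool  # diagnostic function
    error_count: int           # internal health

reading = TransmitterPayload("PT-204", 12.7, 41.5, True, 0)
stored = HistorianSample(reading.tag, reading.pressure_bar)  # the rest is dropped
print(stored)
```

Everything outside the single process value, the housekeeping temperature, calibration status, and error count, is exactly the data the following section describes as left behind.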


(Figures: "Data layers: simple data flow" and "Data layers: complex data flow for one type of data." Only a fraction of the instrumentation data is consumed; consumers are a mix of control systems, vendor solutions, or on-prem databases; the historian is tied to process data from the control system and limited by data type support; end users' access to data is limited to what is made available where they are—most often, what is in the historian.)

Too Much Data Left Behind

The problem with the current setup is that valuable data is left unused, either because it is filtered away or because it ends up in a local data store. There are several reasons why this happens:

■ A new instrument with more capabilities replaces one with fewer, but the additional data is not hooked up.

■ Including more data can be costly, as control systems and historians have a cost tied to I/O or tag count; however, some vendors have started to differentiate between types of data, e.g., process critical vs. housekeeping.

■ Different vendors provide sensors, control systems, and historians, so changes are costly in terms of hardware and software and in terms of management and coordination with various vendors.

■ There is a lack of knowledge or communication on how the additional data can add value in other contexts (e.g., the process operator prioritizes data from a sensor that meets specific process requirements but ignores built-in housekeeping data that can help to identify internal errors, since it is not applicable to the primary use case).

■ The operator designs instruments and control systems to last for many years. Because the focus is on safe operation and control, the people involved are rightfully conservative and hesitant to change something that works for its primary purpose.

An architecture and data flow like the one seen above mean that a large amount of instrumentation data—from housekeeping data to more complex data types—is left behind.

Even when organizations design architectures that pass this more detailed data along to the control system, it can be stopped by the next layer (i.e., the historian), since that solution vendor also charges by I/O count and has limitations in what data types it supports.

Today, sensors can self-calibrate and provide internal data about their health. So a pressure transmitter can report the measured process pressure, as well as calibration curves, registered errors, and onboard circuit health data—all through an ethernet interface and open protocol.

Though control system vendors and the heavy machine industry's safety requirements still call for gathering data locally and for local control, control systems now depend more significantly on IT-like architectures, using ethernet and virtualization principles while still meeting the necessary safety and uptime requirements. The introduction of commoditized technology on this level enables control systems to handle more data types and serve them more efficiently to local users. It also opens up the possibility of separating the data pipelines for critical and non-critical data to reduce cost (less I/O and fewer interfaces) and gain velocity (fewer interfaces and vendors).

While today's historians capture the critical process data required for analysis, most do not extend to data types like engineering drawings (P&IDs, standard operating procedures, inspection rounds), work orders, reliability data, and IoT sensors.


But What About Hallucinations, Security, and Privacy?

As powerful as LLMs are, there have been instances where they produce inaccurate or misleading information. These hallucinations can be problematic, especially when LLMs are used in critical decision-making processes.

LLMs have a vast knowledge base to draw from. However, the content in an LLM's data stores may be dated and based solely on content from the public domain. These factors limit the source data for generating a response, potentially leading to out-of-date information or 'creative' answers that compensate for the information gap (hallucinations).

If we can 'train' an LLM like ChatGPT on curated, contextualized industrial data, then we could talk to this data as easily as we converse with ChatGPT and have confidence in the basis of the response.

Contextualized data includes explicit semantic relationships within the data, which ensures that the text consumed is relevant to the task at hand. For example, when prompting an LLM to provide information about operating industrial assets, the data provided to the LLM should include the data and documents related to those assets and the explicit and implicit semantic relationships across different data types and sources.

The resulting industrial knowledge graph also processes data to improve quality through normalization, scaling, and augmentation for calculated or aggregated attributes. For generative AI, the adage of 'garbage in, garbage out' applies. Aggregations of industrial data in large data warehouses and lakes that have not been contextualized or pre-processed lack the semantic relationships needed to 'understand' the data and the data quality necessary for LLMs to provide trustworthy, deterministic responses.

Security and privacy concerns also arise with LLMs, which can be vulnerable to adversarial attacks where malicious actors deliberately manipulate input prompts to generate misleading, harmful outputs or exploit vulnerabilities in the LLM implementation or infrastructure to gain unauthorized access to sensitive data or disrupt operations. For example, compromising an LLM used in a control system could lead to equipment malfunctions, production errors, or unauthorized access to critical resources.

Data leakage is a particular concern for industrial organizations that often deal with sensitive data, including proprietary designs, trade secrets, or customer information. LLMs trained on such data could unintentionally leak confidential information through the generated responses. Moreover, there is a concern that fine-tuning LLMs on proprietary or confidential information may inadvertently expose sensitive details if the model is not adequately anonymized or protected.

Ensuring proper anonymization and protection of sensitive data during LLM training, protecting the LLM infrastructure, securing data transfers, and implementing robust authentication and authorization mechanisms are essential to mitigate cybersecurity risks and the risk of information leakage.

Overall, the challenges of hallucinations, security vulnerabilities, and privacy risks associated with LLMs highlight the need for rigorous evaluation of LLM-based systems. With that in mind, let us look at how Cognite addresses these concerns.

RAG is All the Rage

As discussed, LLMs have access to the corpus of text used during the model's training. However, as input, these models can also take new information to incorporate when responding to a natural language prompt. This additional content can come as real-time access to web-based queries of publicly accessible content or from the user in the form of additional inputs as part of the prompt.

Retrieval Augmented Generation (RAG) is a design pattern we can use with LLMs to provide industrial data directly to the LLM as specific content to use when formulating a response. A type of in-context learning, RAG lets us use off-the-shelf LLMs and control their behavior through conditioning on private, contextualized data. This approach allows us to utilize the reasoning engine of LLMs to provide deterministic answers, based on the specific inputs we provide, rather than relying on the generative engine to create a probabilistic response based on existing public information.

Another thing to note here is the context window limitations of LLMs. The context window is the range of tokens the model can consider when responding to prompts. GPT models start with a 2K window size (GPT-3) and go all the way to 32K (GPT-4). In this context, a token is a piece of text that could be as short as one character or as long as one word. GPT-3's context window can handle about 2,048 tokens—nowhere near large enough to take input from an entire industrial database.

By contextualizing proprietary industrial data to create an industrial knowledge graph, we can convert that enriched content into embeddings and store it in a private database, fine-tuned for embedding storage and search (a vector database).
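The index-retrieve-augment loop behind RAG can be sketched end to end in a few lines. This toy version uses a bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database; the document chunks, tags, and IDs are invented for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Toy "embedding": a bag-of-words vector. A real pipeline would call an
# embedding model and query a vector database instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1) Index: embed contextualized chunks drawn from the knowledge graph.
chunks = [
    "Pump P-101 work order WO-553: replace mechanical seal, scheduled May 4.",
    "Pump P-101 maintenance report: elevated vibration observed on bearing.",
    "Compressor C-201 inspection round: no anomalies found.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2) Retrieve: rank chunks by similarity to the user's question.
question = "What maintenance is planned for pump P-101?"
ranked = sorted(index, key=lambda item: cosine(embed(question), item[1]), reverse=True)
context = "\n".join(chunk for chunk, _ in ranked[:2])

# 3) Augment: condition the LLM on the retrieved context only.
prompt = f"Answer using ONLY the context below.\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Because the model is instructed to answer only from the retrieved chunks, its response is grounded in the private, contextualized data rather than in whatever public text it was trained on.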


This specialized database of embeddings now becomes the internally searchable source of inputs that we provide to the LLM along with our natural language prompts.

Cognite addresses:

1. Data leakage: Keep industrial data proprietary and resident within the security of your corporate tenant.

2. Trust & access control: Control what data is accessible through the graph and API.

3. Hallucinations: A deterministic industrial knowledge graph to minimize hallucinations.

4. Real-time data: Real-time data beyond the training cycles of LLMs, served from the graph store.

Codifying this context as an industrial knowledge graph is critical to leveraging your vast industrial data within the context window limitations and enabling consistent, deterministic navigation of these meaningful relationships. By utilizing the open APIs of major LLMs, we can then leverage this trusted source of industrial context to create and store embeddings in a way that becomes semantically searchable and enables us (with minor prompt fine-tuning) to fully leverage the reasoning engine of LLMs to give us actionable insights.

Leveraging this pattern, we can keep industrial data proprietary and resident within the security of your corporate tenant. We can maintain and leverage the access controls required to meet large enterprises' trust, security, and audit requirements. Most importantly, we can get deterministic answers to natural language prompts by explicitly providing the inputs LLMs should use to formulate a response.

(Diagram: Industrial Canvas—an ML Assistant for Industrial Context built using Cognite Data Fusion®, contrasting general-purpose LLMs without industrial context against data sources feeding an Industrial Knowledge Graph via Retrieval Augmented Generation (RAG).)
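Working within a fixed context window means budgeting tokens before each call. A rough sketch of that budgeting step follows; the 4-characters-per-token heuristic and the reserve for the prompt and answer are approximations for illustration, not exact tokenizer behavior.

```python
# Sketch: keep only the highest-ranked retrieved chunks that fit in the
# model's context window, reserving room for the prompt and the answer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # common rule of thumb: ~4 characters/token

def fit_to_window(chunks: list[str], window: int, reserved: int = 500) -> list[str]:
    budget = window - reserved
    kept, used = [], 0
    for chunk in chunks:  # chunks assumed pre-sorted by relevance
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

ranked_chunks = ["relevant chunk " * 100, "less relevant " * 400, "least " * 2000]
selected = fit_to_window(ranked_chunks, window=2048)
print(len(selected))
```

Ranking first and truncating second is what lets a small window still carry the most relevant slice of a very large data corpus.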


A Thought on Cost

The clever reader might wonder, "Won't the context windows supported by foundation models quickly become bigger, making RAG obsolete?" The short answer is no.

The longer answer is cost.

In all algorithms, even linear scaling—the best theoretical outcome possible—is simply a deal breaker on cost for almost all use cases imaginable. At current API rates, each query to GPT-4 would easily end up costing tens of dollars at a context window still limited to below 1,000 pages (which is hardly any enterprise's data corpus).

When thinking of these queries as similar to Google searches by your employees, it quickly becomes apparent why context window optimization—and, of course, hallucination mitigation—using RAG as a cost-reduction initiative is already as hot as LLMs themselves.

The significant difference in cost between sending queries to GPT-3.5 or GPT-4 alone will force AI application developers to carefully optimize cost (known) vs. value (known for its dependence on quality, unknown for ultimate business value). In LLM application development, choosing the best model each time will come at a very high cost to those using it. One could argue that RAG is to LLMs what data engineering is to analytics: carefully selecting and preparing the correct data before sending it for analysis.

Back to the need for rigorous evaluation of your AI systems: look for tools with an easy-to-use UI that enables experimenting with and prototyping data flows. The example on the next page is a no-code UI utilizing a vector store, embeddings, and an LLM while reading data from an industrial knowledge graph.

Here we see that we can embed PDF documents from the industrial knowledge graph and, with context from these documents, allow the LLM to perform a semantic search.

Behind the scenes, this particular flow is a standard flow for Document Q&A. The difference is that the UI makes the underlying components more visible and helps explain the flow visually. Visual data flow preparation can extend to more complex flows and help developers optimize applications and reduce the overall, long-term costs of AI infrastructure management.

AI Agent Frameworks

RAG is just one component of an industrial AI architecture. While RAG effectively solves hallucination and data-freshness problems, AI agent frameworks give AI applications new capabilities.

AI agents are designed to achieve specific goals and can perceive their environment and make decisions autonomously. Agents include chatbots, smart home devices and applications, and the programmatic trading software used in finance.

Agents are classified into different types based on their characteristics:

■ Reactive agents respond to immediate environmental stimuli and take actions based on those stimuli.

■ Proactive agents take initiative and plan to achieve their goals.

■ Fixed environments have a static set of rules that do not change.

■ Dynamic environments are constantly changing and require agents to adapt to new situations.

■ Multi-agent systems involve multiple agents working together to achieve a common goal, often requiring coordination and communication to achieve their objectives.

Agents are another centerpiece of industrial AI architecture, and existing frameworks such as LangChain have already incorporated agent concepts that can be brought into industrial solutions.
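The cost argument earlier in this chapter can be made concrete with some back-of-the-envelope arithmetic. The per-1K-token prices and model names below are hypothetical placeholders; actual API rates vary by provider and change over time.

```python
# Sketch: why shipping a huge context with every query is expensive, and why
# RAG's smaller retrieved context cuts cost. Prices are invented placeholders.
PRICE_PER_1K = {"small-model": 0.002, "large-model": 0.06}

def query_cost(model: str, prompt_tokens: int, completion_tokens: int = 500) -> float:
    return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K[model]

# Stuffing a large context vs. sending only a few RAG-retrieved chunks:
full_context = query_cost("large-model", prompt_tokens=30_000)
rag_context = query_cost("large-model", prompt_tokens=2_000)

print(f"full: ${full_context:.2f}, rag: ${rag_context:.2f}")
```

Even with these toy numbers, the full-context query costs more than ten times the RAG query, and the gap compounds when queries arrive at search-engine frequency across a workforce.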


For example, Cognite has prototyped an AI copilot for reliability-centered maintenance using LangChain technology to better equip operators and reliability engineers to check damaged equipment. The copilot incorporates standards, documentation, and images to run high-fidelity engineering calculations through a human-language interface.

The value of this approach is that it combines the power of running complex mathematical calculations with the easy interface of a language model without compromising accuracy, which is a challenge in using LLMs.

However, it is important to note that copilot- and AI agent-based approaches leverage the power of natural language to understand and write code based on published API documentation and examples. This is impossible with data lakes or data warehouses where, without a contextualized industrial knowledge graph, there are no API libraries that can be used as a reliable mechanism to access rich industrial data. Additionally, because all data access happens through the APIs, no proprietary data is shared with third parties, and the built-in mechanisms for logging and access control remain intact.
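The tool-using pattern behind such a copilot, where the language model decides which deterministic calculation or lookup to run, can be sketched in plain Python. This is not the LangChain API; the tools, the standards lookup, and the calculation are invented for illustration.

```python
# Toy agent dispatch loop: the "agent" picks a registered tool and runs it,
# so the numeric work stays deterministic. All names and data are invented.
def wall_thickness_margin(measured_mm: float, minimum_mm: float) -> float:
    return round(measured_mm - minimum_mm, 2)

def lookup_standard(name: str) -> str:
    standards = {"API 579": "fitness-for-service assessment"}
    return standards.get(name, "unknown standard")

TOOLS = {"calculate": wall_thickness_margin, "lookup": lookup_standard}

def agent(request: dict) -> str:
    # In a real framework, an LLM would choose the action and arguments;
    # here the choice is passed in directly.
    tool = TOOLS[request["action"]]
    result = tool(*request["args"])
    return f"{request['action']} -> {result}"

print(agent({"action": "calculate", "args": (6.4, 5.1)}))
print(agent({"action": "lookup", "args": ("API 579",)}))
```

Keeping the calculation in ordinary code while the model only routes the request is what preserves accuracy: the LLM supplies the interface, not the arithmetic.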


Chapter 5

The Industrial Data


and AI Problem

The State of Data Liberation and Data Contextualization

Drowning in Data, Starving for Context

A typical industrial facility can have 100,000 data points continuously updating, some faster than once per second, across more than 50 applications.

For every one person who can 'speak code,' there are hundreds of others who do not, especially in industrial environments with their numerous data types and source system complexity. To use industrial data broadly requires context.

Subject matter experts (SMEs), field engineers, and data scientists deserve simple access to all industrial data in a single workspace. This requires a unique way to leverage and apply contextualized data. Industrial applications, as we know them today—especially data dashboarding and visualization—will soon be 100% transformed.

Unstructured data types are also increasing with more reliance on images, videos, acoustics, 3D models, point clouds, and engineering drawings to provide additional context on the state of operations. Traditional efforts to connect data from systems are manual, time-consuming, and fail to manage structured data at scale, much less incorporate the growing unstructured data.

Open, composable workspaces with integrated AI copilots will become the point of entry to engage with industrial data in the same way the web browser replaced desktop applications. The infusion of generative artificial intelligence (AI) accelerates the ability to converse with contextualized, trusted data via a performant API—without writing a single line of code.

Generative AI also thrives on context. While generative AI has tremendous potential, answers are often wrong without data that is contextualized in an industrial knowledge graph. LLMs such as ChatGPT are trained on 10s to 100s of billions of parameters, yet data sets of this size don't exist for industrial environments. If a pre-trained model is integrated with raw data in a data lake, the patterns aren't readily identifiable to the model. Using generic LLMs on uncontextualized, unstructured industrial data significantly increases the risk of hallucinations.

To enable generative AI solutions to provide the right answers in industrial environments, there must be an efficient way to provide generative AI solutions with more context.

Enterprises large and small are rushing to reduce the barriers their workforces must overcome to consume data—or to be more data literate. Gartner formally defines data literacy as "the ability to read, write and communicate data in context," more informally expressed as, "Do you speak data?" Data literacy includes understanding data sources and constructs, analytical methods and techniques applied to data, and the ability to describe the use case applications and resulting value.

The need to understand industrial data is also relevant to the growing demand to apply generative AI in these environments, where large language models (LLMs) like ChatGPT lack the industrial context required to provide deterministic, trusted responses unique to each facility.


In the past, people were forced to 'speak data' to gain actionable industrial insight. Now, we have progressed to make industrial data speak human. We can liberate complex industrial data and put it into context for the people and equipment who need it.

Similarly, AI investments must speak to your people about your industrial data. Those who can already provide simple access to complex industrial data deliver more business value with an ROI above 400% and are well-positioned to use large language models to unlock new business opportunities rapidly.

While the urgency for subject matter experts (SMEs) to become more data literate is clear, they face the industrial data and AI problem. The physical world of industrial data is a messy place. For example, equipment wear, fluctuating operational targets, and work orders are all important when assessing the root cause of an operational issue. A process variable cannot be observed and interpreted in isolation. Is the variable trending as expected? What is the maximum value recommended by the vendor? When was this equipment last inspected, and what were the observations?

These questions concern the SMEs and field engineers responsible for keeping equipment running and continuously operating at a level that optimizes short-term and long-term efficient production. Increasingly, documentation and electronic trails of maintenance and operational history can be accessed digitally.

However, diagrams and vendor documentation are still found in static PDF files, maintenance records are scanned paper documents, and 3D models are not up to date with the physical world, as modifications and maintenance have occurred over the asset's lifetime. How do we enable SMEs across operations, maintenance, and quality to solve their problems with digital tools?

To adequately empower SMEs to extract the value of industrial data, operationalizing data must become a core part of your business strategy. Data must be liberated, contextualized, and easily used to run accurate AI models and generate data-driven insights. A robust data foundation powered by a strong contextualization engine is crucial to optimize production and enhance operational efficiency.

The foundation will enable operational teams to better work with data and improve their work efficiency. It will also become a solid foundation to run generative AI models that accelerate workflows, optimize production, automate repetitive tasks, assist in decision-making, and more.

A Modern Approach to Liberating Industrial Data

Reducing the burden of liberating industrial data requires an edge-to-cloud approach to integrate IT, OT, or engineering data for prioritized use cases. The edge solution must be able to collect data from legacy controllers, siloed equipment skids, IoT sensors, and more to address the fragmented OT systems.

The cloud solution must be able to integrate this edge data with existing OT data sources (historians, MES, SCADA, quality), IT sources (ERP), and engineering data (3D, point clouds, P&IDs, etc.).

Most importantly, liberating data from existing industrial sources must not require custom extractor development for each new source. Addressing this issue requires pre-built extractors into the most common industrial data sources. Connecting to new source systems must take a maximum of hours, not months, and the configuration must enable users to schedule frequency (from real-time streaming to daily updates) with full visibility into the state and performance of the connections. The connections must be stable, fully supported, and able to be monitored by an owner in the event of a disruption. All of these capabilities are prerequisites for having trusted data that is available for contextualization.
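The requirements above—declared sources, a schedule ranging from streaming to daily updates, and an owner to notify on disruption—can be made concrete with a small configuration sketch. This is illustrative only: the source URIs, field names, and `validate_config` helper are invented for this example and do not represent any vendor's actual extractor format.

```python
# Hypothetical extractor configurations: one streaming OT source behind an
# edge gateway, one daily-batch ERP source. All names are illustrative.
EXTRACTORS = [
    {
        "source": "opcua://plant-a/line-3",   # legacy controller via edge gateway
        "type": "time_series",
        "schedule": "streaming",              # real-time push to the cloud
        "owner": "ot-team@example.com",       # notified in the event of a disruption
    },
    {
        "source": "erp://sap/work-orders",
        "type": "events",
        "schedule": "daily",                  # batch update once per day
        "owner": "it-team@example.com",
    },
]

VALID_SCHEDULES = {"streaming", "hourly", "daily"}

def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    if cfg.get("schedule") not in VALID_SCHEDULES:
        problems.append(f"unsupported schedule: {cfg.get('schedule')!r}")
    if not cfg.get("owner"):
        problems.append("no owner to notify on disruption")
    return problems

for cfg in EXTRACTORS:
    print(cfg["source"], "->", validate_config(cfg) or "ok")
```

Checking every connection for a schedule and an owner before it goes live is one simple way to keep the "stable, fully supported, and monitored" requirement enforceable as the number of sources grows.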


It's All About Relationships

Contextualization of data means to uncover and identify relationships between elements of data that are connected, but where the relationships are not explicitly represented. Identifying these connections can be done in many ways depending on the type of data and the kind of relationship—some of the techniques are simple pattern matching, while others rely on being able to understand specific data formats, having a lot of domain knowledge, and an ability to identify patterns that are not exact matches. The value of data contextualization is the automated discovery of real-time information.

Given the high-stakes nature of these industries and the sheer amount of operational questions being asked, operators in these environments deserve to easily navigate all related data in a Google-like search, enhanced by generative AI, to make faster decisions, ensure safer working conditions, and improve asset reliability and resiliency of operations.

Take Google Maps, for instance, which effortlessly combines map data with information from the web (rating, direction, hours of operation, nearby activities, busiest times) to provide users with a unified interface to quickly answer multiple questions from a single, seamless experience. In many cases, Google Maps anticipates user queries even when incomplete and provides helpful information and quick answers to questions. The user can then navigate the data in context to take an informed action.

But what about upstream energy production, downstream and process manufacturing, hybrid and discrete manufacturing, power generation, and other industrial environments?

Although consumer and industrial technologies are different, the approach is quite similar. Like Google Maps, industrial solutions must put structured and unstructured data, real-time and historical, into business context, enabling anyone to build, deploy, and scale digital solutions that drive business value. To answer operational questions, users can interact with contextualized information with a Google-like search, a 3D view, or a drawing (P&IDs or process flow diagrams) and, through any starting point, quickly arrive at the information they need.


Time spent on data products

Industrial data, in its siloed form, fails to unlock today's business opportunities

Working with contextualized data, the day-to-day decisions of the user become faster and more data-driven.

Instead of SMEs spending 80% of their time finding and aggregating information, contextualized data flips the script and empowers end users to spend less time gathering data and more time focusing on making better-informed decisions with the help of generative AI and copilots.

Unfortunately, today's reality for industrial search is too complex. SMEs spend hours looking for and interpreting the data they need to tell an asset-centric narrative, only to learn that the data is incomplete, inaccurate, or that the decision no longer matters. This time is spent looking through reports and spreadsheets, engaging with other data owners, or making new requests to IT—none of this work happens through a single pane of glass.

While many previous attempts have been made to solve the industrial data and AI problem using solutions such as data lakes, they often fail. These data solutions present disadvantages, including chaotic data governance, privacy issues, and an inability to integrate data and share data changes, which can delay time to value, increase the gap between SMEs and IT teams, and increase cybersecurity risks and costs.

Even where data sources are connected within a data lake, data often needs more context due to limited documentation or information loss from inconsistent structure or tagging. As such, data lake solutions cannot establish sufficient relationships between all data in the lake, making it usable by only a few experts (not SMEs) who know how to navigate the data lake effectively, and increasing the likelihood of hallucinations from generative AI solutions.

To truly solve the industrial data and AI problems, data must be liberated from siloed source systems and put into context so it can be used to optimize production, improve asset performance, and enable AI-powered business decisions.


Chapter 6

A Robust Contextualization Engine

Data from the industrial knowledge graph is deterministic, trustworthy, and accessible through a robust and performant API. These characteristics make the industrial knowledge graph necessary for enabling generative artificial intelligence (AI) use cases, modeling data for open industrial digital twin use cases, and providing an open workspace to generate insights with contextualized data (Cognite Data Fusion's Industrial Canvas). Each of these topics will be discussed in detail, but first, we will start with how to contextualize data.

Data Contextualization as a Foundation for Innovation

Contextualization means establishing meaningful relationships between data sources and types to traverse and find data through a digital representation of the assets and processes that exist in the physical world.

Continually contextualizing disparate data sources is an iterative process that creates a rich data foundation for operational innovation.

As relationships between previously siloed data sources are established in this data foundation, you naturally start building an industrial knowledge graph tailored to your operations.

The knowledge graph continuously evolves and spans many dimensions and data types, from the time series values to diagrams showing the process flows to a point cloud 3D model with recent images from an inspection.

Industrial Canvas
Your teams deserve the industrial data & analytics UX only available with Industrial Canvas
■ Simple access to complex industrial data in one collaborative workspace
■ Use AI to develop and analyze complex cross data-source scenarios 90% faster
■ Troubleshoot with real-time data and collaborate on RCA across teams and even sites

Open Industrial Digital Twins & Industrial Knowledge Graph
Represent your full enterprise as a data graph with Open Industrial Digital Twins
■ Build dynamic models of physical assets and processes, delivering real-time intelligence for active decision making
■ Use data product practices to provide simple access with strong governance across source, domain, and solution data models across your ecosystem
■ Take advantage of LLMs with Retrieval-Augmented Generation (RAG) powered by your own proprietary data, without data leakage

Scalable Industrial DataOps Foundation
Supercharge your data engineering with Industrial DataOps
■ 100+ OT, ET, IT, Simulator, and Robotics connectors out-of-the-box
■ Connect from edge to cloud with confidence, with granular access controls, data lineage, and quality assurance built-in
■ Catalog and activate your data across source, domain, and solution data models across your enterprise


Building Contextualization Pipelines

Data contextualization involves connecting all the data to better understand an asset or facility. Data relationships must be created through contextualization pipelines that develop and maintain a comprehensive, dynamic industrial knowledge graph.

Using contextualization pipelines addresses two key factors:

■ Manual attempts to contextualize data are time-consuming, requiring thousands of upfront hours to complete.

■ Manually contextualized data is impractical to maintain across hundreds of thousands of mappings, and system changes are only captured through more manual efforts.

Contextualization pipelines must be able to connect all OT, IT, and engineering data types for a clearer understanding of an asset or facility and to reuse the industrial knowledge graph across many business solutions and business domains.


Achieving contextualized data at scale within a single site and, more broadly, across an enterprise requires establishing many contextualization pipelines through AI-powered contextualization services that use pre-trained ML-based models, custom ML-based models, a rules engine, and manual/expert-sourced mappings.

AI-powered contextualization permeates industrial data management, shifting the emphasis from data storage and cataloging to a true human data discovery experience, assisted further by a generative AI copilot.

The complete scope of contextualization capabilities to capture all structured, unstructured, and semi-structured industrial data extends to the following types of contextualization:

■ Entity matching — map and connect time series, events, and tabular data to assets.

■ Create interactive diagrams — map and connect tags on engineering diagrams (P&IDs or PFDs) to other data sources.

■ Contextualize imagery data — upload and extract information from imagery data.

■ Create document classifiers — label training data and create document classifiers.

Examples of contextualization pipelines

Entity Matching

Entity matching is the Swiss Army knife of contextualization and matches string data properties, such as equipment names, descriptions, metadata, etc.

For example, you can match assets to time series, to related work orders, and to their particular nodes in a 3D model. Matching signals must be present for different data sets to be successfully matched and appended into the reference and application data models. The entity matching model uses AI to find matches when there are similarities between the strings and does not return suggested matches for unrelated entities. Even with weaker matching signals, a data contextualization engine provides enormous value by organizing, structuring, and governing a Subject Matter Expert's (SME) intensive data contextualization work. For example, spreadsheets and CSV files are no longer needed to map relationships manually. Moreover, generative AI further improves the accuracy of entity matching models, better capturing the semantic relationships between the previously siloed sources.
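The idea of suggesting matches on similar strings while withholding suggestions for unrelated entities can be sketched with standard-library string similarity. This is a toy sketch under stated assumptions: real entity-matching engines use trained ML models rather than a character-overlap ratio, and the asset and time series names below are invented.

```python
from difflib import SequenceMatcher

# Invented examples: asset names from a maintenance system, and time
# series names from a historian that embed the same equipment tags.
assets = ["23-PT-92531 Discharge pressure", "23-TT-92512 Bearing temp"]
time_series = ["VAL_23-PT-92531:X.Value", "VAL_23-TT-92512:X.Value", "UNRELATED_SIGNAL"]

def best_match(series: str, candidates: list[str], threshold: float = 0.4):
    """Suggest the most similar asset, or None if every score is too weak."""
    scored = [(SequenceMatcher(None, series.lower(), a.lower()).ratio(), a)
              for a in candidates]
    score, asset = max(scored)
    return asset if score >= threshold else None

for ts in time_series:
    print(ts, "->", best_match(ts, assets))
```

The threshold is what keeps the matcher honest: a series that resembles no asset returns `None` instead of a forced guess, mirroring the requirement that the model "does not return suggested matches for unrelated entities."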


Contextualize events to assets — e.g., connect shop orders, work orders, and alarms to the correct site, production line, and equipment

Example of how "entity matching" is leveraged in a contextualization pipeline to automate the formation of the industrial knowledge graph
Create Document Classifiers

We can enrich documents such as standard operating procedures, inspection rounds, or OEM manuals with more context through connections to data sources such as SharePoint and local file directories. This contextualization will transform static pages into dynamic, interactive documents with live links to assets or processes identified in a document. Plus, contextualization with generative AI enables natural language document search, enabling users to quickly find the correct information without manually scanning documents.

Contextualized documentation — e.g., connect tags in P&IDs to associated assets, time series, work orders, documents, etc.
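The "label training data and create document classifiers" step can be sketched with a bare-bones bag-of-words model. This is a deliberately minimal illustration with invented example documents and labels; production services train far richer models on real labeled corpora.

```python
from collections import Counter

# Tiny labeled training set (invented examples): document text -> class
training = [
    ("isolate the pump and replace the mechanical seal", "maintenance_procedure"),
    ("torque the flange bolts in the sequence shown", "maintenance_procedure"),
    ("operating pressure must not exceed 16 bar", "oem_manual"),
    ("refer to the vendor datasheet for pressure ratings", "oem_manual"),
]

def train(samples):
    """Count word frequencies per label: a bare-bones bag-of-words model."""
    model = {}
    for text, label in samples:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model

def classify(model, text):
    """Score each label by how often it has seen the document's words."""
    words = text.lower().split()
    return max(model, key=lambda lbl: sum(model[lbl][w] for w in words))

model = train(training)
print(classify(model, "replace the seal on pump P-101"))
```

Once documents carry labels like these, they can be routed into the right contextualization pipeline (e.g., linking a maintenance procedure to the assets it mentions) instead of sitting in an undifferentiated file share.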

Create Interactive Diagrams

Contextualization capabilities extend beyond entities and can also build interactive engineering diagrams/P&IDs from static PDF source files. Contextualization of diagrams and drawings discovers and validates relationships from these previously flat PDF files automatically, without writing any code.

This contextualization is achieved by identifying tags within the PDFs using OCR and smart algorithms to find the correct tags linking, e.g., to files or equipment in those files. The detected instances are then stored as annotations alongside the file and link to the relevant data, available to navigate from the PDF to the relevant asset or process data.

Contextualize Imagery Data

Whether captured with cameras, drones, or a mobile phone, images and video data contain valuable information about the state of a facility over time (inspections, maintenance routines, etc.). However, utilizing this data often remains a challenge. Establishing contextualization pipelines to extract relevant information from these data types requires several industry-relevant and ready-to-use services. For example, computer vision can identify people, safety equipment, and spills, read analog gauges, and much more. These services should exist both as APIs and SDKs and can be used in automated pipelines for analyzing imagery data with the power of AI.

The ability to automate the creation of contextualization pipelines with AI-based algorithms across all of these different data types shortens the data contextualization process from months to days. Using AI-based algorithms to build an industrial knowledge graph efficiently avoids six-plus months of upfront effort and increases time to value. Additionally, with live contextualization pipelines, the effort to scale is significantly reduced, and data can be applied to solve many use cases across many sites without dramatically increasing the size of the teams managing this foundation.

Contextualized 3D models — e.g., filter for and visualize where work orders are located on the plant
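The tag-identification step described under "Create Interactive Diagrams" can be sketched as post-processing of OCR output: scan the recognized text for strings shaped like instrument tags and store each hit as an annotation with its position. The sample text, tag pattern, and annotation fields are invented for illustration; real pipelines use richer tag grammars and layout information.

```python
import re

# OCR output from one P&ID sheet (invented sample text)
ocr_text = ("Feed enters through valve 21-XV-1004 upstream of "
            "pump 21-PA-100A; see 21-PT-1019.")

# Typical instrument-tag shape: area number, function letters, sequence number
TAG_PATTERN = re.compile(r"\b\d{2}-[A-Z]{2}-\d{3,4}[A-Z]?\b")

# Each detected instance becomes an annotation stored alongside the file,
# which a UI can turn into a clickable link to the matching asset.
annotations = [
    {"tag": m.group(), "start": m.start(), "end": m.end()}
    for m in TAG_PATTERN.finditer(ocr_text)
]
for ann in annotations:
    print(ann)
```

The stored character offsets are what make the PDF interactive: the viewer can highlight exactly where each tag appears and navigate from it to the related asset or process data.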


Chapter 7

Contextualization, Data Models, and Digital Twins

The knowledge graph enables you to capture various data perspectives

Contextualizing Data Into an Industrial Knowledge Graph

Contextualization is necessary when building an open, flexible, labeled industrial knowledge graph to represent your operations. Data modeling makes it easier for all stakeholders to find the necessary information and view and understand relationships between data objects.

Take, for example, someone developing a production optimization application across multiple production lines.

They need a robust domain API that provides instant access to a data model, which contains all the relevant data structured to accurately reflect the process while ensuring performant querying, regardless of where that data originates from or where it is stored now. And they need it all in domain language, not in the language of databases. So in this simplified example, the data model powers this domain API.

Simple aggregation of digitized industrial data is a significant step forward from the silos and inaccessibility that often plague large enterprises. However, to provide simple access to complex data, the variety of industrial data types must be accounted for, and the semantic relationships that drive scalable utilization of this data must be incorporated to support interactive user experiences.

Codifying this context as an industrial knowledge graph is vital to enabling consistent, deterministic navigation of these meaningful relationships.

With the industrial knowledge graph as the foundation, data is understood and structured to meet the specific needs of users or use cases. For example, asset resources commonly originate from a maintenance system, and the hierarchical asset structure of the maintenance system can define how the asset resources are organized.

An asset hierarchy is ideal for addressing use cases related to asset performance management (APM). The relationships resource type also allows the organization of assets in other structures besides the standard hierarchical asset structure. For example, assets can be organized by their physical location, where the grouping nodes in the hierarchy are buildings and floors rather than systems and functions. Building a model with this structure opens the possibility to solve new use cases like product traceability, where the physical connections of the assets through the production process must be known.

Data becomes an asset, liberated from its silos, with reusable analytics and scalable models, shareable across many users. This industrial knowledge graph encourages data reuse by creating a user-friendly architecture. By leveraging data effectively and rapidly, the organization can address business opportunities quickly and at scale.

Not One to Rule Them All

To transform operations, industrial companies must build tens of data-driven solutions and then scale them across hundreds of production facilities.

Scalability helps enterprises break the Proof of Concept (PoC) purgatory cycle.
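The two ways of organizing the same assets described earlier in this chapter—by system and function for APM, or by building and floor for traceability—can be sketched as two views derived from one set of nodes. Asset names and attributes below are invented for illustration.

```python
# The same assets carry both a functional parent and a physical location,
# so two hierarchies can be derived from one set of nodes (names invented).
assets = [
    {"name": "P-101", "function": "cooling-water-system", "location": "building-2/floor-1"},
    {"name": "P-102", "function": "cooling-water-system", "location": "building-2/floor-2"},
    {"name": "K-201", "function": "compression-train",    "location": "building-2/floor-1"},
]

def group_by(assets, key):
    """Build a parent -> children view for whichever perspective is needed."""
    view = {}
    for a in assets:
        view.setdefault(a[key], []).append(a["name"])
    return view

print(group_by(assets, "function"))   # system/function hierarchy for APM
print(group_by(assets, "location"))   # physical hierarchy for traceability
```

Because both views are derived from the same underlying nodes and relationships, neither has to be maintained by hand; each use case simply traverses the knowledge graph along a different edge type.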


While some companies have lighthouse sites, where technology allows different teams to work harmoniously, achieving this performance state commonly takes one to two years. If an organization has 50 locations in total, do they have 50-100 years to achieve the same level of performance at each of those 50 sites?

To address this lack of speed, the industry needs an approach that combines domain and industrial data expertise into a single product, enabling data reuse to develop many tailored solutions rapidly. Data modeling is a core component of turning siloed data into scalable solutions.

Physical, industrial systems are complex to represent, and no single representation will work in all the different ways to consume the data. The solution to this complexity is standardizing on a set of data models that contain some of the same data but allow you to tailor each model and add unique data.

For this, we need a data modeling framework that allows different perspectives of the same data to be clearly described and reused. Data models for industry can exist at three levels:

1. Source Data Model — data is liberated from source systems and made available in its original state.

2. Domain Data Model — siloed data is unified through contextualization and structured into industry standards (CFIHOS, ISA-95, etc.).

3. Solution Data Model — data from the source and domain models that supports specific solutions.

The different layers enable value creation holistically and on a per-project basis. While the source data model liberates data from various source systems, it should also be queryable through the same API interfaces. The domain data model allows for a higher level of entropy and the representation of evolving ontologies, whereas the solution data model is much more rigid while at the same time allowing for true scalability across two dimensions:

■ The scalability of one solution is ensured through an automated population of solution data model instances made possible by the contextualized relationships in your industrial knowledge graph, e.g., scaling a maintenance optimization solution across the entire asset portfolio.

■ Scalability across a portfolio of solutions is enabled by immediate access to a wide range of data sources and the fact that application requirements for data are decoupled from the representation in the domain data model. This decoupling allows for use cases to be solved that require different levels of data granularity, e.g., plant-level maintenance optimization vs. enterprise-level strategic planning.

As a result, enterprises can break free of the PoC purgatory cycle and focus on use case innovation that delivers in production at scale.

Data Models Work Together in a Digital Twin

The most prevalent application of data modeling is to unlock the potential of industrial digital twins. The advantage of data modeling for digital twins is to avoid the singular, monolithic digital twin expected to meet the needs of all and to focus on creating smaller, tailored twins designed to meet the specific needs of different teams. The industrial knowledge graph acts as the foundation for the data model of each twin and provides the point of access for data discovery and application development.

Industrial companies can enhance the overall understanding of their operations by creating relationships across OT, IT, and engineering data using contextualization pipelines to develop an open industrial digital twin.

An industrial digital twin is the aggregation of all possible data types and data sets, both historical and real-time, directly or indirectly related to a given physical asset or set of assets in an easily accessible, unified location. The collected data must be trusted and contextualized, linked to the real world, and made consumable for various use cases.

Digital twins must serve data to align with the operational decision-making process. As a result, companies need not a single twin but multiple twins tailored to different decision types. For example, a digital twin for supply chain, one for various operating conditions, one for maintenance insights, one for visualization, one for simulation—and so on.
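The three data-model levels listed above can be sketched as one record flowing from source to solution form. All field names and values here are invented; the standards (CFIHOS, ISA-95) are only named, not implemented.

```python
# Source data model: records exactly as they arrive from each system
source_maintenance = {"EQUIP_ID": "10023", "FUNC_LOC": "A1.PMP.101", "DESC": "Cooling pump"}
source_historian = {"tag": "A1-PMP-101.PV", "unit": "m3/h"}

# Domain data model: the two records unified into one contextualized
# entity, the kind of structure an industry standard would formalize
domain_asset = {
    "asset_id": source_maintenance["FUNC_LOC"],
    "description": source_maintenance["DESC"],
    "time_series": [source_historian["tag"]],
}

def to_solution_model(asset: dict) -> dict:
    """Solution data model: only the fields a maintenance app actually needs."""
    return {"id": asset["asset_id"], "signals": asset["time_series"]}

print(to_solution_model(domain_asset))
```

The decoupling described above shows up directly in `to_solution_model`: the application defines its own narrow shape, so the domain model can evolve without breaking every solution built on top of it.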


Industrial Generative AI Requires Data Contextualization

Contextualized data shortens the time to business value and provides a pathway to scale in many industrial performance optimization applications and across advanced analytics workstreams. Additionally, access to contextualized data allows Subject Matter Experts (SMEs) to become more confident and independent when making operational decisions or working on use cases with data scientists and data engineers. While contextualization helps solve the complexity of industrial data for users, we must not underestimate the impact of providing industrial context to generative AI solutions.

Our early exposure to LLMs through ChatGPT and DALL-E made the industry quickly realize the incredible generative capabilities of this latest advancement in deep learning AI.
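Providing industrial context to a generative AI solution is commonly done with the Retrieval-Augmented Generation (RAG) pattern mentioned earlier in this guide. The sketch below is illustrative only: the retrieval step is naive keyword overlap (production systems typically use embeddings over the knowledge graph), the snippets are invented, and the LLM call itself is deliberately omitted.

```python
# Contextualized snippets as they might be retrieved from a knowledge
# graph (invented examples)
snippets = [
    "Pump P-101: max discharge pressure 16 bar per vendor manual.",
    "Work order 778: replaced P-101 mechanical seal on 2023-03-14.",
    "Compressor K-201: scheduled overhaul every 8000 hours.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank snippets by keyword overlap with the question; keep the top k."""
    q = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Ground the model: answer only from retrieved, contextualized facts."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# An LLM API call on build_prompt(...) would follow here; omitted.
print(build_prompt("what is the max pressure of pump p-101?", snippets))
```

Constraining the model to retrieved, facility-specific facts is what turns a general-purpose LLM's probabilistic answer into a verifiable one: the response can be traced back to the snippets it was given.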

The above graphic shows that a digital twin isn't a monolith but an ecosystem. What is needed is not a single digital twin that perfectly encapsulates all aspects of the physical reality it mirrors but rather an evolving set of 'digital siblings' who share a lot of the same 'DNA' (data, tools, and practices) but are built for a specific purpose, can evolve on their own, and provide value in isolation.

To support that ecosystem, industrial companies need an efficient way of populating all the different kinds of digital twins with data in a scalable way.

"Data management platforms help firms streamline the data capture, handling and contextualization processes, while accelerating time to value, cloud deployment, and scalability."

(Verdantix Green Quadrant Report)

Contextualized data unlocks valuable insights for operators, increasing understanding and improving operations processes. With contextualized data, industrial companies find it easier to examine their assets across multiple levels, from individual sensors to complex models. Equipped with virtual representations of real-world assets reflecting real-time data, operators can identify and mitigate problems that have been present but invisible for decades.

Broadly, AI is accelerating contextualization, auto-populating data models, and simplifying the use of data for those who know how to code and those who don't. All of this results in accelerated time to value, improved operations, and an even easier way for SMEs, operators, and data scientists to consume and work with data.

The ability of AI to accelerate contextualization and to populate data models has been discussed throughout this book. The final piece of a more intuitive data experience is the benefits of generative AI.


Q&A with a CTO

Jason Schern, Field CTO at Cognite

Jason has spent the last 25 years working with some of the largest discrete manufacturing companies to dramatically improve their data operations and machine learning analytics capabilities. For the past year, much of his time has been spent talking directly to operators or manufacturers about how generative AI can and should be incorporated into their digital transformation strategies. Here, we cover some of the most common questions he gets from operations and IT teams in the field.

Q: What are some specific risks companies aren't yet considering when incorporating generative AI models that haven't undergone extensive testing?

Jason: There are a couple of critical aspects of large language models (LLMs) that are important to recognize:

■ They are trained on a large corpus of text, including web content, books, software source code, etc. By its very nature, this content includes context, which significantly impacts an LLM's ability to make predictions when 'reading' new content or generating content in response to prompts.

■ LLMs are probabilistic and will return responses consistent with both the original training and reinforcement learning used to build the model and any inputs provided for consideration when responding to a prompt.

As a result, it is not the level of testing that drives the risk profile, but the amount/type of context. For industrial data and industrial use cases, the risk profile is determined by the level to which a general-purpose LLM (i.e., OpenAI, Bard, etc.) can return a deterministic response based on highly contextualized information. When a general-purpose LLM has access to highly contextualized information, the 'reasoning engine' can significantly reduce the probability of hallucinogenic responses to provide both deterministic and verifiable responses.

Q: Many businesses feel pressure to board the generative AI train. What are some key considerations companies should weigh up before investing? Which companies/industries are better placed to invest in developing their own generative AI products, and which companies/industries should outsource?

Jason: This is highly dependent on the goals and profile of the business. For businesses in creative pursuits or content generation, jump on the generative AI train with confidence and a high sense of urgency.

Generative AI is designed and optimized for content generation and creative outputs, especially as manifested in some of the most recent LLMs. These businesses can outsource their investments (i.e., fully leverage publicly available LLMs).

However, for many other businesses and industries, accuracy and validation of responses might be critical to ensuring operational efficiency and safety. These entities will need to invest in generative AI pursuits that can better leverage the high-level reasoning of generative AI to provide trustworthy, deterministic responses.

This need for trust typically leads to approaches where contextualized industry-specific data can be provided as inputs alongside prompts to generally available LLMs—or specialized LLMs can be fine-tuned utilizing highly contextualized, industry-specific data sets for additional training.


Because these approaches depend on proprietary industrial data, these companies will be unlikely to outsource information that can become a competitive advantage.

Q: Generative AI's adoption via ChatGPT has been unprecedented—where do we go from here? Are we headed in the right direction? Which industries possess the most significant potential for harnessing benefits from generative AI?

Jason: We are still very early in the adoption curve of generative AI. ChatGPT is an excellent showcase for the capabilities of generative AI to the broadest possible audience, with a non-existent learning curve due to the natural language approach to generating prompts. We will begin to see compounding effects where generative AI and other technology curves will lead to exponential advancements in most fields. The list of possible directions is too extensive to address here, but the most significant impacts will go far beyond chatbots. It's the application of generative AI with other technologies that, once combined, will yield the greatest innovations.

Some obvious examples can be experienced today in areas such as autonomous driving or flight (Tesla/SpaceX), where multi-modal generative AI joins photogrammetry, telemetry, and other technologies to outperform humans in performance and safety. Other examples will come from medicine and materials science, where new treatments and new materials will be discovered.

In industry, we'll see tremendous advancements in operational efficiency and safety as we combine generative AI with robotics, photogrammetry, and other optimizations that previously were limited to the rare subject matter expertise of highly experienced professionals.

Yes, we are headed in the right direction. However, the speed and breadth of innovation are so unprecedented in technology adoption cycles that it will be challenging to follow these advancements and prepare for the implications.

It is difficult at this early stage to identify where the greatest potential lies. However, companies that can organize and contextualize proprietary data will significantly benefit from the innovations and efficiencies generative AI will enable.

The eternal adage of 'garbage in, garbage out' not only applies to generative AI, but it is also the greatest determinant of the extent to which a company can successfully leverage the integration of generative AI capabilities into their business.

Q: Numerous conversations are happening around how generative AI should be regulated. Do you think about how it should be handled and which areas pose the greatest risk? What considerations should organizations bear in mind regarding the potential impact of government regulations and standards on their future utilization of generative AI?

Jason: I may be a bit of a contrarian on this point. The 'cat is out of the bag,' and any attempts at regulation will be unenforceable in a world where these models are already so ubiquitous across the globe.

Any regulation attempt would either need to be globally enforceable (impossible) or run the risk of placing limits on investments in specific countries. The implications of regulatory limitations would risk slowing or limiting advancements that will shift to other countries. The national security implications of falling behind in these advancements will make it highly unlikely that any meaningful regulations will be passed. However, I do feel it likely that major players (OpenAI, Microsoft, Google, etc.) will begin to seek legislation that provides some 'regulatory capture' to protect their positions.

Chapter 7: Contextualization, Data Models, and Digital Twins

Q: In your opinion, what is the biggest misconception about generative AI?

Jason: The two main misconceptions—at least at this early stage—are:

1. Experiences based on ChatGPT can feel like LLMs understand text. They do not. LLMs expose statistical probabilities resulting from the data context and fine-tuning used during model training. This dependency on context necessitates such a large corpus of data for generic model training.

When this is understood, we realize that LLM performance in providing non-hallucinogenic responses is greatly impacted by the quality of the data and the level of context inherent in that data.

The better the data (context-rich), the less is required for model training (significantly lower costs for model development or fine-tuning), and the greater the chances for generative AI to provide more deterministic, verifiable responses.

2. The second misconception is that generative AI can understand complex industrial data just like text, books, web content, programming languages, etc. This is also not true. There are several critical differences between industrial data and text-based content, but the most significant difference is the lack of context. Contextual cues primarily influence the statistically driven models of generative AI. These contextual cues are inherent in the content itself. Industrial data lacks these contextual cues. Sensor data from machinery provides constant information through a timestamp and sensor value. There is no context inherent in this data. The machine this sensor belongs to, the work orders that have previously been performed, operating conditions, operating throughput, maintenance history, and other critical contextual information are not included in the sensor data.

All of this context is critical and either needs to be utilized in training a specialized generative AI model or included as input when using a general-purpose AI model. Context and data quality matter more than ever.

Q: What are the most significant headwinds and tailwinds you anticipate will affect generative AI over the next six to 12 months?

Jason: Specifically in industry, the availability of high-quality, contextualized data will be the most significant barrier to utilizing generative AI's ever-growing capabilities and innovations. Those with it will out-pace their rivals in what generative AI can provide. Tesla is an obvious example of this that we can see now. Their access to high-quality, contextualized data from their customers' cars will allow for advancements and innovations that other automakers are unlikely to replicate.

The cost of compute (availability of GPUs and cost/prompt) will be a limiting factor in many use cases. This challenge is temporary, and I expect market dynamics will resolve it in six to 12 months, but we will not be able to ignore these throttling effects early on.

And lastly, I suspect there will be many failed attempts to leverage generative AI in various use cases. Many of these endeavors will suffer from not understanding what generative AI is doing and what it needs to succeed (high-quality contextualized data). We will see many failed attempts that have at their core a level of wishful thinking not grounded in technology.

To avoid this pitfall, start by solving your data problem before jumping straight into generative AI initiatives (and note: these can be done in parallel!).
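The gap Jason describes between a bare sensor reading and context-rich data can be made concrete in a few lines of Python. This is an illustrative sketch only: the tag names, asset fields, and work order numbers below are invented, not drawn from any real system.

```python
from datetime import datetime, timezone

# A raw sensor reading: a timestamp and a value, nothing more.
raw_reading = {
    "timestamp": datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
    "value": 87.4,
}

# Illustrative context tables (in practice these live in EAM/ERP/historian systems).
asset_registry = {
    "TS-21-PT-1019": {"equipment": "Export compressor A", "unit": "bar", "site": "Plant 21"},
}
work_orders = {
    "TS-21-PT-1019": ["WO-5521: Seal replacement (2023-11-02)"],
}

def contextualize(sensor_id, reading):
    """Join a bare reading with asset metadata and maintenance history."""
    asset = asset_registry.get(sensor_id, {})
    return {
        **reading,
        "sensor_id": sensor_id,
        "equipment": asset.get("equipment"),
        "unit": asset.get("unit"),
        "site": asset.get("site"),
        "recent_work_orders": work_orders.get(sensor_id, []),
    }

record = contextualize("TS-21-PT-1019", raw_reading)
```

The raw reading alone tells a model nothing about which machine it describes; the joined record is the kind of input (or training example) that carries the contextual cues the interview argues are missing from industrial data.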


Chapter 8

AI is the Driving
Force for Industrial
Transformation

From Digital Maturity to Industrial Transformation

While nearly every organization is talking about digital transformation, the use of data, scaling, and time to value, only some in the industrial world are actually reaping the benefits.

There is no shortage of data in any industrial company, but there is a general lack of understanding of how to extract it, bring it together, and use it in a valuable way. At the heart of this data-driven value dilemma lies a confluence of challenges, ranging from the technical ("How can we best organize our diverse and fluid data universe?") to the operational ("How can we create new information products and services?"), to the financial ("How can we treat data as an asset?"), to the human ("How can we improve data literacy and ensure digital solution adoption in the field?").

There are two discomforting truths within digital transformation across asset-heavy industries:

1. Digitalization proof-of-concepts (PoCs) are commonplace. Real Return on Investment (ROI) isn't.

2. Organizations invest billions in cloud data warehouses and data lakes. Most data ends there, unused by anyone for anything.

Though more and more industrial operations data is readily available in the cloud, Chief Data Officers (CDOs) are confronted with the hard reality that moving data to the cloud is not even a third of the journey to value. As Forrester (2021) put it elegantly, "Data has no value unless the business trusts it and uses it."

Value capture reality

Despite supportive technology trends in abundance, data activation remains stuck in PoC purgatory. Data and analytics leaders need to capitalize on the value of data fully. However, many aren't sure how. So what is required to fix the industrial data problem? Industry has put much effort into new digital tools and capabilities to create free information flows from their vast amounts of industrial data. But the data systems and mechanics are not even connected enough to approximate Google-like search for front-line staff.

The convergence of data and analytics has made Industrial DataOps an operational necessity. DataOps focuses on delivering business-ready, trusted, actionable, high-quality data to all data consumers.

■ The challenge: Only one in four organizations extracts value from data to a significant extent. Data dispersion and a lack of tools and processes to connect, contextualize, and govern data stand in the way of digital transformation.

■ The opportunity: Industrial DataOps, infused with AI, promises to improve the time to value, quality, predictability, and scale of the operational data analytics life cycle. It provides the opportunity to offer data science liberation within any product experience while simultaneously allowing subject matter experts (SMEs) to acquire, liberate, and codify domain knowledge through an easily accessible and user-friendly interface. It's a stepping stone to a new way of managing data in the broader organization, enabling it to cope with growing data diversity and serve a growing population of data users.


Industrial DataOps and AI for Industry

While generative AI can help make your data 'speak human,' it doesn't speak the language of your industrial data on its own. As discussed in previous chapters, a strong industrial data foundation is required to remove the risk of hallucinations. Generative AI can improve data usage, but trustworthiness requires a robust data foundation with a powerful contextualization engine.

As Verdantix recently put it, "DataOps-focused vendors are uniquely placed to harness the power of AI/ML technologies such as LLMs for asset-heavy industries—taking advantage of industry-specific data ingest, transform, governance and contextualization tools to provide a unique grounding plane for hallucination-prone LLMs."5

Industrial DataOps platforms help data workers deploy automated workflows to extract, ingest, and integrate data from industrial data sources, including legacy operations equipment and technology. They offer a workbench for data quality, transformation, and enrichment, as well as intelligent tools to apply industry knowledge, hierarchies, and interdependencies to contextualize and model data. This data is then made available through specific application services for humans, machines, and systems to leverage.

Industrial facilities generate more data than ever, with greater volume, velocity, variety, and visibility. Organizations are changing their approach to data and operations to meet digitalization priorities.

Gartner expects that by 2024, the use of synthetic data created with generative AI will halve the volume of real data needed for machine learning, and that by 2026, code written by developers/humans will be reduced by 50% due to generative AI code generation models.

Digital mavericks have already noted what is in their best interest and are aligning their strategic technology purchases with their enterprise transformation strategy.

Rapidly maturing generative AI technologies are poised to change how humans process complex information or situations. While SMEs and operators are getting used to new data infrastructure that helps them 'Google' complex industrial data, AI will further simplify the human-to-data interface.

[Chart: Data Management Priorities for 2023—"How important are the following data management activities for your firm over the next 12 months?"]

AI Can Deliver Untapped Value for Asset-Heavy Enterprises

To adequately extract the value of industrial data insights, it's essential to make operationalizing data core to your business strategy. Data must be available, useful, and valuable in the industrial context. Operational teams need a robust data foundation with a strong data context and interpretability backbone, all while applying generative AI to accelerate workflows that optimize production and make operations more efficient.

5. www.verdantix.com/insights/blogs/cognite-s-new-interface-gives-industry-its-first-real-glimpse-of-generative-ai-s-value


Efficient Data Management and Improved Data Accessibility

A strong data foundation is required to remove the risk of 'hallucinations' and increase AI 'readiness.' An Industrial DataOps foundation maximizes the productive time of data workers with automated data provisioning, management tools, and analytic workspaces to work with and use data safely and independently within specified governance boundaries.

The approach can be augmented with AI-based automation for various aspects of data management—including metadata management, unstructured data management, and data integration—enabling data workers to spend more time on use case development. Using AI to enable rapid ingestion and contextualization of large amounts of data brings a paradigm shift in how the organization accesses business-critical information, improving decision-making quality, reducing risk, and lowering the barriers to (and skills for) data innovation.

Augmented Workflows and Process Improvements, Driving Innovation at Scale

Using generative AI-powered semantic search, what used to take your process engineers, maintenance workers, and data scientists hours of precious time will take only a few seconds. With the guidance of generative AI copilots, users can generate summaries of documents and diagrams, perform no-code calculations on time series data, conduct a root cause analysis of equipment, and more. Time spent gathering and understanding data goes from hours in traditional tools to seconds. Now users can spend more time driving high-quality business decisions across production optimization, maintenance, safety, and sustainability.

Rapid Development of Use Cases and Application Enablement

Too often, digital operation initiatives get trapped in 'PoC purgatory,' where scaling pilots takes too long or is too expensive. Use an AI-infused Industrial DataOps platform to shorten the time to value from data by making PoCs quicker and cheaper to design and by offering operationalizing and scaling tools. These copilot-based approaches leverage the power of natural language to understand and write code based on published API documentation and examples to support development processes.

Generative AI further improves the training sets of ML models by generating synthetic data, enhancing the data set used for training, improving process efficiency, and optimizing production. Some common use cases in asset-heavy industries are maintenance workflow optimization, engineering scenario analysis, digitization of asset process and instrumentation diagrams (P&IDs) to make them interactive and shareable, and 3D digital twin models to support asset management.

Enterprise Data Governance and Personalized AI Tools as a By-Product

By having a strong Industrial DataOps foundation, you can empower users to adapt AI models to cater to their specific requirements and tasks, using generative AI to enhance data onboarding, complete with lineage, quality assurance, and governance, while a unique generative AI architecture enables deterministic responses from a native copilot.

On top of that, Industrial Canvas overcomes the challenges of other single pane-of-glass solutions, which often over-promise capabilities and are too rigid with prescribed workflows that prevent users from working with the data how they choose. It does so by delivering the ultimate no-code experience within a free-form workspace to derive cross-data-source insights and drive high-quality production optimization, maintenance, safety, and sustainability decisions.

If implemented successfully, an AI-augmented data platform provides consistency and ROI in technology, processes, and organizational structures, with better operations data quality, integration and accessibility, and stewardship. It should also enhance data security, privacy, and compliance with tracking, auditing, masking, and sanitation tools.
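To make the idea behind semantic search concrete, here is a minimal sketch of the underlying retrieval pattern: documents and queries are turned into vectors and results are ranked by similarity. A toy bag-of-words embedding stands in for the learned embedding model a production system would use, and the document texts are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Illustrative document snippets an SME might search across.
documents = [
    "pump seal replacement procedure and torque specifications",
    "quarterly ESG emissions reporting template",
    "compressor vibration alarm troubleshooting guide",
]

def semantic_search(query, docs, top_k=1):
    """Rank documents by similarity to the query and return the best matches."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

best = semantic_search("how do I replace a pump seal", documents)
```

In a real deployment the `embed` function would call a neural embedding model, but the ranking loop is the same: the engineer's natural-language question is matched against vectorized content rather than exact keywords.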


Why AI-Infused DataOps Matters to Each Data Stakeholder

Extracting maximum value from data relies on applying advanced models to produce insights that inform optimal decision-making, empowering operators to take action confidently. Turning insight into action is, in a nutshell, what we mean by operationalizing data into production for value. But for every one person that can 'speak code,' there are hundreds who cannot.

Generative AI will change how data consumers interact with data. It facilitates a more collaborative working model, where non-professional data users can perform data management tasks and develop advanced analytics independently within specified governance boundaries. This democratization of data helps store process knowledge and maintain technical continuity so that new engineers can quickly understand, manage, and enrich existing models. It is about removing the coding and scripting and bringing the data consumption experience to the human user level.

Making the data speak human is the only way to address the Achilles heel of practically all data and analytics solutions, especially those for heavy-asset industry verticals. These organizations face many challenges: an aging workforce, extreme data type and source system complexity, and very low classical data literacy among SMEs; those needing data to inform their daily production optimization, maintenance, safety, and sustainability decisions.

Democratizing Data: Why it Matters to Executives

Priority: Financial performance and profits
Challenge: ROI on investment; cost of downtime
Value: Enable data-driven decision-making that allows focus on the highest-ROI activity at any given time

Priority: Innovation and solution delivery
Challenge: The measures and KPIs used in decision-making are entrenched in showing short-term value
Value: Serve as a bridge to increased digital maturity, as it carries forward the momentum and infrastructure to develop data analytics catalogs and libraries that can then be deployed with fewer services and at lower marginal costs

Priority: Reduce inefficient processes that waste time and effort
Challenge: Scaling on assets and equipment; projects are slow to deploy and done in isolation
Value: Align and bring together formerly isolated subject matter experts (SMEs), cultures, platforms, and data deployed by IT and OT teams to improve operational performance through unified goals and KPIs

Priority: Impact reputation, sustainability, and future viability
Challenge: Poor reputation and sustainability
Value: Empower organizations to operate with greater precision and track the correct metrics to reduce the impact on the environment

Priority: Having the best workforce
Challenge: Scaling the number of people creating solutions; lack of skill set
Value: Enable existing non-professional data users to perform some data management tasks and drive value for the enterprise, while preserving their valuable accumulated knowledge and experience

Priority: Remaining relevant and competitive
Challenge: Missing the digital transformation revolution
Value: Leverage data effectively and rapidly to answer questions and gain some insulation from market volatility

Priority: Enabling digital transformation
Challenge: Confusion about how to deliver digital transformation and what it means for both management and workers
Value: Enable the organization to meet the need for fast-moving innovation by providing consistency and ROI in technology, processes, and organizational structures, with better operational data quality, integration and accessibility, and stewardship

Priority: Strategic change in culture and vision
Challenge: A reactive culture is an obstacle to growth
Value: Recognize data as an enterprise asset and build a path towards digital maturity through tools and processes so that digital ways of working become effortless across a broad range of stakeholders


Why it Matters to IT and Digital Teams

Priority: Data preparation and integration
Challenge: Legacy systems; complex integrations and dependencies between multiple data sources
Value: Minimize existing data silos and help simplify the architecture to support rapid development and deployment of new analytics

Priority: Create solutions for operations by turning insights into actionable advice
Challenge: There's too much data with no context; challenge in making data usable and getting models into operation
Value: Offer a workbench for data quality, transformation, and enrichment, as well as intelligent tools to apply industry knowledge, hierarchies, and interdependencies to contextualize and model data

Priority: Automate workflows
Challenge: Scale across insights, solutions, and sites
Value: Facilitate industrial equipment and process data models and templates—it talks domain language and scales models from one to many workflows

Priority: Minimize the hurdles in cross-functional collaboration
Challenge: Multiple hand-offs that are error-prone and increase risks
Value: Provide a unified approach to streamline end-to-end workflows from data preparation to modeling and insights sharing

Priority: Driving implementation of Industry 4.0
Challenge: Difficult to explain business-wide benefits to senior decision makers
Value: Connecting data users with disparate operational data sources helps bridge those divides on the path to use-case operationalization

Why it Matters to Domain Experts

Priority: Optimization of process around quality, throughput, and yield
Challenge: Lack of insights or tools to make quick and correct decisions around maintenance and production
Value: Leverage domain knowledge and human expertise to provide context and enrich data-driven insights, and further develop machine learning models that utilize data to improve planning processes and workflows

Priority: Inefficient processes waste time and effort
Challenge: Work in isolation and without full possession of all the data and the facts
Value: Provide the capabilities domain experts need to support self-service discovery and data orchestration from multiple sources

Priority: Resource management
Challenge: Reluctance and slow adoption of new tools and tech in old ways of working
Value: Empower business functions to use data and support the digital worker

Priority: Improve planning process and workflow across assets and equipment
Challenge: Manual tasks are time-consuming and prone to errors
Value: Industrial DataOps infused with AI enables your site to solve today's challenges with a pathway towards more autonomous systems and sustainable growth

Solving the industrial data problem is critical to realizing value from industrial digitalization efforts. Benefits can be measured from streamlined APM workflows, improved SME productivity, optimized maintenance programs, and real-time data efficiencies, such as:

■ Productivity savings due to improved SME efficiency. DataOps provides data accessibility and visibility, transforming the way data scientists and SMEs collaborate.

■ Reduced shutdown time. The opportunity cost of large industrial assets being out of production is significant. Using a digital twin and better component data visibility, SMEs are able to safely minimize shutdown periods when data anomalies arise.

■ Real-time data access enables improvement in productivity. Live data access enhances operational flexibility and decision making by increasing site safety, improving predictive maintenance, and raising machine performance.

■ Optimized planned maintenance. Cognite Data Fusion® creates contextualized data to optimize planned maintenance by analyzing and interpreting available resources, workflows, and component life cycles.

■ Energy efficiency savings. Intelligent data can be used to reduce energy use and therefore operational costs.

■ Optimization of heavy machinery and industrial processes.

■ Health and safety. Reduce the amount of human movement through potentially dangerous 'hot' areas, reducing risks to employee health and safety.

■ Environmental, social, and governance (ESG) reporting.

Chapter 9

Use Cases
for Industrial
Generative AI

Copilots: Enhancing Self-Service and Humanless Workflows

Generative artificial intelligence (AI) is changing how data consumers interact with data, but can it completely eliminate 'human contextualization'?

Whereas a process engineer might spend several hours performing 'human contextualization' manually (at an hourly rate of $140 or more), again and again, contextualized industrial knowledge graphs provide the trusted data relationships that enable generative AI to accurately navigate and interpret data for operators without requiring data engineering or coding competencies.

With generative AI-powered semantic search, what used to take process engineers, maintenance workers, and data scientists hours of precious time now takes only a few seconds.

Take, for example, a copilot approach. Because LLMs such as ChatGPT understand and can generate sophisticated code in multiple languages (i.e., Python, JavaScript, etc.), users can prompt the LLM with a site-specific question. The LLM can then interpret the question, write the relevant code using an open API, and execute that code to return a response to the user (see the below image for example).

These copilot-based approaches leverage the power of natural language to understand and write code based on published API documentation and examples. This level of automation is impossible with data lakes or data warehouses where, without a contextualized industrial knowledge graph, there are no API libraries that can be used as a reliable mechanism to access rich industrial data.

Industrial data can also be provided, in context, directly to the LLM API libraries available from OpenAI, LangChain, and others, allowing users to leverage the power of the LLM's natural language processing in conjunction with proprietary data. This database can include numerical representations (embeddings) of specific asset data, including time series, work orders, simulation results, P&ID diagrams, and the relationships defined by the digital twin knowledge graph. Using these open APIs, a prompt can be sent to the LLM along with access to a proprietary embeddings database so that the LLM will formulate its response based on the relevant content extracted from your proprietary knowledge graphs.
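A hedged sketch of the retrieval-augmented pattern described above: the user's question is scored against a proprietary store of snippets, the most relevant content is fetched, and a grounded prompt is assembled for the LLM. The scoring function and snippet contents are illustrative stand-ins; a real deployment would call a hosted embedding model and LLM API (for example via OpenAI or LangChain client libraries) rather than the toy functions here.

```python
from collections import Counter

def embed(text):
    # Stand-in for a learned embedding model: a word-count vector.
    return Counter(text.lower().split())

def similarity(a, b):
    # Sparse dot product between two word-count vectors.
    return sum(a[t] * b[t] for t in a)

# Illustrative snippets extracted from an industrial knowledge graph.
knowledge_snippets = [
    "Pump 101-P-A: centrifugal pump, last work order WO-88 closed 2024-02-01",
    "Valve 23-V-7: pressure relief valve on the glycol regeneration loop",
]
store = [(snippet, embed(snippet)) for snippet in knowledge_snippets]

def build_prompt(question, top_k=1):
    """Retrieve the most relevant snippets and assemble a grounded prompt."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: similarity(q, item[1]), reverse=True)
    context = "\n".join(snippet for snippet, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What was the last work order on pump 101-P-A?")
```

The assembled prompt would then be sent to the LLM, which formulates its answer from the retrieved proprietary content rather than from its generic training data.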


Industrial Canvas: Interact with Industrial Data in a Modern, Open Workspace

For every one person that can 'speak code,' there are hundreds who cannot.

Users deserve a new way to interact with industrial data, one that delivers an ultimate no-code experience within a free-form workspace to derive cross-data-source insights and drive high-quality production optimization, maintenance, safety, and sustainability decisions.

With an intuitive, user-centric tool revolutionizing data exploration and visualization, SMEs and operators should be able to make cross-data-source insights available without relying on data scientists, data engineers, and software engineers to build tailored case solutions. As a result, users at every level of the organization can spend less time searching for and aggregating data and more time making high-quality business decisions.

With an industrial knowledge graph making data available in an open workspace, users can view all data types in one place, including documents, engineering diagrams, sensor data, images, 3D models, and more—data that matters to industrial end users, including process engineers, maintenance planners, reliability engineers, machine operators, technicians, and others.

Users can generate document and diagram summaries, perform no-code calculations on time series data, and conduct root cause analysis of equipment with the guidance of the copilot.

SMEs deserve to work with their industrial data in a collaborative environment where they can share workspaces, tag other users, and share insights as comments. This approach overcomes the single pane-of-glass solution, which often over-promises as a framework, is difficult to collaborate in, and is too rigid, preventing users from working with the data how they choose.

This concept, captured as a product, is a perfect example of how contextualized data generates immediate business value and significant time-savings in many industrial performance optimization applications and across advanced analytics workstreams. Access to contextualized data allows subject matter experts (SMEs) to become more confident and independent when making operational decisions or working on use cases with data scientists and engineers.


Real-World Examples from Cognite

Let's look at other examples of AI applications for industry.

Gen AI-Powered Data Exploration

Problem: Domain experts don't have a deep understanding of their data models and how to query across data types ('relational queries').

Solution: A natural language interface as an alternative to traditional search, wherein the expert can interact with a copilot to refine a search and get clear feedback to find the data they need for day-to-day tasks quickly.

Document Insights

Problem: Field workers need to understand equipment specifications buried in huge documents.

Solution: A copilot that summarizes the document's relevant paragraph(s), along with links to document locations, to accelerate decisions.

App Development

Problem: Developing industrial applications is time-consuming and costly. It is challenging to get the right data streams that provide the right insights to end users in a scalable way.

Solution: Prompt engineering, or using NLP to discover inputs that yield the most desirable results, to replace many software applications and bypass traditional application development cycles.

ISO Compliance

Problem: Industrial companies must comply with numerous standards to ensure regulatory compliance and manage operational risks, requiring significant resources and coordination of siloed knowledge.

Solution: A copilot that simplifies the compliance process for industrial companies through GenAI and vectorization.

Resource Utilization

Problem: Efficient resource utilization is crucial for businesses relying on Kubernetes. Traditional resource prediction methods do not capture the dynamic nature of Kubernetes workloads, resulting in suboptimal resource allocation and increased cost.

Solution: A state-of-the-art AI tool for dynamic resource allocation that uses previous resource utilization (memory, CPU, etc.) data to forecast resource utilization in the immediate future. The solution can learn from the workloads and make intelligent scaling decisions and actions to allocate resources dynamically for Kubernetes (k8s) workloads.

Rust Detection

Problem: Rust can corrode and degrade equipment, machinery, and pipelines, leading to reduced functionality, decreased performance, and, in severe cases, complete failure.

Solution: Generative AI-powered autonomous operations incorporating robotic inspection rounds with computer vision and GPT-4 analysis for early rust detection and automated work order generation.
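The resource-utilization use case reduces to short-horizon forecasting from recent usage history. As a simplified illustration (not the actual tool described above), exponential smoothing over recent CPU samples captures the core idea of predicting the immediate future from past workload behavior and sizing requests with headroom; the sample values are invented.

```python
def exponential_smoothing_forecast(samples, alpha=0.5):
    """One-step-ahead forecast that weights recent samples more heavily."""
    level = samples[0]
    for value in samples[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Recent CPU utilization samples for a workload (fraction of requested cores).
cpu_history = [0.42, 0.45, 0.50, 0.58, 0.61]
forecast = exponential_smoothing_forecast(cpu_history)

# Autoscaler sketch: request headroom above the forecast before scaling up.
recommended_request = round(forecast * 1.2, 2)
```

A learned model would replace the smoothing rule with one trained on many workloads, but the decision loop is the same: forecast near-term demand, then set resource requests or replica counts with a safety margin.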


In summary, every user can benefit from contextualized industrial data—from a rich, intuitive data exploration for SMEs, production managers, business analysts, and engineers to a 'data as code' experience for data scientists who prefer SDK experiences. Creating an industrial knowledge graph by contextualizing data is a new way of serving all data consumers—data and analytics professionals and SMEs, business, and engineering professionals—with the same 'real-time contextualized data at your fingertips' experience.

Data contextualization empowers everyone with improved decision-making to enable operational improvements, immediate business value, and significant time-savings in many industrial performance optimization applications, asset performance management, root cause analysis, and advanced analytics workstreams. Data contextualization is equally vital for scaling use cases, which is critical to transforming operations.

With the continued loss of domain-specific knowledge caused by the aging workforce, various data types and source system complexity, and painful user experiences amongst SMEs, data contextualization is the only way to address the Achilles heel of all data and analytics solutions.

With the guidance of a generative AI copilot, users can also generate summaries of documents and diagrams, perform no-code calculations on time series data, conduct a root cause analysis of equipment, and more. Time spent gathering and understanding data goes from hours in traditional tools to seconds. Now users can spend more time driving high-quality business decisions across production optimization, maintenance, safety, and sustainability.

Simple access to complex industrial data will be achieved through an ultimate no-code experience within a free-form workspace to derive cross-data-source insights—a collaborative environment to share workspaces, tag other users, and share insights as comments. The goal is to provide an intuitive, user-centric tool that revolutionizes data exploration, visualization, and analysis for process engineers, maintenance planners, reliability engineers, machine operators, technicians, and others, to build specific use case workflows without relying on data scientists, data engineers, and software engineers.

Creating this experience is predicated on the ability to contextualize data rapidly at scale with AI-powered services that eliminate tedious manual contextualization and maintenance. Contextualization enables the industrial knowledge graph that serves as the connecting fabric between data modeling, digital twins, and all open components. Combining an industrial knowledge graph with a unique generative AI architecture, all users have simple access to complex industrial data. In their language. On their terms.

In the next two chapters, we look at how contextualization, industrial knowledge graphs, and generative AI are driving innovation in robotics, autonomous industry, and asset performance management.


Chapter 10

The Impact of Gen AI on Robotics and Autonomous Industry

Robotics in industry is nothing new. Early industrial robots were typically programmed using pre-defined instructions and could not adapt or learn from their surroundings. In recent years, intelligent robots—capable of performing complex tasks and adapting to dynamic environments—have become much more common.

These robots are equipped with advanced sensors, artificial intelligence algorithms, and machine learning capabilities. They can perceive their environment, interpret data, and make intelligent decisions to perform complex tasks with minimal human intervention, expanding their application in industry.

For example, it is relatively common at this point to see a robot like Boston Dynamics' Spot navigating through a manufacturing plant's premises, following predetermined or dynamically adjusted routes to conduct inspection rounds. It can record images, videos, and audio snippets of critical assets and process the collected data in real time to identify signs of corrosion, leaks, abnormal temperature readings, or other indicators of equipment malfunction or safety concerns.

More advanced deployments combine robotics with digital twin technology. Autonomous inspection rounds deliver precise measurements of geometric attributes, identification of structural changes, and visual documentation of an asset's condition, which is the foundation for building and updating digital twins.

The continuous data collection by robots allows for real-time monitoring and detection of deviations or anomalies, reducing unplanned downtime, optimizing maintenance schedules, and facilitating data-driven decision-making.

However, even with these advancements, truly autonomous operations face certain limitations that can impede their widespread adoption and effectiveness:

■ Lack of adaptability: Autonomous systems are typically designed to operate within specific parameters or pre-defined scenarios; deviations from these conditions—like adapting to novel environments or dynamic operational requirements—can pose challenges.

■ Limited decision-making in complex scenarios: Traditional rule-based or pre-programmed approaches may not encompass the intricacies of complex decision-making processes, limiting the autonomous system's ability to respond appropriately in dynamic and uncertain environments.

■ Lack of contextual understanding: Autonomous systems may struggle to understand the broader context or interpret ambiguous information, particularly when it comes to tasks that deal with unstructured data.

Generative AI has emerged as a crucial component in achieving fully autonomous operations. It directly addresses the above limitations by enabling adaptability, enhancing decision-making capabilities, and improving contextual understanding:

■ Adaptability through learning and generation: Generative AI learns from vast amounts of data, including diverse scenarios and edge cases. It can generate new data, simulate different situations, and refine its capabilities through continuous learning. This enables autonomous systems to adapt and respond to evolving conditions, enhancing flexibility and robustness.

■ Enhanced decision-making: Generative AI analyzes complex data patterns and generates insights based on learned patterns. It can consider a wide range of factors, assess risks, and generate optimal responses in complex scenarios, enabling more intelligent decision-making.

■ Improved contextual understanding: Generative AI's ability to learn from human collaboration, especially when combined with RAG technology, enables autonomous systems to interpret and respond to human instructions or queries more accurately. This enhances the contextual understanding of autonomous systems, making them more capable of handling diverse and non-routine situations.
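The "RAG technology" mentioned above (retrieval-augmented generation) grounds a model's answer in retrieved facts rather than open-ended generation. A minimal sketch follows, with simple word-overlap ranking standing in for a real vector search; the example documents are invented for illustration.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Constrain the model to retrieved context, reducing hallucinated answers."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Pump P-101 design temperature is 120C per OEM datasheet.",
    "Valve V-23 was last serviced in March.",
    "Robot inspection route covers compressor deck A.",
]
print(build_prompt("What is the design temperature of pump P-101?", docs))
```

The retrieval step is where the industrial knowledge graph earns its keep: the richer the context served to the model, the more accurately it can interpret an operator's instruction.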


By leveraging the power of generative models, autonomous systems can become more resilient, versatile, and capable of operating effectively in complex and dynamic environments. This paves the way for the broader deployment of autonomous technologies across various industries, driving innovation, efficiency, and safety.

For example, generative AI, coupled with other AI techniques such as computer vision and natural language processing, enables robots to perceive and interpret their surroundings, communicate with humans, and make context-aware decisions.

Furthermore, generative models enable robots to simulate and predict outcomes in complex environments. By generating synthetic data and running simulations, robots can explore different scenarios, refine their decision-making algorithms, and improve their performance in real-world tasks. This iterative learning process brings robots closer to achieving higher autonomy and operational efficiency.

In a simple yet poignant example, Cognite's Co-Founder and Chief Solutions Officer, Stein Danielsen, developed a ChatGPT-powered copilot solution built on top of Cognite Data Fusion® to monitor the temperature of a coffee machine in our Oslo office.

The solution alerts Stein to any inconsistencies in the temperature. He can then use voice recognition to quite literally talk to the solution to check temperature trends, review a schematic of the plumbing connected to the coffee machine, and inquire about different parameters to troubleshoot the problem. He can even ask the copilot to send a Spot robot to visually inspect the coffee machine, using its static camera and computer vision to analyze the valve position, the heat map of the piping, and more. Stein can do all of this remotely, allowing him to determine the most likely cause of the temperature fluctuation and alert maintenance to the most likely solution. Most notably, after multiple tests of this solution, the copilot learned from previous interactions with Stein and began to recommend the best process for troubleshooting the maintenance issue.

We can easily extrapolate the benefits of this type of solution beyond a simple cup of hot coffee. Generative AI allows subject matter experts to find and solve problems more effectively. When enriched with contextualized data from an industrial knowledge graph, generative AI can serve as a cognitive extension for robots, making them nearly independent, assisting humans in even more tasks, expediting decision-making and root cause analysis processes, and significantly improving the productivity and efficiency of operations.

Generative AI also facilitates the development of collaborative robotics, where humans and robots work together towards a common goal. By generating insights and recommendations, generative models can assist humans in decision-making, optimize task allocation between humans and robots, and enhance the overall efficiency of collaborative work environments.

It is also exciting to note how well AI robotic solutions scale. As soon as one robot learns something new, all other robots in the system immediately know it too. When we use domain experts to train the AI that teaches these robots, we have found a way to infinitely scale human knowledge. The robot becomes the embodiment of the AI, and the AI can also use the robot to see and understand the real world.

Generative AI is fast revolutionizing the robotics and autonomous industry, bringing us closer to achieving fully autonomous operations. And the potential benefits are immense. With generative AI as a catalyst, asset-heavy industries are poised to achieve higher levels of automation, productivity, and innovation, redefining how products are designed, produced, and delivered.
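A copilot like the coffee-machine monitor needs some rule for flagging "inconsistencies" in a temperature signal. The text does not describe Cognite's actual detection logic, so the sketch below shows one common, simple approach as an assumption: flag readings that deviate sharply from a trailing window of recent values.

```python
import statistics

def rolling_anomalies(readings, window=5, threshold=3.0):
    """Flag indices where a reading deviates from the trailing-window mean
    by more than `threshold` times the window's standard deviation."""
    flagged = []
    for i in range(window, len(readings)):
        hist = readings[i - window:i]
        mean, sd = statistics.mean(hist), statistics.pstdev(hist)
        if sd > 0 and abs(readings[i] - mean) > threshold * sd:
            flagged.append(i)
    return flagged

# Hypothetical brewer temperatures in °C: a sudden drop trips the alert.
temps = [92.0, 92.4, 91.8, 92.1, 92.2, 92.0, 71.5]
print(rolling_anomalies(temps))  # [6]
```

The flagged index is what would trigger the alert to Stein, after which the conversational troubleshooting and the optional Spot inspection take over.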


Chapter 11

The Impact of Gen AI on Asset Performance Management

90% of Asset Performance Management is about Simple Access to Data

"Industrial software applications and associated asset content that monitor asset performance, predict failures, and synchronize with IT and OT systems to generate insights that help optimize production, reliability, maintenance, and environmental KPIs."

Asset Performance Management (APM) software is critical as industrial companies develop a safer, more efficient, and more sustainable industrial future. Companies have invested in APM solutions to address business-essential drivers of value, namely:

■ Extending asset life and maximizing labor productivity.

■ Reducing the cost and time of maintenance.

■ Increasing uptime and minimizing unplanned events.

[Figure: Five market segments converge to form the current asset management landscape]

Historically, the solution for solving APM use cases has been to invest in siloed systems (e.g., ERP systems), niche solutions (e.g., IoT applications), or run multi-year 'lighthouse' projects with little-to-no ROI to show after months or years of data wrangling and deployment effort.

Other firms remain cemented in traditional paper-based or Excel-reliant processes, with data tucked away in filing cabinets or their digital cousins—data warehouses. Across the enterprise, collaboration is hindered, value is left unrealized, and organizations are never truly data-driven.


"The APM market has evolved from maintenance-related tasks to incorporate performance optimization, reliability, and environmental KPIs." (Verdantix, Green Quadrant)

At the core of these issues is the industrial data problem: cross-department data silos, missing context from undocumented data, inconsistent workflows, and a lack of successful cross-site scalability. Simply put, most APM strategies don't scale because people lack simple access to complex industrial data.

When people spend 80-90% of their time searching, preparing, and governing data, there's little time left to invest in gaining better, data-driven insights. APM initiatives will continue to fall short of their ROI expectations without a coordinated and collaborative data management strategy.

You may have succeeded with some strategies and technologies you have implemented, or even have a star site where your maintenance, reliability, and operations teams work in harmony. However, it probably took one to two years to reach that performance state. If you have 50 sites in total, do you have 50-100 years to bring each of those sites to the same level of performance?

If it takes six months to deploy one solution at one facility, how long will it take to deploy the same solution across all assets and all sites?

One of the main challenges of APM is to account for the messiness of real-world industrial data and the diverse range of use cases and users it must serve.

Asset Performance Management Intelligence: A Unified Approach to Achieve Continuous Improvement

APM achieves continuous improvement by establishing a continuous cycle of reliability, operations, and maintenance:

■ Improving asset reliability and health monitoring: Surface and monitor functional reliability to inform risk-based decision-making.

■ Optimizing activity planning and maintenance workflows: Ensure plan stability and maintenance process optimization.

■ Efficiently executing work in the field and capturing new data: Efficient and safe field execution to complete the operational picture.


Asset-heavy organizations face numerous challenges when optimizing APM solutions, which must serve different needs and use cases across a variety of roles. Reliability engineers need solutions that enable them to accurately monitor equipment health and receive early warnings about potential failures to avoid breakdowns. For maintenance workers, APM excellence means instant access to data so they can conduct maintenance on equipment based on its actual condition, as opposed to running its maintenance program on a set schedule. And your operations teams should be empowered to operate the 'digital plants of the future' with digital worker and robotics applications.

APM intelligence is only achieved when the full value from these investments is scaled across an enterprise—not just with a single application or a one-off deployment. APM success depends on solving the industrial data problem, and that's why the best approach to APM is one that places an open industrial digital twin at its core.

[Figure: APM is a set of use cases to improve asset performance and ESG & sustainability, and optimize production]

[Figure: A roadmap to adopt intelligent operations. Inspired by Gartner. Source: Verdantix, Green Quadrant: Asset Performance Management Solutions 2022]

An Open Industrial Twin for Asset Performance Management

To make sense of this data and use it to optimize asset performance, organizations need to invest in an open industrial digital twin that can capture the inherent entropy and ever-changing nature of physical equipment and production lines. The open industrial digital twin provides the backbone to deliver the myriad of tailored, data-driven solutions that must be developed and scaled across users, production lines, and sites to harvest the value potential.
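The shift from calendar-based to condition-based maintenance described above boils down to a decision rule over measured equipment state. A sketch follows; the vibration thresholds are illustrative values loosely inspired by ISO 10816 vibration zones, not a standard's normative limits, and the field names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class EquipmentState:
    vibration_mm_s: float       # measured vibration velocity
    hours_since_service: float  # the calendar signal a time-based program would use

def maintenance_action(state: EquipmentState,
                       vib_alarm: float = 7.1,
                       vib_warn: float = 4.5) -> str:
    """Condition-based decision: act on measured state, not a fixed calendar.
    Note that hours_since_service is deliberately ignored; the measured
    condition, not elapsed time, drives the action."""
    if state.vibration_mm_s >= vib_alarm:
        return "create_work_order"    # imminent-failure indicators
    if state.vibration_mm_s >= vib_warn:
        return "schedule_inspection"  # degraded but operable
    return "no_action"                # healthy: skip the calendar-based stop

print(maintenance_action(EquipmentState(8.2, 1200)))  # create_work_order
print(maintenance_action(EquipmentState(2.1, 9000)))  # no_action
```

The second call is the interesting one: a time-based program would service this healthy machine at 9,000 hours regardless, which is exactly the waste condition-based maintenance avoids.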


This digital twin also provides the necessary context for utilizing generative AI to provide deterministic responses to APM-related queries.

An open industrial digital twin is an aggregation of all possible data types and data sets, both historical and real time, related to a given physical asset or set of assets in a single, unified location.

This digital twin provides a single, trusted data foundation for operations and analytics across assets and facilities. It ensures industry experts spend seconds, not hours, finding and understanding the data they need. Digital twins can codify domain expertise, continuously enrich your industrial data foundation, and leverage new technologies for better data capture and safer operations. And an open, extensible digital twin allows your organization to work with preferred industry-standard tooling and visualization software, along with trusted partners.

By bringing all data into a single repository and contextualizing it, companies can rapidly build and deploy APM solutions to improve the performance of their assets and the people who operate them.

Organizations investing in an open industrial digital twin prove value from day one as they take advantage of out-of-the-box solutions and capabilities, connectors to existing systems, and an ecosystem of partners. Digital twins provide the means and the confidence to achieve the goal of an autonomous, self-improving industrial future.

Generative AI Copilots for Asset Performance Management

Rapidly maturing generative AI technologies are poised to deliver a step change in how humans process complex information and situations. While subject matter experts (SMEs) and operators are getting used to new data infrastructure that helps them 'Google' complex industrial data, LLMs such as OpenAI's ChatGPT will further simplify the human-to-data interface: the dawn of industrial copilots is upon us.

So what are the implications of this technology for legacy asset performance management processes in a live operational context?

Generative AI copilot experiences, powered by a robust data foundation, will transform how we work in industry.

[Figure: Example use cases for generative AI copilots in Asset Performance Management]

Workers in the field, either in the context of performing a maintenance activity (e.g., replacing a pump) or doing an operator round (e.g., an area check), often need to access documentation about the equipment they are working with. Based on what they are doing, this could be a start-up procedure for the pump they're working with or something as narrow as the pump's design temperature from the OEM.

Traditionally, this information would have to be printed before leaving the office and physically taken into the field. And, as often happens with fieldwork, discovering the need for new information requires another trip to the office and additional pieces of paper.

Now let's look at the future of APM.


When field workers photograph the equipment they inspect, a generative AI-powered copilot system utilizes Optical Character Recognition (OCR) and image segmentation technologies to identify the specific equipment. This capability saves valuable time and eliminates manual data entry, ensuring a seamless and intuitive user experience.

The copilot automatically analyzes the photos, highlighting any differences or anomalies that may have emerged over time. This automation enables field workers to quickly identify changes, such as the emergence of corrosion or other visual anomalies, facilitating proactive maintenance and ensuring the integrity of critical assets.

By comparing current equipment images with historical references, the copilot provides field workers with a comprehensive and holistic view of the equipment's condition. This capability empowers field workers to make accurate assessments, take appropriate action, and address potential issues before they escalate. The seamless integration of image comparison technology streamlines the inspection process, reducing the time and effort required to identify equipment anomalies manually.

Copilots also make bringing up relevant information for any equipment incredibly simple. By tapping into the vast repository of industrial data stored in the open industrial digital twin, field workers, maintenance engineers, and reliability engineers have access to comprehensive and contextualized information right at their fingertips—whether in the office, control room, or field. Such simple access enables SMEs to make informed decisions and perform accurate inspections without extensive manual searching or reference materials.

Additionally, field workers can interact with the equipment using natural language through speech-to-text or text-based queries. With the power of generative AI, workers can ask questions such as the last pressure test date, the upcoming paint job schedule, or the maximum operating pressure of the equipment. The system intelligently analyzes the queries and retrieves the relevant information, providing immediate answers to field workers' inquiries.

In addition to information retrieval, these copilot experiences enable users to effortlessly fill in work reports using natural language. The copilot will automatically populate the relevant fields in the report templates, whether the worker is inputting a report manually or utilizing speech-to-text capabilities. This seamless integration of AI technology eliminates the need for manual data entry and ensures accurate and efficient reporting, enhancing the overall productivity of field workers.

By combining the power of image recognition, natural language processing, and seamless integration with the industrial knowledge graph, SMEs can perform their APM-related tasks more efficiently, with improved accuracy and reduced administrative burden.

Automated operator rounds performed by robots, drones, and other new technologies will also play a crucial role—freeing up valuable time for SMEs by taking over simple routine tasks and reducing their time in dangerous areas.
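Answering a field worker's natural-language question ultimately means routing the query to the right field in the equipment record. The sketch below is deliberately simplified: a production copilot would use an LLM over the industrial knowledge graph, and every field name, keyword set, and value here is invented for illustration.

```python
# Hypothetical equipment record, as it might be served from the digital twin.
EQUIPMENT_RECORD = {
    "last_pressure_test": "2023-04-12",
    "next_paint_job": "2024-Q2",
    "max_operating_pressure_bar": 16.0,
}

# Keyword routing stands in for the LLM's intent recognition.
INTENT_KEYWORDS = {
    "last_pressure_test": {"pressure", "test"},
    "next_paint_job": {"paint"},
    "max_operating_pressure_bar": {"maximum", "operating", "pressure"},
}

def answer(question: str):
    """Pick the record field whose keywords best overlap the question."""
    words = set(question.lower().replace("?", "").split())
    best = max(INTENT_KEYWORDS, key=lambda f: len(words & INTENT_KEYWORDS[f]))
    return best, EQUIPMENT_RECORD[best]

print(answer("When was the last pressure test?"))
# ('last_pressure_test', '2023-04-12')
```

The same routing idea runs in reverse for report filling: the copilot maps a spoken sentence onto the report template's fields instead of onto the record's fields.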


It's critical to find the right balance between investments in hardware automation (e.g., robots), software automation (e.g., AI and machine learning), and the skills of your human workforce. It's not about replacing all people with robots; rather, it's about finding the appropriate balance to draw upon the unique capabilities that each brings to the conversation.

With ChatGPT serving as the translator, people can do powerful work with robots and drones that would previously have been in the realm of developers, just by typing in their intent. From the safety and comfort of their desks, an operator can troubleshoot an issue with their copilot, which can simultaneously deploy a robot into the field for equipment inspection or repair—no need to interface with multiple applications or write a single line of code. Automation and remote work become as simple as typing a sentence.

Once again, it's not about replacing the human workflow 100%; rather, it's about making the 80% of the work that solves 80% of the most frequent and valuable challenges much simpler.

"If humanity continues the chosen course of movement, it is apparent that the value of humans themselves will constantly and steadily decline. In many areas humans may someday become completely unnecessary to each other, and most importantly, machines will not need them either."6

As we start to empower SMEs in asset-heavy industries with this kind of intelligent tooling, you begin to believe that we could be—at long last—at the tipping point of superhuman augmentation. What happens when people with decades of domain expertise and real-world equipment knowledge can self-service and perform orders of magnitude more efficiently? That could very well be the inflection point of a new industrial revolution.

6. Gartner, Maverick Research: Preservation of Human Functions in the Age of AI. 8 May 2023. ID G00776866


Chapter 12

The Next Phase of Digital Industrial Platforms

Whether you are a digital maverick or about to become one, now is the best time to work in digital transformation. Seriously.

We've come a long way since the early years of industrial IoT data management. The field was pioneered by operations technology incumbents focusing on on-premise hardware and software solutions for Levels 0-3 (see the illustration below), later given a first 'standard' for the platform approach by Gartner's Industrial IoT Platform Magic Quadrant in 2018, and most recently brought to the cloud era by numerous IIoT Platform look-alikes in a myriad of variations of digital twin platforms. However, we remain very early on our journey. As a result, we commonly pay the cost premium of complexity and immaturity at every turn.

And Just Like That, the iPhone Moment Arrived

Generative AI is set to transform every class of software application and platform. There is now a clear market gravity towards a new industrial data ecosystem model that supports the whole range of cloud data and AI workloads, from operational and transactional data to exploratory data science to production data warehousing (exemplified by the rise of cloud data historians). Market gravity that places the transformative power of generative AI at the core of this emergent new industry cloud paradigm—one that makes intuitive user experience a first-class citizen.

"Using the traditional approach is too costly. The solution must work as part of this emerging enterprise platform and at an intuitive level for those using it."

[Figure: The Purdue reference model]

[Figure: The quantified economic value. Forrester, "The Total Economic Impact™ of Cognite Data Fusion®", 2022. Benefits estimated based on interviews with Cognite's customers.]

The value of digitalization is well understood, even well documented. Scaling the value across the industrial enterprise, however, remains challenging. Data remains siloed across hundreds of on-premise systems, and public cloud + consultants-led digitalization platform initiatives have proven much more challenging than anticipated. Great—or even good—user experience remains rare.


Industrial Cloud Platforms

As we bear witness to the iPhone era of digitalization, we foresee generative AI introducing three intertwined transformations for industry:

1. Data and analytics roles will no longer be a priority: AI will become the primary influence in most business decisions, automating the intermediate human data and analytics steps to the same extent e-commerce has automated consumer retail and banking.

2. Going from low code to no code for 90% of all use cases: With large language models (LLMs) instantaneously translating between natural language inputs and all programming languages and APIs, everyone can finally become a software developer—or rather enjoy the benefit of their own industrial copilot as their personal software developer.

3. Vertical software (so-called industry clouds) will replace more than half of all functional enterprise software, providing industry-focused, cross-data-source value traditionally achieved by extensive customization of general-purpose software through costly SI projects.

In fact, the catalyst role of generative AI accelerates the emergence of industrial clouds by increasing competition and innovation as much as by tangibly redefining the art of the possible—even in fields considered impossible for such new technology to permeate.

We will see more change over the next 18 months in the diverse digital transformation platform market (consisting of industrial IoT platforms, digital twin platforms, and more) than we have seen during the lifetime of Gartner's Industrial IoT Platform Magic Quadrant from 2018 to 2023.

Introducing the Industrial Cloud Powered by Generative AI

For most enterprises, by now, the cloud has disrupted conventional data centers by providing more elastic storage and compute services and improved mobility. What lies ahead is the cloud as a true business disruptor. Gartner expects that, by 2027, more than 50% of enterprises will use industry cloud platforms to accelerate their business initiatives.

So what is an industrial cloud platform?

An industrial cloud platform—or industry cloud platform more generally—offers industry-relevant, direct business outcomes by packaging IaaS, PaaS, and SaaS services around a robust data integration and contextualization backbone (commonly referred to as an industry data fabric), together with libraries of composable business capabilities such as use case templates, domain and solution data models, pre-built analytics schemas and dashboards, and industry-specific generative AI capabilities.


An industrial cloud platform is an end-to-end, integrated set of tools to simplify and reduce the time to production of applications for all data use cases. It reduces the cost of implementation, integration, maintenance, and management.

An industrial cloud platform is built around an industry-specific data fabric—simple access to complex industrial data from one API—with common governance, metadata management, and unified access management, and with a set of services accessible by the business user.

An industrial cloud platform applies safe, secure, and hallucination-free generative AI to offer an ultra-intuitive data and user experience for many vertical industry end users on the same unified collaborative platform.

An Aside: 6 Myths of Building Your Own Industrial Cloud

Now is a good time to talk about the alternative to using an industrial cloud SaaS: DIY (do-it-yourself). This is what most organizations have been doing on-premises for years. In fact, early adopters of cloud also had to resort to DIY in many places, as proven SaaS alternatives did not yet exist.

The problems with DIY are many. Many of us know the amount of time and cost that goes into integrating disparate tools or assembling a platform from a variety of PaaS offerings. In addition, the level of technical expertise required to perform the integration is higher than for implementing an industrial cloud SaaS.

Finally, DIY requires IT to be involved at every step, including after the project is finished, due to ongoing maintenance and support of the integrated tools. Therefore, we believe that although DIY is an alternative, it is far more complex and costly. And while generative AI is making using software easier, it is in fact increasing the complexity of developing enterprise solutions that are secure and hallucination-free.

Let us examine common arguments used for going with DIY vs SaaS.

1. DIY is cheaper

Talking about cost alone is, of course, nonsense. Investing $100K and not getting any return is a poor investment, whilst investing $2M for an $8M return is a solid financial investment.

Cost is also not just the license cost paid to cloud and/or SaaS partners, but the Total Cost of Ownership (TCO), including the 'mortar' (people) needed to put together and maintain the data and digital platform. Unlike with normal business SaaS, the ongoing cost of the 'mortar' often far exceeds the software license cost for DIY platforms. This 'mortar' is, of course, part of the true cost.

It is not hard to find very large (100+ FTEs of 'mortar' alone) DIY platforms across industries, making the cost side rather high. Is DIY really cheaper than commercial SaaS that does not require similarly persistent, high volumes of 'mortar'? This is very, very unlikely. So unlikely, in fact, that finding reasons to not choose SaaS has always focused more on points two to seven on the list. In the more austere times ahead, this alone should speak volumes.

What about the business value delivered by DIY? Would you be reading this if that value was a strong triple-digit percentage? As for Cognite Data Fusion® customer value, an independent study by Forrester recently concluded that it delivered a 400% ROI.

2. DIY is the only way to avoid vendor lock-in

One of the most ironic arguments around is that going all-in with one cloud and DIYing everything using that cloud's proprietary services (as multi-cloud DIY is exponentially harder) will protect against vendor lock-in. The truth is, your enterprise is already multi-cloud, and will be more so going forward. Cisco Systems Inc.'s 2022 Global Hybrid Cloud Trends survey found that 82% of IT leaders have adopted a hybrid cloud, while just 8% use only a single public cloud platform.

The same applies to your SaaS choices. Working with select specialists gives you access to the relative strengths of different partners. It allows you to compensate for the lack of a specific capability in one with that of another. It allows you to maintain your flexibility of choice. It categorically avoids waking up locked in to one costly, lower-performance solution with a very high TCO.

Openness, focus on industry standards for data exchange and data models, and placing partner ecosystems—including frenemy ecosystems—at the center of customer value delivery are not unique to clouds. These are normative business…

142 143
Chapter 12: The Next Phase of Digital Industrial Platforms

related, this more and more means positive


product user experience (UX) differentiation,
not technical features. Getting DIY UX to stand And, of course, all enterprise SaaS comes with 6. DIY allows me to play a strategic role
out above SaaS alternatives who are able to enterprise grade security. With all likelihood, your
learn and iterate based on feedback from enterprise’s customer data as well as employee Meeting business needs today—and anticipating
hundreds of customers—whilst your teams data is already on an enterprise SaaS product— business needs of tomorrow—is strategic. Finding
learn only from your users—is a moonshot. are your equipment time series and events really agile solutions to business needs that deliver value
more confidential? quickly; are easy for business users to adopt and
The world is full of highly differentiating industrials scale; are open, interoperable, and extendable;
using the same ERP (SAP) and the same CRM and are flexible to onboard, maintain, and offboard
(Salesforce) for example. In fact, by leaving the 5. DIY gives me job security if needed, is strategic.
tools of the trade to specialists (SaaS partners
who only exist to provide these solutions), you The best means of creating job security and Focusing on customer experience is strategic.
can differentiate on what your customers actually establishing value for your department and your Being open to new ideas, and being the one to
value: on your customer service, your distribution, team is to create business impact aligned with introduce innovation (see industry cloud platforms
your ESG, your business model, and beyond. As your company’s strategic priorities and values. in point five above) is strategic.
unique to clouds. These are normative business for cost efficiency as a differentiator, go back to Delivering real business value to your business
hygiene for all of 2020s enterprise SaaS. Choosing point one above. operations is the most direct, measurable, and With all likelihood, your company is not turning
SaaS to get to business value fast does not visible way of doing so. into a software platform company, but is and will
result in lock-in. Finding yourself with a high TCO remain an energy or manufacturing company.
solution that does not meet business needs 4. DIY is the only enterprise-grade path It is also worth noting that your business Continuing to build and maintain the behind-the-
on the other hand is lock-in of the worst kind. operations do not care what the brand of the scenes, bottom-of-the-stack platform services on
Enterprise SaaS is every bit as enterprise-grade tooling is. They only care that it makes their day- which to build business value and differentiation
If, of course, you fear getting locked into a as the first-party public cloud services used to to-day lives easier, allows them to have better is a necessary evil. Strategic it is not.
SaaS vendor because your business users develop a custom-made data and digital platform. situational awareness on events in real time, and
love using their solution over DIY alternatives, The only difference is that whilst enterprise SaaS to make better business decisions. They really do So turn a new clean page and think strategically
then you should jump to points four to six. comes with comprehensive SLA and premium not care if it is DIY built or SaaS or hybrid. They about what would be best for your business
support options, with DIY, you are responsible do care that it is fit for purpose, easy to use, and regardless of sunk costs. That too is strategic.
for your own SLA and support. This may not that it makes their life easier, not more difficult.
3. DIY is necessary to differentiate sound like a seismic difference in the early
phase of a DIY journey, yet when it comes to In different times, starting a promising, multi-year,
Anchored in competitive strategy, the thesis supporting thousands of users, having a strong three-figure ($M) strategic DIY platform project
that to outperform peers one has to be user community and documentation—all table was aspirational. Today, in different times and
differentiated is true. Yet for manufacturing stakes with enterprise SaaS—warrant serious with the majority of those three-figures spent
or energy companies to pursue meaningful consideration. Enterprise grade means enterprise without much ROI to show for, focusing on here-
differentiation by building proprietary data and scale usage ready. and-now value to business is the best job security
digital platforms is missing the forest for the trees. plan there is.
Moreover, enterprise SaaS comes with coherent,
Building a proprietary data and digital platform well-researched, painstakingly tested and iterated,
is only meaningfully differentiating if it has a business-user-persona-based UX. Whilst perhaps
positive impact on the customer experience—in not yet the first on the enterprise-grade checklist,
this case, how easily can SMEs use the platform cutting-edge ‘maverick’ CIOs already place UX
to create business value. For anything technology first for good reason.


From Optimized Data Storage and DataOps to Data Discovery and Composable Business Solutions

After years of costly DIY and 'significant assembly required' data management solutions—that even when painstakingly assembled, feel antiquated before they are operationalized at any scale—there is a strong argument for a unified, easy-to-use, pre-integrated set of tools to effectively manage and control data assets with minimal integration and setup costs, and which place the business user as the primary data beneficiary rather than the IT department.

Having led the way on Industrial DataOps platforms, we can summarize the evolution from Industrial DataOps to industrial cloud powered by generative AI as:

From Industrial DataOps → To Industrial Cloud powered by Gen AI
- Data liberation solution → Collaborative business innovation solution
- Data engineers → Industry subject matter experts
- SQL and Python → Natural language and point and click
- Data products → Composable business solutions
- 'It can be done' → 'There is a template for it'

Generative AI is redefining low-code application development. It enables business solutions that were only dreams until now. Using a workbench-style UI—such as Cognite Data Fusion's Industrial Canvas—business users can create applications, both analytic and workflow (including with source system write-back using the data fabric), based on a no-code user experience.

Industrial Canvas overcomes the challenges of other single-pane-of-glass solutions, which often over-promise capabilities and are too rigid, with prescribed workflows preventing users from working with the data how they choose. Industrial Canvas makes cross-data-source insights available to everyone, at all levels of the organization, to easily build specific use case applications. This decreases time spent searching for data and increases time spent collaborating, accelerating high-quality business decisions by 90%.

Consider APM investments. The challenge is that APM solutions are often delivered in isolated workflows, limited in their ability to be tailored for unique site needs, and not structured to apply generative AI advancements. Additionally, manual data management and the need to customize applications across sites have prevented APM solutions from scaling. Many solutions still exist in on-prem silos, managed by different parts of the organization, and deployed to solve one or two narrow business problems.

Organizations need more composability in their solutions to extend asset life and maximize labor productivity, reduce the cost and time of maintenance, increase equipment performance, and minimize unplanned events. Such value can only be delivered by empowering their people to make higher-quality decisions by reducing the effort required for SMEs and field engineers to create self-service insights from data.


Industrial Organizations Require SaaS Environments to Scale Operational Impact

While the number of companies leveraging the speed and scalability of SaaS solutions is growing, many industrial organizations still have an aversion to any solution that will not run on-premises or in their private cloud.

In many cases these organizations believe that on-prem or private cloud environments are the only acceptable ways to securely store sensitive data, meet regulatory requirements, maintain control of their data, and avoid vendor lock-in. These concerns are valid, and a SaaS solution must fully address each concern while delivering faster time-to-value and solution scalability.

Before delving into how industrial companies can benefit from embracing SaaS, let's begin by defining the commonly accepted benefits and shortcomings of on-prem, private clouds, public clouds, and SaaS.

On premises (on-prem): In an on-prem environment, organizations own and maintain the physical infrastructure (including storage, data servers, networking equipment, etc.), hardware, and software required to host and operate their systems. Organizations are responsible for costs to manage all aspects of the IT environment, including security, backups, upgrades, installations and more. On-prem is valued most for critical use cases such as process control where network interruptions do not impact performance. However, on-prem infrastructure limits scalability, flexibility, and innovation, and managing and maintaining on-prem solutions requires additional investments in infrastructure and resources.

Private cloud: Private cloud is a cloud environment dedicated to a single organization, providing them with exclusive use and control over the cloud infrastructure and resources. It offers enhanced control, security, and customization options but usually requires high upfront costs and ongoing maintenance. Private cloud is valued for the control, reduced hardware management, and increased scalability over on-prem solutions. However, the infrastructure still requires dedicated IT teams to manage and allocate resources, including software and security upgrades and updates.

Public cloud: Public cloud infrastructure and services are shared across multiple users/organizations. The services and resources are accessible to the public over the internet, and customers can access and utilize them on-demand. Public clouds are valued for their scalability, cost-efficiency, and wide range of pre-built services; the IT environment, software upgrades, and security updates of underlying services are handled by the cloud provider. Concerns around public clouds exist around security, especially related to storing sensitive data outside of an industrial organization's infrastructure, and time-to-value, where initial projects can take six-plus months to deliver business value.

Software-as-a-Service (SaaS): SaaS is a software delivery model in which software is accessed through the internet/public cloud, eliminating the need for local installations. Under the SaaS model, the software is stored on remote servers, continuously managed and updated by the service provider, and easily accessed by customers through web browsers, mobile applications, and application programming interfaces (APIs).


It's important to note that SaaS can be deployed in multiple ways:

Multi-tenant SaaS: Customers share the same resources, with their data kept totally separate from each other.

Single-tenant SaaS: Customers do not share resources and have their own dedicated cluster with their own software instance.

Virtual private SaaS: Customers do not share resources and have their own dedicated cluster with a secure connection (private IP address) that is not accessible/findable via the public internet.

Private SaaS: Software solutions are installed inside a customer's private tenant, giving customers increased responsibility and control over deployment, security, maintenance, and data.

When evaluating different SaaS environments, many have the initial reaction to deploy these solutions within their own private tenant. Deploying inside of one's own tenant passes responsibility and control over deployment, security, maintenance, and data to the industrial organization. While this may be tempting, it is essential to consider the following:

■ Deployment costs: While counterintuitive, the more control customers want over the environment, the higher the cost to the SaaS provider. Depending on the complexity of the solution, a private SaaS solution may cost 10-20X more per year than using a multi-tenant one.

■ Vendor guarantees their security standards: SaaS providers practice comprehensive security and privacy measures and conduct thousands of security updates daily. Managing such a level of security within a private SaaS will require a dedicated security team. Moreover, vendors are required to undergo testing by third parties to obtain major security compliance certifications (e.g. ISO 27001, SOC 2 Type 2, etc.). The majority of industrial companies have not received or maintained these certifications.

■ Inaccessible from the public internet: This can be a requirement for some critical industries or sensitive data. Both virtual private and private SaaS make data inaccessible from the public internet. Virtual private SaaS uses a virtual private network (VPN), granting it a local IP address, and the SaaS solution runs in the same data center as the customer's tenant, making it indistinguishable from their own network.

■ Software updates maintained by service providers: All software maintenance (updates, upgrades, bug fixes, etc.) is handled by SaaS providers. Yet, if an organization chooses a private SaaS deployment, it either needs to manage isolated software installations, upgrades, and necessary infrastructure expansions on its own, or rely on the SaaS provider or a partner, likely at an additional cost.

■ Vendor guarantees SLA and user experience: For providers to guarantee a service level agreement (SLA), they must have access to control the underlying services and infrastructure. Private SaaS deployments can be a blocker for providers to ensure their SLA is met. Additionally, because providers cannot control the resource allocation, they cannot guarantee an interactive user experience, as other workloads running in the private cloud may impact performance.

■ Customer retains data ownership: This must be true regardless of the chosen SaaS environment. An organization's data is always their own to share, copy, or delete as they see fit.

■ Data is never stored outside of the customer tenant: Private SaaS is the only environment where this is fully guaranteed. Achieving this has significant tradeoffs regarding higher costs and the effort to maintain security, the SLA, and a delightful user experience.


At SaaS companies, such as Cognite, there are hundreds of specialized services and people that ensure the environment is stable and highly performant for customers through a SaaS experience. Thus, for a typical industrial company to successfully manage a complex SaaS solution in their private tenant, consistent with a SaaS user experience, it would require a dedicated team solely focused on maintaining the software.

To provide the highest level of availability, security, and performance, look for a SaaS environment that meets the following measures:

■ Maintains staging environments that are 100% in sync with the production environment for proper testing and validation of updates. Environments are updated thousands of times per week to provide necessary improvements and optimization.

■ Manages a single version of generally available software. All customers benefit from new releases, without having to migrate versions. Managing a single version reduces risk and increases the rate of innovation and new features, such as generative AI-powered search.

■ Tenants are clean from other processes, data, and jobs, with enforced governance to ensure that at any time users access information through Cognite Data Fusion®, they use the proper API channels, enforcing user access rights, security, scalability, etc.

■ 24/7 monitoring and support using specialized deployment technology and operational management tools.

■ Security tested and audited by third parties, with compliance to ISO 27001, ISO 9001, SOC 2 Type 2, CCC+, and the world's first DNV-certified digital twin. These certifications are guaranteed when operating in a Cognite multi-tenant, single-tenant, or virtual private tenant. Cognite cannot guarantee security standards if a deployment were to run in a customer's private tenant/cloud.

■ Support for specific regulatory and industry requirements, including: NIST CSF, IEC 62443-2-4, IEC 62443-3-2, IEC 62443-3-3, IEC 62443-4-1, CMMC, FIPS, NERC CIP v.5, GxP.

Additionally, you want a SaaS solution that can be offered as a multi-tenant, single-tenant, or virtual private tenant, all depending on the needs and requirements of the end users. For the most critical industries and most sensitive data, for example, our customers use Cognite Data Fusion® via a virtual private tenant. Cognite's virtual private SaaS environment combines the security and privacy of a private cloud with the scalability and flexibility of single- and multi-tenant environments.

In this environment, Cognite Data Fusion® runs in the same data center as a customer's private tenant, eliminating any regional data residency concerns. Configured within the virtual private local network, it is granted a local IP address, exclusive to your company and unreachable from the public internet. The customer always retains ownership of the data, which is accessible only through documented APIs and connectors.

Our customers trust us not only because we are compliant with the multi-tenant architecture and follow best practices from hyperscalers like Microsoft, Salesforce, Google, Adobe, and others, but also because Cognite has domain expertise in energy, manufacturing, and power and renewables. We understand the importance of providing a secure and reliable SaaS environment that is flexible, scalable, and quickly adapts to growing business needs.


Chapter 13

Tools for the Digital Maverick

Nobody wants to be in the situation where, after months of vendor meetings, internal alignment, tech reviews, and security checks for a new software purchase, your executive sponsor says "I'm just not buying it…" But that's the reality for many complex enterprise purchases today, even with board-approved budgets opening up for AI software and an accelerating pace of procurement, due to the complexity of the decision-making involved.

Today's digital transformation and operations executives in chemical production, refining, and energy are under increasing pressure to deliver more productivity, reliability, and safety with fewer human resources—but more software. According to recent Gartner Maverick Research, "by 2030, 75% of operational decisions will be made within an AI-enabled application or process," demanding strategic—and rapid—realignment around technology.

But embarking on mid-game data and technology investments can be quite complex and sensitive, especially given existing entrenched software (SAP, MAXIMO, AVEVA, etc.), siloed data management practices, and general resistance to change. Is the overarching strategy clear and communicated? Who needs to be part of the buying center? Who's personally invested in competing strategies? What impact will these decisions have on future teams and workflows?

In this chapter, we provide you with the tools you need to scope and plan your digital journey, as well as define and measure success in terms the enterprise will understand. With the following tools, you have what you need to plant your flag as a digital maverick and design and implement an innovative, future-proofed digital solution that will enable your organization to transform the way you use data. A key area that can kill or delay a digital transformation is when key strategic questions are not acknowledged or addressed by the right stakeholders in the buying center. Here are a few common ones we see often:

Have Use Cases and Values Been Defined and Mapped?

Business value must lead the way when it comes to complex digitization projects. Rather than 'eating the elephant' all at once, projects must be broken down into manageable use cases and prioritized based on strategic near- or long-term value. Doing this with enough detail and a product-roadmap-like mindset ensures that AI-enabling software purchases are viewed by your buying center as an investment, not a cost, making it more straightforward to ensure buy-in and justify funding.

What's the Honest State of Your Data and Digital Maturity for AI?

We get it—your board is pushing hard to invest in generative AI projects. But getting buy-in, approvals, and operational value/success becomes easier when you incorporate your larger data and analytics journey. Coupled with the previous questions around use cases, by honestly evaluating your current technical strengths and weaknesses around data, you can plot a more accurate roadmap with the right technical components to either quickly build or finesse the maturity needed to deploy AI at scale.

Is Everyone Aligned to the Same Software Strategy?

Often, there are competing viewpoints on DIY vs SaaS vs a hybrid approach; IT wants to build, while the business wants to buy. Compounding this misalignment is the fact that sometimes there is no clear decision-maker. This is where your digital maverick must lead and evaluate what's in the best interest of the business. Is it more important to move quickly and innovate faster than your peers? Or is it more important to build from scratch and sacrifice speed for the sake of tech stack completeness?

To help you drive alignment across your organization, within this chapter you will find:

■ The Digital Maturity Diagnostic: To assess the current state or track the progress of your digital maturity.

■ The Digital Roadmap Template: To align priorities and get buy-in for a broader digitalization initiative.

■ The SaaS ROI Calculator: To quantify the value of SaaS vs DIY for your organization.

■ The Value Framework: To increase your chances of obtaining and maintaining internal buy-in, secure a budget, and successfully articulate ROI across the business.

■ The RFP Guide: To ensure you account for all critical capabilities/functionality required for success.


Assessing if You Are Ready for Generative AI: The Digital Maturity Diagnostic

Building organizations in support of digital maturity can look different at a company level or within an individual line of business. However, high-maturity organizations share some common attributes, segmented by internal versus external factors (see: Factors Influencing Digital Maturity).

Generated after evaluating a number of organizations in asset-heavy industries and combining the findings with third-party research, these five key factors of digital maturity form a sophisticated framework for discussion and evaluation. Unlike other frameworks, this one includes information architecture and technical data layers, both critical—and often undervalued—indicators of digital maturity.

We can further segment these key factors into additional parameters that can be measured and weighed together to transform the concept of digital maturity into a diagnostic framework. Use this tool to assess the current state or track the progress of your digital maturity.

Digital maturity diagnostic framework

Status of data
■ Enterprise architecture: How integrated, flexible, and open is the enterprise OT, IT, and engineering data?
■ Discovery: What is the process for finding data relevant to a problem set?
■ Operational technology & data: How evolved, structured, and managed is the process for maintaining OT source system data?
■ Standardization: How well are data standards applied across sites and BUs?

Generative AI
■ AI relevance & use case execution: Have you identified use cases where generative AI could add value to your business and teams?
■ Data leakage & trust: What is your plan to manage data leakage/trust and access control when implementing generative AI on your private data?
■ Industrial data context: How will you minimize hallucinations? Have data quality issues been identified and addressed?
■ AI integration: How easy would generative AI technology/models be to integrate with the existing systems?

People and mindset
■ Capabilities: How advanced/capable is your digital organization (data scientists, developers, etc.)?
■ Adoption: How will your subject matter experts (SMEs) in the field be empowered to use new data and generative AI tools?
■ Implementation: How do you plan to manage infrastructure implementation to integrate emerging AI technology into your operations?
■ Impact and value: What is your plan for managing the impact of generative AI on workflows and employees?

By answering each question in the diagnostic on a scale of one to four (with one being "we have no plan" or "we are not confident" and four being "we have a detailed plan" or "we are very confident"), you can begin to establish where you are on the digital maturity spectrum and zero in on where to focus your efforts for maximum return.

To achieve the best results, and to truly have a living, breathing measure of this digital maturity journey, we recommend reviewing the questions with stakeholders of various related disciplines at least twice a year. This way, initiative and progress can be tracked over time across the maturity spectrum.


A very helpful reference point is Gartner's Maturity Model for Manufacturing Excellence. The five-stage maturity model is designed for manufacturing strategy leaders to assess their organization's current capabilities, articulate a plan for change, and support the development of a future-state vision for manufacturing operations. Cognite has taken this model and applied use case examples to each stage that the digital maverick can focus on to show how value can first be delivered at the site level and then scaled out to the enterprise level for full value realization.

These tools are designed to be used together, first by taking the Digital Maturity Diagnostic and then by mapping your results to the corresponding stage within the Gartner Maturity Model. For example, if you answered mostly ones in the Digital Maturity Diagnostic, focus first on building a strong data foundation to address use cases within the reacting stage of Gartner's maturity model.

Digital Maturity Diagnostic Result → Gartner Maturity Model Stage

1 → Industrial DataOps / Reacting

2 → Reacting / Anticipating

3 → Integrating / Collaborating

4 → Collaborating / Orchestrating
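For teams that track their answers in a spreadsheet or script, the scoring-and-mapping step can be sketched in a few lines of code. This is an illustrative sketch only: the simple-average rule and the example answers are our assumptions, not part of the diagnostic or the Gartner model.

```python
# Sketch of scoring the Digital Maturity Diagnostic (answers on a 1-4
# scale) and mapping the rounded average to the Gartner stages listed
# above. The averaging rule and example answers are assumptions.

GARTNER_STAGE = {
    1: "Industrial DataOps / Reacting",
    2: "Reacting / Anticipating",
    3: "Integrating / Collaborating",
    4: "Collaborating / Orchestrating",
}

def maturity_stage(answers):
    """Average the 1-4 answers and map the rounded result to a stage."""
    if not answers or not all(1 <= v <= 4 for v in answers.values()):
        raise ValueError("each diagnostic answer must be scored 1-4")
    avg = sum(answers.values()) / len(answers)
    return avg, GARTNER_STAGE[round(avg)]

# Example: a team early in its data and AI journey
avg, stage = maturity_stage({
    "Enterprise architecture": 2,
    "Discovery": 1,
    "Data leakage & trust": 1,
    "Adoption": 2,
})
print(f"{avg:.2f} -> {stage}")  # 1.50 -> Reacting / Anticipating
```

A result that rounds to one suggests starting with the data foundation and reacting-stage use cases, mirroring the guidance above.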


Example 1

Prioritizing Industrial DataOps Use Cases: The Digital Roadmap Template
Historically, individual departments spearheaded digital initiatives, and each use case was considered a separate project. For example, at an oil and gas company, one team was looking into autonomous operations. At the same time, another team was focusing on real-time simulation modeling, and a third was trying to optimize alert management. These are actually connected use cases, and they can build on one another to achieve more efficient success. At the same time, sustainability initiatives may be going on that would benefit from the data and dashboards the other teams are developing.

In fact, all digital initiatives are connected and benefit when built on top of the same accessible, trustworthy, and contextualized data foundation. Digital mavericks know this, but their challenge is connecting these disparate use cases into an actionable digital strategy.

This is where our Digital Roadmap Template comes in. Use this template to visualize your business strategy and make communicating with stakeholders easier. This tool is critical for aligning priorities and getting buy-in for a broader digitalization initiative, not just a one-off use case.
Tips for Using the Digital Roadmap Template:

■ The gray box on the left applies to all industries and represents the required data foundation for any successful digitalization initiative. This foundation includes the pillars of Industrial DataOps that will accelerate digital maturity and enable your organization to deliver better digital products and realize more operational value at scale.

■ The orange bar at the top applies to all industries and represents the digital maturity required to implement a specific use case successfully. Baseline use cases make you think, "why haven't we done this yet?" and can be accomplished early in a digital maturity journey. Baseline use cases also act as building blocks to achieving lighthouse initiatives, such as fully autonomous operations.

■ The blue categories represent some of the most common focus areas across industry, but these should be tailored to your business. Examples of focus areas per industry are provided.

■ For each focus area, you will have several use cases mapped from baseline to lighthouse, visualizing the need for a solid and unified data foundation first, and second, the order in which use cases must be prioritized to reach specific long-term objectives. Examples of completed roadmaps per industry are provided for reference.
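As a thought experiment, the baseline-to-lighthouse ordering that the template visualizes can also be expressed as data. Everything below (the use case names, the focus area, and the 1-to-4 maturity scale) is a hypothetical illustration we chose for the sketch, not content taken from the template itself.

```python
# Hypothetical sketch: representing one focus area's use cases and
# ordering them from baseline to lighthouse by required digital maturity.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    focus_area: str
    required_maturity: int  # 1 = baseline ... 4 = lighthouse

use_cases = [
    UseCase("Fully autonomous operations", "Production optimization", 4),
    UseCase("Unified operational dashboard", "Production optimization", 1),
    UseCase("Real-time simulation modeling", "Production optimization", 3),
    UseCase("Smart alert management", "Production optimization", 2),
]

# Baseline use cases come first; lighthouse initiatives build on them.
roadmap = sorted(use_cases, key=lambda uc: uc.required_maturity)
for uc in roadmap:
    print(f"{uc.required_maturity}: {uc.name}")
```

Sequencing the list this way makes the dependency explicit: the dashboard and alert-management work that looks mundane is what makes the lighthouse initiative reachable.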


Example 2

To help you determine which focus areas and use cases may be most impactful to your organization, consider the following:

■ What is your operational view today? How easily can your team find everything they need to make the best decisions?

■ How much of your operation is manual vs. automated? How well is that working for you?

■ How does your team excel in making decisions? Where could they use more support?

■ How predictable is your production and production quality? What are the most common causes of disruption to production?

■ How does your team go from an alert to a resolution? How long does that take?

■ What types of tools would your team build for themselves to help them make better day-to-day decisions?

■ What processes would you improve if you could access high-quality data to automate them?


Quantifying the Value of SaaS vs DIY: The Value Calculator

Now that you have defined a long list of potential use cases and prioritized those use cases with a digital roadmap, you can build your case for a DIY vs. SaaS vs. a hybrid software strategy.

While pure-DIY approaches can be justified—especially in the early phases of emerging technology markets, where SaaS vendors may be too small and lacking maturity—embarking on a DIY journey is suitable only for organizations with well-resourced IT teams and a high tolerance for risk. For most organizations, the benefits of deploying a DataOps foundation via Cognite Data Fusion®—faster time-to-value, greater scalability, and lower maintenance costs—make more financial and competitive sense.

Across Cognite's customers, we have seen that the Net Present Value (NPV) over a five-year period for Cognite Data Fusion® can be almost four times higher compared to DIY, underscoring the significant financial benefits of adopting a SaaS approach. A full-scale, enterprise-wide Industrial DataOps foundation has the potential to generate hundreds of millions of dollars in NPV, with Cognite customers achieving up to $300-500m in value.

The below value graph will help you understand the total value generated based on the number of use cases deployed and give an indication of the value you can achieve, helping you drive internal excitement and garner support for a future value assessment based on your unique use cases and situation.

Defining the Impact of Industrial DataOps: The Value Framework

It is essential to have a clear framework for measuring returns on digital investments. Value frameworks are a necessary tool for executives to assess their business. While seemingly high-level, if executives cannot map their digital investments directly to one of the below value streams, they will ask themselves whether their investment is actually generating real value.

Value stream frameworks are a representation of how a company generates shareholder value and which levers it can use to do so. These are specific to industries and businesses and should be tailored for maximum impact:

■ Level 1 describes the top-level actions/metrics which directly affect a company's shareholder value: Support Revenue Growth, Reduce Operating Expenditures, Increase Asset Efficiency, and ESG.

■ Level 2 is the high-level business challenges that are tangible and measurable but are typically indirectly improved by organizational actions.

■ Level 3 is the value levers that are tangible, measurable, and actionable. Projects and the implementation of solutions directly impact these metrics and drive improvement in the higher-level categories.
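The NPV comparison above can be made concrete with a small calculation. All figures below are hypothetical cash flows invented for illustration (they are not Cognite's actual numbers, and the $300-500m customer values cited in the text are not used here); the 10% discount rate and the ramp-up profiles are assumptions.

```python
# Illustrative NPV comparison of a SaaS vs. DIY data foundation.
# NPV = sum of cf_t / (1 + r)^t over years t = 0..n.

def npv(rate, cashflows):
    """Net Present Value of yearly cash flows, with year 0 first."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

discount_rate = 0.10  # assumed 10% discount rate

# Year 0 is the upfront cost (negative); years 1-5 are net benefits, in $M.
# Hypothetical: SaaS has a lower upfront cost and a faster value ramp;
# DIY carries a larger build cost and a slower ramp.
saas_cashflows = [-5, 20, 30, 40, 50, 60]
diy_cashflows = [-25, 5, 15, 25, 30, 35]

saas_npv = npv(discount_rate, saas_cashflows)
diy_npv = npv(discount_rate, diy_cashflows)

# These toy inputs give a SaaS/DIY ratio of roughly 2-3x; the actual
# ratio depends entirely on your own cost and benefit estimates.
print(f"SaaS NPV: ${saas_npv:.1f}M, DIY NPV: ${diy_npv:.1f}M, "
      f"ratio: {saas_npv / diy_npv:.1f}x")
```

Swapping in your own use-case estimates turns this into the "future value assessment" the text recommends.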


This framework increases your chances of obtaining and maintaining internal buy-in, securing a budget, and successfully articulating ROI across the business. To align your digital initiatives within this framework, you'll want to consider how your use case(s) impact current challenges and help avoid adverse financial consequences. It is also essential to consider your ideal future state, the positive economic and operational impact, and how you will measure success.

The below questions are designed to help you evaluate your use case through a value-framework lens. While not comprehensive, these questions will guide you to a more value-centric mindset and enable you to lead productive discussions about digital strategy across the business:

■ What would it mean to your bottom line if you could increase the speed of making a decision?

■ What set of information would you include in your optimal KPI dashboard?

■ How could better access to high-quality data improve production quantity and quality?

■ How much of your budget covers scrap and waste across your sites/facilities? Which initiatives would you reallocate those funds towards if you reduced waste?

■ How will better and faster visibility into scope one, two, and three emissions impact your business?

■ What would it look like for your team if there were no roadblocks to the information you need to make decisions? What would it mean for the business?

■ In your desired future state, what positive outcomes do you see from taking advantage of de-siloed and contextualized data in your use case(s)?

■ Who on your team and/or across the business would benefit most if cross-functional information was more accessible?

■ Where would you focus the freed-up talent if it took less human power to deliver on your use case(s)?


Finding Your Industrial DataOps Solution: The Request for Proposal Guide

A key challenge in implementing your own Industrial DataOps initiative is defining what capabilities your solution needs to support your business. The section below provides a guideline to build out your RFP and ensure you account for all critical capabilities/functionality required for success. This guideline will address key areas to consider and should be used as a starting point to tailor to the needs of your organization.

Functionality

Properly assessing Industrial DataOps software requires an understanding of two components: the foundation and connectivity. Assessing the foundation is critical to ensure that the proposed solution will support your industrial data use cases and provide the tools needed to minimize time to value, scalability, and repeatability. Connectivity has two components—data extraction and the application layer. Data extraction will ensure that you will be able to connect to both existing and future data sources, while the application layer focuses on how the solution provider will support applications on top of the foundation to deliver use cases.

Industrial DataOps RFP Guideline

Use Cases and Past Successes

This section is first because Industrial DataOps must be able to deliver long-term value to your organization, and you need to ensure alignment between your organizational goals and the potential solution provider. Knowing that your solution provider has competency within your domain will de-risk the probability of under-delivering on your expected ROI.

■ Can you provide a brief description of your company, industrial business areas, main products/services, relevant expertise, and business strategy?

■ Are your products/services general or specific to the client's industry? Can you describe your domain expertise?

■ How would you describe your key product differentiation?

■ What is your experience with helping clients build business cases and developing a target ROI? Can you provide examples of successful business cases delivered?

Expert tip: Successful Industrial DataOps solutions should start with one to two use cases defined before any work begins and have a backlog of two to five use cases once success is achieved with the initial use cases.

■ How many existing customers do you have? Are there past successes you can share related to the client's industry?

■ Does the proposed solution enable more effective asset management? Can you provide examples?

■ What use cases have you delivered regarding unstructured data (video, 3D, etc.)?

■ What are the most common types of use cases you have delivered?

■ Do you have reference customers we can talk with?

■ Can you provide a product demo?

Industrial DataOps Foundation

■ How does the proposed solution perform data contextualization (data mapping)? Is it automatic or semi-automatic? Does the solution suggest relationships to make identification and construction easy?

Expert tip: The ideal solution should automate this process as much as possible; otherwise, expanding the system to include new data sources will be extremely time-consuming and hard to manage.

■ How is the contextualization (data mapping) process managed? Is it easily accessible? How do users make edits?

■ How is the data model created in the proposed solution? How are relationships between data sources managed?

■ What types of data formats are supported in the proposed solution?

■ How does the proposed solution support data visualization?

■ How does the proposed solution manage data quality? Are rules pre-built? Can rules be modified? Are rules applied universally or per use case?

Expert tip: Data models are designed to be reused. Data quality should have the flexibility to be applied per use case. For example, different use cases may require the same data, but using this data for remote monitoring of an asset will not require the same update rate as using this data to run an analytics model measuring performance.

■ Does the proposed solution support templatization? How can applied work be reused?

Expert tip: Templatization is a key component to scale solutions and ensures your organization will avoid getting trapped in Proof-of-Concept purgatory.

■ How are notifications/messages supported in the proposed solution with regard to users associated with data, administrators, etc.?

■ How would you describe the proposed solution's performance in regard to scalability?
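One way to picture the "automatic or semi-automatic" contextualization question is a matcher that suggests links between time series tags and asset tags for a human to confirm. The tag formats, normalization rule, and threshold below are invented for illustration; production contextualization uses far richer rules and ML models.

```python
# Toy contextualization sketch: suggest which asset each time series belongs to
# by fuzzy-matching normalized tag names. Requires Python 3.9+ (removesuffix).
from difflib import SequenceMatcher

assets = ["23-PT-92531", "23-TT-92532", "45-FT-10023"]          # hypothetical asset tags
timeseries = ["IA_23PT92531.PV", "IA_23TT92532.PV", "IA_45FT10023.PV"]  # hypothetical sensor tags

def normalize(tag):
    """Uppercase, drop a trailing '.PV' suffix, keep only alphanumerics."""
    tag = tag.upper().removesuffix(".PV")
    return "".join(ch for ch in tag if ch.isalnum())

def suggest_matches(series, assets, threshold=0.8):
    """For each time series, suggest the best-scoring asset above threshold."""
    suggestions = {}
    for ts in series:
        scored = [(SequenceMatcher(None, normalize(ts), normalize(a)).ratio(), a)
                  for a in assets]
        score, best = max(scored)
        if score >= threshold:
            suggestions[ts] = (best, round(score, 2))
    return suggestions

# Each suggestion carries a score, so a user interface can show it for review
# (the "semi-automatic" mode) instead of silently committing the mapping.
print(suggest_matches(timeseries, assets))
```

The score attached to each suggestion is what makes the workflow semi-automatic: high-confidence matches can be auto-accepted, borderline ones routed to a reviewer.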


Expert tip: As you expand beyond initial use cases, you will want a solution that is able to scale. Industrial DataOps should be able to address scale at both the site and enterprise level.

■ How does the proposed solution process large data sets? How do you ensure the proposed solution can handle peak processing?

■ How does the proposed solution support trending analysis of the data? How are trends visualized and reported?

■ Can the proposed solution analyze trends in data quality and predict when metrics will exceed predefined thresholds?

■ How does the proposed solution document completeness (integrity) of the ingested data and ensure data is not lost in transit?

■ How do you work with third-party vendors? Which have you worked with in the past?

Expert tip: While many solutions talk about openness, seeing examples of proven solutions with third-party vendors will provide confidence that you will be able to connect your disparate data sources.

■ Is the front-end framework of the proposed solution built on open standards? How do you support open front-end frameworks?

■ How does the proposed solution ensure that data is processed quickly and that time series data is made readily available?

■ Is the proposed solution able to ingest both tabular and graph-structured data without loss of information?

■ When receiving asynchronous time series data, how does the proposed solution handle timestamping?

■ Is the proposed solution able to handle data inserts, updates, and deletes by itself?

■ Does the proposed solution support multiple modes of operation, such as batch and stream-based ingestion and in-memory versus persistent data storage?

■ Does the proposed solution follow agile development principles, and how do you ensure it is up to date on market trends and technical standards?

■ How does the proposed solution support compression of data and metadata?

■ Does the proposed solution report the source for a data point, event, and time series, and associated metadata for users to assess the data quality?

■ How are the metadata fields of existing data and metadata updated? How are updates executed and managed?

■ How is the connection between data and metadata made? Are they stored or linked? Can metadata be linked to several data entries?

Expert tip: Access to centralized, remote, relevant-time data creates opportunities for many new use cases at both the site and enterprise level.

■ Does the proposed solution require plugins such as Office, Flash, etc.?

Connectivity

■ How does the proposed solution support integration with external systems, and what are the requirements of such integrations?

■ What integrations are pre-built and readily available for data extraction? For the application layer?

Expert tip: Pre-built data extractors should exist for many open protocols, and advanced Industrial DataOps solutions will have existing extractors for individual industrial solution providers such as Siemens, ABB, and Emerson.

■ What are the client's possibilities to develop their own applications on top of the product?

Expert tip: Further assessment is needed when thinking about application development for data engineers and citizen data scientists. Proposed solution providers should have pre-built connections to well-adopted applications, such as PowerBI or Grafana.

■ Does the proposed solution provide an associated SDK? What languages are supported?

■ What types of underlying data sources do you support? What connections are most common?

■ What is the proposed solution's capability to access real-time data? What are the scalability limitations to this capability?

■ Does the proposed solution have connectivity and native access to relational databases?

■ Does the proposed solution have connectivity and native access to non-relational structures?

■ How do you ensure interfaces for data exchange (such as REST APIs) are kept stable and robust to changes?

■ Does the proposed solution support versioning for continuity, so that the newest version and the previous version of data pipelines remain supported? Can versions be rolled back?

■ Does the proposed solution support a layered and scalable REST API?

■ Is the REST API stateless, enabling easy caching and no need for server-side state synchronization logic?

■ Can underlying data be exported from the proposed solution as a CSV or XLSX? How does the proposed solution export data and metadata in standardized formats?

■ Are there any limitations in the proposed solution's ability to extract historical data?
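The questions above about asynchronous timestamping and handling inserts, updates, and deletes can be pictured with a minimal sketch: an ingestion structure that keys each point by its timestamp, so late-arriving and corrected points merge deterministically. This is a toy illustration only; a real time series store also handles partitioning, retention, deletes, and concurrent writers.

```python
# Minimal sketch of timestamp-keyed ingestion for asynchronous time series.
import bisect

class TimeSeries:
    def __init__(self):
        self._timestamps = []  # kept sorted at all times
        self._values = {}      # timestamp (ms) -> value

    def upsert(self, timestamp_ms, value):
        """Insert a point; a point with an existing timestamp is overwritten."""
        if timestamp_ms not in self._values:
            bisect.insort(self._timestamps, timestamp_ms)
        self._values[timestamp_ms] = value

    def range(self, start_ms, end_ms):
        """Return (timestamp, value) pairs in [start_ms, end_ms), in order."""
        lo = bisect.bisect_left(self._timestamps, start_ms)
        hi = bisect.bisect_left(self._timestamps, end_ms)
        return [(t, self._values[t]) for t in self._timestamps[lo:hi]]

ts = TimeSeries()
ts.upsert(1000, 20.1)
ts.upsert(3000, 20.5)
ts.upsert(2000, 20.3)   # late arrival is slotted into chronological order
ts.upsert(1000, 20.2)   # a correction overwrites the earlier value
print(ts.range(0, 4000))  # [(1000, 20.2), (2000, 20.3), (3000, 20.5)]
```

Because identity is the timestamp rather than arrival order, replaying or re-ingesting the same data is idempotent, which is one practical answer to the "inserts, updates, and deletes" question.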


Generative AI Solution

■ How have you applied machine learning (ML) solutions to solve client use cases? Are there any use cases that utilize hybrid AI (a combination of physics and ML capabilities) solutions that you are able to share?

■ How is generative AI incorporated into the proposed solution? Are AI capabilities built into the backend of the product? Is there a natural language copilot as part of the user interface?

■ Can you provide details about the training data used to develop the AI model? What is your process for providing industrial context to generative AI solutions?

■ How do you mitigate hallucinations within your generative AI solutions?

■ How do you manage data leakage within your generative AI solutions?

■ How do you manage trust and access control within your generative AI solutions?

■ Is your generative AI solution compliant with relevant data protection and privacy regulations (e.g., GDPR, CCPA)?

■ How frequently is the AI model updated and retrained to maintain its security and reliability?

■ How does your solution handle intellectual property rights and content ownership?

■ How have you applied generative AI solutions to solve client use cases? Are there any use cases that you are able to share?

■ Are there any limitations or potential risks associated with using the generative AI solution?

■ Can you share your generative AI roadmap, and what are the most important near-term deliverables on this roadmap?

Architecture

Every organization will have unique architecture requirements that should be addressed from the beginning. The key here is to ensure that the proposed solution provider is designed to meet the requirements of your existing environment.

Expert tip: Any architectural requirements can be included here. Many organizations have already made investments to integrate OT/IT data silos into data lake or data warehouse solutions. Your Industrial DataOps solution should leverage the investment in the existing infrastructure.

■ Is your software cloud native? Which vendors (AWS, Azure, GCP) do you support?

■ Do you support hosted/private cloud or on-premise deployment?

■ What is the proposed solution's ability to support real-time deployment?

■ How does the proposed solution support horizontal and vertical scaling?

■ How does the proposed solution offer high availability, and how are failover procedures handled?

■ How does the proposed solution support backup and recovery procedures?

■ Can you describe the key components of the proposed solution and how they operate/interconnect?

■ How does the proposed solution handle archiving?

■ How do you support edge capabilities? Do you offer on-premises deployments?

■ Is the proposed solution validated against W3C and HTML5 standards to enable browser independence?

■ Does your proposed solution track the lineage of all data objects and code, showing upstream sources and downstream consumption?

■ How does development occur in the proposed solution, introducing changes to its core components, adding extensions, etc.?

■ How is it possible to test reconfigurations, upgrades, and extensions to the proposed solution before it is put into production?

■ What are the software and hardware prerequisites?
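A common pattern behind the hallucination-mitigation and "industrial context" questions above is retrieval grounding: restrict the model to retrieved, citable context and instruct it to refuse when the answer is absent. The sketch below is a toy illustration of the prompt-construction side only; `call_llm`, the document IDs, and the keyword-overlap retriever are all invented stand-ins, not any vendor's actual API.

```python
# Sketch of retrieval-grounded prompting to reduce hallucinations.
# Hypothetical document store; a real system would use vector search
# over contextualized industrial data, with access control applied.
DOCUMENTS = {
    "maintenance-manual-p12": "Pump 23-PT-92531 requires seal inspection every 6 months.",
    "hse-procedure-4":        "Hot work permits must be renewed every shift.",
}

def retrieve(question, k=1):
    """Rank documents by word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(DOCUMENTS.items(),
                    key=lambda kv: len(q_words & set(kv[1].lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(question):
    """Build a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question))
    return (
        "Answer using ONLY the context below. Cite the [source id]. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = grounded_prompt("How often does pump 23-PT-92531 need seal inspection?")
# answer = call_llm(prompt)  # placeholder; substitute your provider's API call
print(prompt)
```

Cited source IDs in the answer also give reviewers a way to audit the model's claims, which connects this pattern to the trust and access-control questions above.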


Project Execution, Services, and Support

Understanding how potential solution providers implement projects will allow you to assess time to value and a high-level roadmap for implementation. The potential solution provider should provide the resources to ensure continued success. Successful implementations require both the right technology and the right support. This section is designed to provide insights into the expected support for your team and organization when adopting an Industrial DataOps solution.

■ Can you describe the 'Go Live' period between proposed solution validation/operational deployment and final acceptance/the beginning of any maintenance and support agreements?

■ What resources and support are provided during this period?

■ What maintenance and support do you offer during and after implementation?

Expert tip: The proposed solution provider should have a designated customer support representative to ensure project success.

■ What does a typical project implementation process look like? What support is available?

■ What level of services do you typically provide?

■ What standard support do you provide in problem resolution? Do you offer varied support levels?

■ Please describe how your skilled experts will interact with clients' in-house experts to maximize the benefit from collaboration.

■ How do you enable/support search in the proposed solution? Can you provide documentation?

Expert tip: This functionality will be very valuable to save time for data engineers and make data discoverable for citizen data scientists.

■ How does the proposed solution support documentation, and how is it made accessible?

■ What training programs are included and offered? What is typical?

■ How do you ensure that competence is built within your client's organization?

Expert tip: Building competence within your organization is an important part of digitally maturing. Your solution provider should be enabling these competencies. Otherwise, your organization runs the risk of being in a service-based relationship with the solution provider.


Security

With the importance of security always increasing, the potential solution provider must be ready to meet the needs of your organization. This is not intended to be a comprehensive security list, as IT departments often have their own security requirements for new software products, which are necessary to consider when integrating IT and OT data.

■ What is your company's strategy for penetration testing and third-party assessments?

■ How does the proposed solution maintain an audit trail of all data manipulation?

■ How does the proposed solution offer monitoring and statistics of backbone components?

■ How do you ensure that the client has access to its own data in the proposed solution?

■ How is high availability maintained for security, access, and governance of the proposed solution?

■ How do you support revocation of access at both the user and group level?

■ When and how is data encrypted in the proposed solution?

■ What is the proposed solution's capability with regard to access control? What is the granularity?

■ Does the proposed solution support groups for access control?

■ Can authentication requirements be customized in the proposed solution?

■ How does a user report suspicious activity related to data points?

■ Can users be assigned special roles to fix or disapprove reported suspicious data points?

■ Does the proposed solution support ISO, SOC 2 Type 2, NIST CSF, IEC, NERC CIP, or other relevant industry standards?

■ How does the proposed solution track the chain of custody?

Usability

For products to be adopted by your organization, solutions must be easy for users to adopt. Poor usability is a leading cause of poor product adoption. The potential solution provider also needs to support both data scientists and citizen data scientists to truly make data usable. To make data discoverable and usable for both of these users, the proposed solution must deliver simple access to data and provide intuitive, well-designed user interfaces that do not require strong coding backgrounds to leverage.

■ Are users able to navigate through different parts of the proposed solution without help?

Expert tip: Asking for a product demo is helpful when trying to assess this topic.

■ Are users able to create and edit their own dashboards to solve specific business needs, and are they able to share these dashboards to collaborate with their team or across teams?

■ Do users see and feel the proposed solution responding in real time?

■ How many concurrent users does the proposed solution support? And is the environment collaborative?

Expert tip: As your Industrial DataOps solution gains adoption, your organization should be striving to increase user adoption to enable use case development throughout many departments.

■ How does the proposed solution handle error messaging? Are errors easily interpreted by users?

■ How does the proposed solution allow users to refine search results?

■ Can users create data pipelines without IT assistance and without deep training in data engineering, SQL, or production processes? Do you provide a graphical user interface for pipeline creation?

■ Can users execute other tasks during the execution of jobs? Are users alerted when jobs are complete?

■ How do you ensure fast search results are returned to users?

■ How do users report errors, bugs, lack of service, and requests for new services or extensions to existing services?


Software Maintenance

Once the solution has been implemented, it will require ongoing upkeep; this section is designed to give you an understanding of what that involves. Reliability is another important factor in product adoption. Improvements/enhancements to the proposed solution should not result in unexpected downtime, and the solution should not require a high level of manual support to ensure proper operation.

■ How often do you release improvements to your products? Do you have major and minor release cycles?

Expert tip: As your organization requires, be sure to understand the different management requirements between on-premise, private cloud, and public cloud offerings.

■ Are clients entitled to all product upgrades with the base software? When are upgrades required?

■ How are clients notified about both scheduled and unscheduled maintenance/downtime?

■ How are new versions/updates managed?

■ Do you guarantee availability and uptime of the proposed solution to be 99.5%? How do you track system uptime?

Future Development

Ensure that the potential solution provider's roadmap is aligned with your organization's goals. Seeing the top priorities of technology development will provide clarity on the product direction and how your organization will be able to grow with the Industrial DataOps software.

■ Can you provide a short-term (six to 12 months) and long-term (two to five years) product roadmap?

■ What is your approach to developing new products, and what are the possibilities for developing customizations/extensions?

Sustainability

■ Can you provide an overview of your company's sustainability mission and strategy?

■ How does sustainability align with your company's core values and business objectives?

■ What specific sustainability goals has your company set, and how do you measure progress towards these goals?

■ In what areas does your product have a major environmental impact?

Pricing Model

To date, pricing has not seen convergence across the industrial software industry. Asking high-level questions to understand the initial price (including services) required to get started will be valuable when assessing potential solution providers. In addition, Industrial DataOps solutions are designed to scale, so it's also important to understand the levers of pricing when data sources, users, and use cases start to increase.

■ How do you price the product? How does your pricing model support increasing use case and product adoption?

■ What factors do you predict will be the main cost drivers for your product and services?
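The 99.5% availability question above translates directly into a downtime budget, which is worth computing before you negotiate an SLA. The arithmetic is simply hours per year times the permitted unavailability fraction.

```python
# Converting an availability guarantee into an annual downtime budget.
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year

def downtime_hours_per_year(availability_pct):
    """Allowed downtime per year for a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.5, 99.9, 99.99):
    print(f"{pct}% uptime -> {downtime_hours_per_year(pct):.1f} h/year allowed downtime")
# 99.5% -> 43.8 h/year; 99.9% -> 8.8 h/year; 99.99% -> 0.9 h/year
```

Seen this way, a 99.5% guarantee still permits nearly two full days of downtime per year, so ask whether scheduled maintenance windows count against the budget.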

©Copyright, Cognite, 2023 – www.cognite.ai
Learn more at cognite.ai
