
390 BROUGHT TO YOU IN PARTNERSHIP WITH

Getting Started With Real-Time Analytics

CONTENTS
•  What is Real-Time Analytics?
•  About Real-Time Analytics
•  Common Challenges of Real-Time Analytics
•  Getting Started With Real-Time Analytics
•  Conclusion

SIDA SHEN
PRODUCT MARKETING MANAGER, CELERDATA

Real-time analytics is necessary for any business that needs to make decisions in hours, minutes, or seconds. Implementing real-time analytics requires processing high volumes of input data and matching it with existing data in minutes, seconds, or even less time. This Refcard aims to acquaint you with real-time analytics: where it is used, how it works, and the challenges involved.

WHAT IS REAL-TIME ANALYTICS?
Real-time analytics involves processing data as soon as it comes into a system. Analytics processes, discovers, and communicates meaningful insights from data through math, statistics, and machine learning. Traditionally, analytical systems process a large amount of data, often on the scale of petabytes, and have few users running relatively large queries when compared to transactional or operational systems. This definition has changed in recent years: analytical systems are now often user- and customer-facing. They are also frequently used as part of a larger workflow where the "user" is another system or artificial intelligence solution that makes data-driven decisions.

Real time is a relative term in the sphere of analytics. Generally speaking, data freshness is measured in minutes or seconds instead of hours or days. The exact definition depends on the expectations of the user and the business. However, data that is aggregated into large blocks and then processed can be considered "batch," and data that is processed soon after it is received can be considered "real-time."

IMPORTANCE OF REAL-TIME ANALYTICS IN TODAY'S BUSINESS ENVIRONMENT
Businesses have changed fundamentally over the past few years, and business operations are becoming increasingly fast paced and complex. Real-time analytics provides up-to-the-minute insights into various aspects of the business, enabling quick and informed decision-making, especially in industries where market conditions change rapidly, such as adtech, crypto, or finance. In these industries and many others, competitive advantage is gained by reacting swiftly to market changes.

Meanwhile, fraud and security are a bigger threat than ever before, and real-time analytics is essential to both detecting and preventing fraud and security penetration. Real-time analytics has also become more crucial across industries where supply chains are now constantly changing due to the dynamic geopolitical and economic environment. Often, everything from consumer devices to manufacturing equipment is instrumented with sensors that provide real-time data; analytics must keep up to supply correct control and telemetry to the people and algorithms that control and monitor these devices.

Whether it is due to industry-specific requirements or to the changing nature of customer expectations and competitive requirements, real-time analytics has become an integral tool.
© DZONE | REFCARD | SEPTEMBER 2023 1


REFCARD | GETTING STARTED WITH REAL-TIME ANALYTICS

ABOUT REAL-TIME ANALYTICS
Real-time analytics shares a lot in common with batch analytics, including the need for data transformation. However, it differs in both timeframe and implementation.

HOW REAL-TIME PROCESSING DIFFERS FROM TRADITIONAL ANALYTICS
Batch processing systems operate on data accumulated over a specified time interval. These systems usually consist of regular load processes that extract data, transform it (often into summary tables), and load it into a destination system. Tools like Apache Spark are often used to process the data before loading it into the destination, usually a data warehouse like Teradata or Snowflake.

Real-time analytics processes data as soon as it arrives, almost instantaneously. Real-time systems usually include a message or event queue like Apache Kafka, often paired with a stream processor like Apache Flink. Data is often sent to the operational system and transformed and loaded into the analytics system, either one after the other or simultaneously. The message or event queue and its associated logic make up what are called real-time data pipelines or data transformation pipelines.

Like the rest of the real-time system, these pipelines process data as soon as it arrives. To ensure data can be queried efficiently, they transform it into a more denormalized form, pre-joining and often pre-aggregating it. The data lake or data warehouse ends up containing a summarized form of the operational data that is more efficient to query. The primary difference between batch and real-time analytics, then, is simply whether the data is batched.

REAL-WORLD EXAMPLES OF REAL-TIME ANALYTICS USAGE
There can be no exhaustive list of every real-time analytics use case. First, new innovations are happening every day as the global economy increasingly digitizes. Second, competition is forcing previously slow-moving industries to move faster and provide service on demand. That said, there are some places where real-time analytics is a must or has a clear advantage:

Table 1

USAGE: Security and threat detection
DATA REQUIREMENTS:
•  Requires analyzing massive amounts of data.
•  To be effective, it must happen in real time.
REAL-WORLD EXAMPLES: According to Statista, in 2022, there were more than 1,802 reported data compromises, which affected more than 422 million people. While there are yearly fluctuations, the trend has been upward virtually since the start of modern computing. Detecting a breach is useful even after the fact, but shutting down anomalous activity before there is damage is crucial for sensitive data and systems.

USAGE: Fraud and risk analysis
DATA REQUIREMENTS:
•  Real-time analytics, often combined with machine learning and other algorithmic techniques, can detect when transactions are "abnormal" or unusually risky.
•  Traditionally done largely in batches after the fact.
•  Modern payment systems are real-time, so fraud and risk detection must be as well.
REAL-WORLD EXAMPLES: According to the NICE Actimize 2023 Fraud Insights Report, fraudulent transactions have risen 92% year over year, and the amounts are up 146%. Fraud has been especially challenging for cryptocurrency firms, where major incidents have helped convince a majority of the public that crypto investments are unusually risky, according to CNBC. To combat this, crypto firms are implementing anti-money-laundering (AML) and know-your-customer (KYC) systems using real-time analytics. The crypto industry requires new database technology, as analyzing blockchains is more intensive than analyzing flat transaction logs.

USAGE: Network telemetry and traffic monitoring
DATA REQUIREMENTS:
•  Related to security and threat detection.
•  Includes other issues such as misconfiguration, faulty equipment, or traffic congestion.
•  Such issues can be detected and corrected.
•  Requires real-time analytics at extreme volume, as the system essentially has to outrun the network, at least at some sample rate.
REAL-WORLD EXAMPLES: Real-time analytics can help keep networks stable: faulty equipment is routed around and replaced, and traffic is moved to appropriate routes on redundant networks.

USAGE: Online user behavior tracking
DATA REQUIREMENTS:
•  Used across industries, especially gaming and e-commerce.
•  Every click, mouseover, and scroll generates data.
REAL-WORLD EXAMPLES: By understanding what users do and why, vendors can provide customers with a better experience and, ultimately, close more deals. Internet giants, like Airbnb in the US and Tencent and Alibaba in China, have implemented systems to help algorithmically understand their users as well as provide information to professionals.

USAGE: Supply chain and inventory management
DATA REQUIREMENTS:
•  Has become a real-time endeavor in recent years.
•  Before 2020, seeing an empty shelf at a major retailer in a developed country was unheard of. The shelf space is too valuable, and in economies of abundance there was never a reason to run out of stock.
•  While empty slots are still costly, they are now more common.
REAL-WORLD EXAMPLES: In recent years, supply chains have become more dynamic and, in many cases, riskier. Gone are the days of permanent contracts like the storied contract of yesteryear between Ford and Firestone. Now, everyone from brick-and-mortar retail to manufacturing and beyond must be aware of their entire inventory and supply chain, and be prepared to make changes.
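The fraud and risk row describes flagging "abnormal" transactions as they arrive. The sketch below uses a deliberately simple statistical rule in place of a trained machine learning model: it keeps a per-account running mean and standard deviation with Welford's online algorithm and flags an amount whose z-score exceeds a threshold. The class names, the 3.0 threshold, and the five-transaction warm-up are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and sample variance maintained incrementally (Welford's algorithm)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; defined as 0.0 until there are two values.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


class AmountAnomalyDetector:
    """Scores each transaction against its account's history as it arrives.

    Hypothetical rule: flag when |z-score| exceeds `threshold`, but only after
    the account has `warmup` prior transactions.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold
        self.warmup = warmup
        self.history: dict[str, RunningStats] = {}

    def observe(self, account: str, amount: float) -> bool:
        stats = self.history.setdefault(account, RunningStats())
        flagged = False
        if stats.count >= self.warmup and stats.stddev > 0:
            z = (amount - stats.mean) / stats.stddev
            flagged = abs(z) > self.threshold
        stats.update(amount)  # every value, flagged or not, joins the baseline
        return flagged


if __name__ == "__main__":
    detector = AmountAnomalyDetector()
    for amount in [20.0, 25.0, 22.0, 18.0, 24.0, 21.0, 950.0, 23.0]:
        print(amount, detector.observe("acct-1", amount))
```

Because the statistics are updated incrementally, each transaction is scored in constant time without rescanning history, which is what lets a check like this run inline with a real-time payment flow. A production system would need more than one feature, would likely keep flagged values out of the baseline, and would combine rules like this with other models.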

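The real-time data pipeline described under About Real-Time Analytics pre-joins and pre-aggregates events before they reach the analytical store. The minimal sketch below, with a hypothetical in-memory dimension table and invented class and field names, shows both sides of that trade-off: events are cheap to aggregate on arrival, but when a dimension attribute changes, the stored totals stay wrong until every retained event is reprocessed.

```python
from collections import defaultdict


class PreAggregatingPipeline:
    """Pre-joins each event with a dimension table and keeps per-region totals.

    `totals` stands in for the aggregated flat table that queries would read.
    """

    def __init__(self, dimension: dict[str, str]) -> None:
        self.dimension = dimension        # e.g. user_id -> region
        self.events: list[dict] = []      # raw events kept so totals can be rebuilt
        self.totals: defaultdict[str, float] = defaultdict(float)

    def consume(self, event: dict) -> None:
        """Handle one event from the stream: pre-join, then pre-aggregate."""
        self.events.append(event)
        region = self.dimension[event["user_id"]]  # join resolved at ingest time
        self.totals[region] += event["amount"]

    def rebuild(self) -> None:
        """Re-process every retained event after a dimension change."""
        self.totals = defaultdict(float)
        for event in self.events:
            region = self.dimension[event["user_id"]]
            self.totals[region] += event["amount"]


if __name__ == "__main__":
    users = {"u1": "EU", "u2": "US"}  # hypothetical dimension table
    pipeline = PreAggregatingPipeline(users)
    pipeline.consume({"user_id": "u1", "amount": 10.0})
    pipeline.consume({"user_id": "u2", "amount": 5.0})

    users["u1"] = "US"            # a mutable attribute changes upstream
    print(dict(pipeline.totals))  # still attributes u1's spend to EU
    pipeline.rebuild()
    print(dict(pipeline.totals))  # correct again, at the cost of a full replay
```

The work done by `rebuild` grows with the number of retained events, and a pipeline that discards raw events after aggregating cannot correct its totals at all. This re-processing cost is one of the hidden costs of preprocessing pipelines.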



SHOULD YOU USE REAL-TIME ANALYTICS?
In some industries and use cases, real-time analytics is a clear-cut requirement. If you are not sure, there are four questions to answer:

1. How fast is data being generated and at what frequency?
−  It's crucial to first assess whether your data pipeline can deliver fresh data from the source. Real-time analytics is futile if the starting point is already outdated data.

2. How quickly can you make decisions based on this data?
−  Businesses reap the rewards of real-time analytics when they can swiftly translate freshly produced data into actionable insights. The speed of decision-making is as crucial as the freshness of data. If your business process cannot drive decisions or action quickly, then real-time analytics might not be the best fit for you.

3. Does your business model or strategy benefit from real-time insights?
−  It is widely recognized that fresh data can be extremely advantageous, but these benefits aren't universal. Rather than merely focusing on the perks of real-time analytics, flip the perspective: consider the upper limit of data freshness that your operations can effectively manage or tolerate.

4. Can the benefits justify the cost of real-time analytics?
−  It's crucial to determine whether the advantages gained from real-time analytics outweigh the investment it requires. Implementing real-time analytics often demands specific tools and additional resource investments. Although cloud-based solutions have reduced some of these costs, real-time analytics still requires significant investments in infrastructure and skilled personnel.

If your data is generated quickly, your business makes decisions and takes action quickly, and you can justify the cost, then real-time analytics may be right for you.

COMMON CHALLENGES OF REAL-TIME ANALYTICS
Real-time analytics is more challenging than batch analytics and generally costs more. It is used where the business use case demands it or in competitive industries seeking an advantage.

DEMAND FOR FRESH, MUTABLE DATA
Real-time analytics is a delicate balancing act. Handling analytics at scale is difficult; doing everything in seconds is much harder. And the hardest part is dealing with mutable data in real time. Yet many new applications, from SaaS dashboards and network support systems to gaming and finance, require real-time analytics on data that is forever changing. Everything necessary to get the data from the source into the appropriate form in the destination system must happen increasingly fast. Analytical databases have not traditionally handled updates well, but handling them is increasingly necessary when analyzing in real time.

REAL-TIME PREPROCESSING PIPELINE: THE HIDDEN COST OF RTA
Most analytical databases do not handle join operations efficiently at scale. Because of this, it is usually necessary to do some amount of denormalization, pre-aggregation, and transformation before loading data into the analytical system. These data pipelines are complex, difficult to maintain, and make it hard to trace data points back to the source. Additionally, the more preprocessing, the less fresh the data.

Furthermore, handling updates to pre-joined or pre-aggregated data often requires re-processing that data. Finally, many analytics systems are full of data that will never be queried, but which data that is cannot be known in advance. All of these issues make the system costly to operate and difficult to maintain.

Figure 1: Real-time data pipeline (Streaming Events → Stream processing → Pre-join & Pre-aggregation → Aggregated flat table(s) → Single table Query)

Compounding these challenges, user and business expectations frequently outpace technical innovation. Data volumes are forever rising, and new techniques require processing even more data at an increased pace. So maintaining these systems also means ensuring they can evolve.

To mitigate these problems, data platform engineers should select query engines and technology that handle joins and aggregations more




efficiently. Moreover, consider technologies that allow transformation directly in the database, known as extract, load, transform (ELT), instead of the traditional extract, transform, load (ETL). Where possible, it's often preferable to adopt data lake technologies rather than black-box proprietary technology. This allows the system to continuously evolve as new technologies become available.

GETTING STARTED WITH REAL-TIME ANALYTICS
Implementing real-time analytics requires generating, capturing, preprocessing, and analyzing data, followed by visualization and reporting. The tasks of preprocessing and analysis can be simplified by using newer technologies that enable joins at scale.

Figure 2: Real-time analytics in a nutshell (Data Generation → Data Capture and Ingestion → Data Pre-Processing → Visualization, Analysis, and Reporting → Decision making)

STEP 1: REAL-TIME DATA GENERATION
Every process in real-time analytics starts with data. This data is generated by multiple sources such as online transactions, social media interactions, and Internet of Things (IoT) devices. The data can be structured or unstructured and often arrives in various formats that demand different kinds of handling and processing.

STEP 2: DATA CAPTURE AND INGESTION FOR REAL-TIME ANALYTICS
Once data is generated, the next step is data capture and ingestion. This process involves gathering the generated data from its various sources and importing it into the system where it will be analyzed. In the context of real-time analytics, data capture and ingestion is a continuous cycle that happens frequently. The captured data doesn't just sit idle; it's immediately put to work. This swift and continuous flow is what allows real-time analytics to provide immediate insights and drive quick decision-making.

STEP 3: DATA PREPROCESSING FOR REAL-TIME ANALYTICS
Preprocessing refers to cleaning and transforming raw data to make it ready for analysis. This stage can involve filling in gaps where data may be missing, eradicating duplicates, and converting the data into a format that is easier to work with.

The pace of real-time analytics poses a unique challenge at this stage: many databases designed for this type of analysis struggle with multi-table queries (JOIN operations). To ensure real-time insights aren't held back by these constraints, users typically denormalize the data during preprocessing.

This preprocessing stage must be fast, or it reduces the "real-time" nature of the analytics. Traditional batch ETL tools, such as Spark used in batch mode, may be too slow here. Instead, stream processors such as Spark Streaming or Flink are often used for these data pipelines. These tools are like high-speed blenders, capable of preparing our "data ingredients" much more quickly and keeping everything fresh. However, their complexity can make them a challenge to set up and maintain.

STEP 4: REAL-TIME DATA ANALYSIS, VISUALIZATION, AND REPORTING
This is the pivotal stage where the value of real-time analytics truly unfolds, and it begins with retrieving the data from our real-time database. When analysts work in business intelligence (BI) tools such as Tableau or Apache Superset, those tools generate SQL on the back end to fetch the most current data for real-time dashboards and reports.

This freshly retrieved data might also be sent to other applications for a deeper dive. Some of these could be AI-powered applications that use advanced algorithms to go beyond basic analysis and draw out deeper insights, trends, or even predictions. With real-time analytics, we're not just looking at what's happening now but also anticipating what could happen next.

STEP 5: DECISION-MAKING WITH REAL-TIME ANALYTICS
This is where the data we've collected, cleaned, and analyzed is finally put to use. This could involve adjusting a marketing strategy in response to user behavior, optimizing system performance, or identifying and responding to potential security threats.

Human analysts using real-time dashboards and reports can quickly adjust strategies based on current data trends. Meanwhile, algorithms can make automated adjustments in real time, responding instantly to data-driven triggers. Regardless of the decision-maker, the speed and accuracy of real-time analytics make for an efficient and responsive decision-making process. It's all about reacting promptly and staying ahead of the curve.
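Steps 2 through 5 can be sketched end to end in plain Python. In the sketch below, a list of dictionaries stands in for a message queue, a hard-coded catalog stands in for a dimension table, and the field names, 60-second window, and alert threshold are all hypothetical. Preprocessing drops duplicate event IDs, fills missing amounts with zero, and denormalizes each event by copying its product category onto it; analysis keeps a sliding-window total per category; and decision-making emits an alert when a total crosses the threshold.

```python
from collections import deque
from typing import Iterable, Iterator

# Hypothetical dimension table used to denormalize events (Step 3).
CATALOG = {"p1": "books", "p2": "games"}


def preprocess(raw: Iterable[dict]) -> Iterator[dict]:
    """Step 3: drop duplicates, fill gaps, and denormalize each event."""
    seen_ids: set[str] = set()
    for event in raw:
        if event["id"] in seen_ids:  # duplicate delivery: skip it
            continue
        seen_ids.add(event["id"])
        yield {
            "ts": event["ts"],
            # Fill a missing or null amount with 0.0.
            "amount": event.get("amount") or 0.0,
            # Denormalize: copy the category onto the event so no join is needed later.
            "category": CATALOG.get(event["product"], "unknown"),
        }


class SlidingWindowSum:
    """Step 4: total of amounts with timestamps in (ts - window, ts].

    Assumes events arrive in timestamp order.
    """

    def __init__(self, window: int) -> None:
        self.window = window
        self.items: deque = deque()  # (timestamp, amount) pairs, oldest first
        self.total = 0.0

    def add(self, ts: int, amount: float) -> float:
        self.items.append((ts, amount))
        self.total += amount
        # Evict events that have aged out of the window.
        while self.items and self.items[0][0] <= ts - self.window:
            _, old_amount = self.items.popleft()
            self.total -= old_amount
        return self.total


def run(raw: Iterable[dict], window: int = 60, threshold: float = 100.0) -> list:
    """Steps 2-5: consume events in order; alert when a windowed total exceeds `threshold`."""
    windows: dict[str, SlidingWindowSum] = {}
    alerts = []
    for event in preprocess(raw):
        tracker = windows.setdefault(event["category"], SlidingWindowSum(window))
        total = tracker.add(event["ts"], event["amount"])
        if total > threshold:
            alerts.append((event["ts"], event["category"], total))
    return alerts


if __name__ == "__main__":
    events = [
        {"id": "e1", "ts": 0, "product": "p1", "amount": 40.0},
        {"id": "e1", "ts": 0, "product": "p1", "amount": 40.0},  # duplicate
        {"id": "e2", "ts": 30, "product": "p1", "amount": 70.0},
        {"id": "e3", "ts": 100, "product": "p1", "amount": 10.0},
        {"id": "e4", "ts": 100, "product": "p9", "amount": None},  # gap, unknown product
    ]
    print(run(events))
```

With this sample input, the duplicate `e1` is counted once, the two books events inside the same 60-second window push the total to 110.0 and trigger a single alert, and by second 100 both have aged out of the window. A production deployment would typically delegate the windowing to a stream processor rather than hand-rolled Python.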

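The ELT alternative recommended under Common Challenges, loading raw data and running joins and aggregations inside the database at query time, can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here for a real-time analytical database; it shows the shape of the approach, not its performance at scale. The tables and the query are hypothetical, and the query is the kind of SQL a BI tool might generate for a dashboard.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Load raw rows into an in-memory database without transforming them first."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users  (user_id TEXT PRIMARY KEY, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id TEXT, amount REAL);
        """
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("u1", "EU"), ("u2", "US")])
    conn.executemany(
        "INSERT INTO orders (user_id, amount) VALUES (?, ?)", [("u1", 10.0), ("u2", 5.0)]
    )
    return conn


# Transform at query time: the join and aggregation run inside the database.
REVENUE_BY_REGION = """
    SELECT u.region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN users AS u ON u.user_id = o.user_id
    GROUP BY u.region
    ORDER BY u.region
"""


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    return conn.execute(REVENUE_BY_REGION).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(revenue_by_region(conn))
    conn.execute("UPDATE users SET region = 'US' WHERE user_id = 'u1'")
    print(revenue_by_region(conn))  # reflects the update with no re-processing
```

Because the join happens when the query runs, the change to `u1`'s region shows up in the very next result with no re-processing step, in contrast to a pre-joined flat table. Whether this is fast enough at real-time scale depends on how well the query engine handles joins.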



CONCLUSION
Real-time analytics is becoming more common as new use cases and industry practices emerge. While real-time analytics is more costly than traditional batch analytics, analysts such as Ventana's Matthew Aslett expect it to grow from 22% to 50% in the next couple of years as companies seek new advantages. By using new technologies, you can minimize the cost of data transformation pipelines and, therefore, increase the freshness of data.

If you're navigating the space of real-time analytics, you are not alone. Check out the following resources to learn more about:
•  How Airbnb implemented Minerva while reducing the amount of data transformation they perform
•  How vectorization improves database performance and makes joins at scale possible, reducing the need for data pipelines
•  How to go pipeline-free with your real-time analytics and ditch denormalization and complex, time-consuming data transformation work

WRITTEN BY SIDA SHEN,
PRODUCT MARKETING MANAGER, CELERDATA
An engineer with a background in building machine learning and big data infrastructure, Sida Shen serves as CelerData's leading product expert. Sida oversees the company's market research and works closely with engineers and developers across the analytics industry to tackle challenges related to real-time analytics.

Copyright © 2023 DZone. All rights reserved.
