DZ Refcard 390 Realtime Analytics 2023
CONTENTS
Real-Time Analytics
• Common Challenges of Real-Time Analytics
• Getting Started With Real-Time Analytics
• Conclusion
SIDA SHEN
PRODUCT MARKETING MANAGER, CELERDATA
Real-time analytics is necessary for any business that needs to make decisions in hours, minutes, or seconds. Implementing real-time analytics requires processing high volumes of input data and matching it with existing data in minutes, seconds, or even less time.

This Refcard aims to acquaint you with real-time analytics: where it is used, how it works, and the challenges involved.

WHAT IS REAL-TIME ANALYTICS?
Real-time analytics involves processing data as soon as it comes into a system. Analytics discovers and communicates meaningful insights from data through math, statistics, and machine learning.

Traditionally, analytical systems process a large amount of data, often on the scale of petabytes, and have few users running relatively large queries when compared to transactional or operational systems. This definition has changed in recent years, and analytical systems are now often user- and customer-facing. They are also frequently used as part of a larger workflow where the "user" is another system or artificial intelligence solution that makes data-driven decisions.

… as adtech, crypto, or finance. In these industries and many others, competitive advantage is gained by reacting swiftly to market changes.

Meanwhile, fraud and security threats are bigger than ever before, and real-time analytics is essential to both detecting and preventing fraud and security breaches. Real-time analytics has also become more crucial across industries where supply chains are now constantly changing due to the dynamic geopolitical and economic environment. Often, everything from consumer devices to manufacturing equipment is instrumented with sensors that provide real-time data; analytics must keep up to supply correct control and telemetry to the people and algorithms that control and monitor these devices.

Whether it is due to industry-specific requirements or to the changing nature of customer expectations and competitive requirements, real-time analytics has become an integral tool.
REFCARD | GETTING STARTED WITH REAL-TIME ANALYTICS
ABOUT REAL-TIME ANALYTICS
Real-time analytics has a lot in common with batch analytics, including the need for data transformation. However, it differs in both timeframe and implementation.

HOW REAL-TIME PROCESSING DIFFERS FROM TRADITIONAL ANALYTICS
Batch processing systems operate on an accumulated set of data over a specified time interval. These systems usually consist of regular load processes that extract data, transform it (often into summary tables), and load it into a destination system. Often, tools like Apache Spark are used to process the data before loading it into a destination system, usually a data warehouse like Teradata or Snowflake.

Real-time analytics processes data as soon as it arrives, usually almost instantaneously. Real-time systems usually include a message or event queue like Apache Kafka, often paired with a stream processor like Apache Flink. Data is often sent to the operational system and transformed and loaded into the analytics system, either one after the other or simultaneously. The message or event queue and its associated logic are called real-time data pipelines or data transformation pipelines.

Real-time data pipelines also process data as soon as it arrives, almost instantaneously. To ensure data can be queried efficiently, data pipelines transform the data into a more denormalized form, pre-joining and often pre-aggregating it. The data lake or data warehouse ends up containing a summarized form of the operational data that is more efficient to query. The primary difference between batch and real-time analytics is simply batching: whether data is accumulated over an interval before it is processed.

REAL-WORLD EXAMPLES OF REAL-TIME ANALYTICS USAGE
No list of real-time analytics use cases can be exhaustive. First, new innovations are happening every day as the global economy increasingly digitizes. Second, competition is forcing previously slow-moving industries to move faster and provide service on demand. That said, there are some places where real-time analytics is a must or has a clear advantage:
Table 1

Security and threat detection
• Requires analyzing massive amounts of data.
• To be effective, it must happen in real time.
According to Statista, in 2022, there were more than 1,802 reported data compromises, which affected more than 422 million people. While there are yearly fluctuations, the trend has been upward virtually since the start of modern computing. Detecting a breach is useful even after the fact, but shutting down anomalous activity before there is damage is crucial for sensitive data and systems.

Fraud and risk analysis
• Real-time analytics, often combined with machine learning and other algorithmic techniques, can detect when transactions are "abnormal" or unusually risky.
• Has largely been done in batches, after the fact.
• Modern payment systems are real-time, so fraud and risk detection must be as well.
According to the NICE Actimize 2023 Fraud Insights Report, fraudulent transactions have risen 92% year over year, and the amounts are up 146%. Fraud has been especially challenging for cryptocurrency firms, where major incidents have helped convince a majority of the public that crypto investments are unusually risky, according to CNBC. To combat this, crypto firms are implementing anti-money-laundering (AML) and know-your-customer (KYC) systems using real-time analytics. The crypto industry requires new database technology, as analyzing blockchains is more computationally intensive than analyzing flat transaction logs.

Network telemetry and traffic monitoring
• Related to security and threat detection.
• Includes other issues such as misconfiguration, faulty equipment, or traffic congestion.
• These issues can be detected and corrected.
• Requires real-time analytics at extreme volume, as the system essentially has to outrun the network, at least at some sample rate.
Real-time analytics can help keep networks stable: faulty equipment is routed around and replaced, and traffic is moved to appropriate routes on redundant networks.

Online user behavior tracking
• Used across industries, especially gaming and e-commerce.
• Every click, mouseover, and scroll generates data.
By understanding what users do and why, vendors can provide customers with a better experience and, ultimately, close more deals. Internet giants like Airbnb in the US and Tencent and Alibaba in China have implemented systems to help algorithmically understand their users as well as provide information to professionals.

Supply chain and inventory management
• Has become a real-time endeavor in recent years.
• Before 2020, seeing an empty shelf at a major retailer in a developed country was unheard of. The shelf space is too valuable, and there was never a reason to be out of stock in economies of abundance.
• While empty slots are still costly, they are now more common.
In recent years, supply chains have become more dynamic and, in many cases, riskier. Gone are the days of permanent contracts like the storied contract of yesteryear between Ford and Firestone. Now, everyone from brick-and-mortar retail to manufacturing and beyond must be aware of their entire inventory and supply chains, and be prepared to make changes.
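The fraud and risk analysis row above contrasts after-the-fact batch detection with flagging "abnormal" transactions as they happen. The following is a minimal sketch of the real-time side, assuming a simple statistical rule stands in for the machine learning models the table mentions: each transaction is scored on arrival against a running per-account mean and standard deviation (Welford's online algorithm), and amounts more than a chosen number of standard deviations away are flagged. The class names, the `z_threshold` and `min_history` parameters, and the sample amounts are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance without storing history."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until there are at least two points.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


@dataclass
class StreamingAnomalyDetector:
    """Hypothetical per-account detector using a z-score rule."""
    z_threshold: float = 3.0
    min_history: int = 5  # do not judge an account until it has some history
    stats: dict = field(default_factory=dict)

    def score(self, account: str, amount: float) -> bool:
        s = self.stats.setdefault(account, RunningStats())
        flagged = False
        if s.count >= self.min_history:
            sd = s.stddev()
            if sd > 0 and abs(amount - s.mean) / sd > self.z_threshold:
                flagged = True
        # Update only after scoring, so the current amount does not
        # influence its own score.
        s.update(amount)
        return flagged


if __name__ == "__main__":
    detector = StreamingAnomalyDetector()
    for amount in [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 950.0, 23.0]:
        if detector.score("acct-1", amount):
            print(f"ALERT: acct-1 spent {amount:.2f}")
```

Because each event is scored in constant time and memory per account, this kind of check can run inline with a payment stream rather than in a nightly batch. A production system would likely exclude flagged amounts from the running statistics and use richer features than the amount alone.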
SHOULD YOU USE REAL-TIME ANALYTICS?
In some industries and use cases, real-time analytics is a clear-cut requirement. If you are not sure, there are four questions to answer:

1. How fast is data being generated and at what frequency?
− It's crucial to first assess whether your data pipeline can generate fresh data at the source. Real-time analytics becomes futile if the starting point is already outdated data.

2. How quickly can you make decisions based on this data?
− Businesses reap the rewards of real-time analytics when they can swiftly translate freshly produced data into actionable insights. Equally crucial to the freshness of data is the speed of decision-making. If your business process cannot drive decisions or lead to action quickly, then real-time analytics might not be the best fit for you.

3. Does your business model or strategy benefit from real-time insights?
− It is widely recognized that fresh data can be extremely advantageous, but these benefits aren't universal. Rather than merely focusing on the perks of real-time analytics, flip the perspective: consider the upper limit of data freshness that your operations can effectively manage or tolerate.

4. Can the benefits justify the cost of real-time analytics?

… fast. Analytical databases have not traditionally handled updates well, but handling them is increasingly necessary when analyzing in real time.

REAL-TIME PREPROCESSING PIPELINE: THE HIDDEN COST OF RTA
Most analytical databases do not handle join operations efficiently at scale. Because of this, it is usually necessary to do some amount of denormalization, pre-aggregation, and transformation before loading data into the analytical system. These data pipelines are complex, difficult to maintain, and make it hard to trace data points back to the source. Additionally, the more preprocessing there is, the less fresh the data.

Next, handling updates for pre-joined or pre-aggregated data often requires reprocessing the affected pre-aggregated or pre-joined data. Finally, many analytics systems are full of data that will never be queried, but that cannot be known in advance. All of these issues make the system costly to operate and difficult to maintain.

Figure 1: Real-time data pipeline
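The hidden-cost discussion notes that updating pre-aggregated data often means reprocessing it. The following is a minimal sketch of why, assuming a hypothetical pipeline that pre-aggregates order amounts per region: a correction to one order can be applied incrementally only if the pipeline still knows that order's previous region and amount; a pipeline that kept only the summary would have to rebuild it from the raw rows. The `orders` data and both function names are illustrative, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical raw operational rows: order_id -> (region, amount)
orders = {
    1: ("us-east", 100.0),
    2: ("us-west", 50.0),
    3: ("us-east", 25.0),
}


def full_recompute(rows):
    """Rebuild the pre-aggregated summary from every raw row (the costly path)."""
    summary = defaultdict(float)
    for region, amount in rows.values():
        summary[region] += amount
    return dict(summary)


def apply_update(summary, rows, order_id, new_region, new_amount):
    """Incrementally correct the summary: retract the old contribution, add the new one.

    This works only because `rows` still holds the order's previous values;
    with the aggregate alone, the old contribution would be unknown.
    """
    old_region, old_amount = rows[order_id]
    summary[old_region] -= old_amount
    summary[new_region] = summary.get(new_region, 0.0) + new_amount
    rows[order_id] = (new_region, new_amount)
    return summary


summary = full_recompute(orders)  # {'us-east': 125.0, 'us-west': 50.0}
apply_update(summary, orders, 3, "us-west", 30.0)  # order 3 moved and repriced
print(summary)  # {'us-east': 100.0, 'us-west': 80.0}
```

The trade-off mirrors the one described above: retaining raw rows makes corrections cheap but adds storage and pipeline state, while keeping only the aggregate forces a rebuild whenever upstream data changes.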
… efficiently. Moreover, consider technologies that allow transformation directly in the database, an approach called extract, load, transform (ELT), instead of the traditional extract, transform, load (ETL). Where possible, it's often preferable to adopt data lake technologies rather than black-box proprietary technology. This allows the system to continuously evolve as new technologies become available.

GETTING STARTED WITH REAL-TIME ANALYTICS
Implementing real-time analytics requires generating, capturing, …

Figure 2: Real-time analytics in a nutshell

… just sit idle; it's immediately put to work. This swift and continuous flow is what allows real-time analytics to provide immediate insights and drive quick decision-making.

STEP 3: DATA PREPROCESSING FOR REAL-TIME ANALYTICS
Preprocessing refers to cleaning and transforming raw data to make it ready for analysis. This stage can involve filling in gaps where data may be missing, eliminating duplicates, and converting the data into a format that is easier to work with.

… held back by these constraints, users typically perform a process called denormalization during preprocessing.
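Step 3 lists filling gaps, removing duplicates, and converting data into an easier format, and the text then notes that users typically denormalize during preprocessing. The following is a minimal sketch of those four steps on a hypothetical click-event stream. It assumes each event carries an `event_id`, an ISO-8601 timestamp string with a UTC offset, a `user_id`, and an optional `page`, and that a small in-memory `USERS` table stands in for the dimension data being pre-joined. All of these names are illustrative.

```python
from datetime import datetime, timezone

# Hypothetical dimension table, pre-joined (denormalized) onto each event.
USERS = {"u1": {"country": "US"}, "u2": {"country": "CN"}}


def preprocess(events, users=USERS):
    """Yield cleaned, deduplicated, converted, and denormalized events one by one."""
    seen = set()  # unbounded here; a real pipeline would expire old IDs by time window
    for ev in events:
        # 1. Remove duplicates, e.g. events redelivered by an at-least-once queue.
        if ev["event_id"] in seen:
            continue
        seen.add(ev["event_id"])
        # 2. Fill gaps: substitute a sentinel for a missing page.
        page = ev.get("page") or "unknown"
        # 3. Convert format: ISO-8601 string with offset -> UTC datetime.
        ts = datetime.fromisoformat(ev["ts"]).astimezone(timezone.utc)
        # 4. Denormalize: copy user attributes onto the event so queries skip a join.
        country = users.get(ev["user_id"], {}).get("country", "unknown")
        yield {"event_id": ev["event_id"], "ts": ts, "user_id": ev["user_id"],
               "page": page, "country": country}


raw = [
    {"event_id": 1, "ts": "2023-05-01T12:00:00+02:00", "user_id": "u1", "page": "/home"},
    {"event_id": 1, "ts": "2023-05-01T12:00:00+02:00", "user_id": "u1", "page": "/home"},
    {"event_id": 2, "ts": "2023-05-01T10:05:00+00:00", "user_id": "u2"},  # no page
]
clean = list(preprocess(raw))
```

Writing the step as a generator lets it consume events as they arrive instead of waiting for a complete batch. The denormalization in step 4 is exactly the kind of pre-join that, as the hidden-cost discussion points out, must be redone if the `USERS` data later changes.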
CONCLUSION
Real-time analytics is becoming more common as new use cases and industry practices emerge. While real-time analytics is more costly than traditional batch analytics, analysts such as Ventana's Matthew Aslett expect it to grow from 22% to 50% in the next couple of years as companies seek new advantages. By using new technologies, you can minimize the cost of data transformation pipelines and, therefore, increase the freshness of data.

If you're navigating the space of real-time analytics, you are not alone. Check out the following resources to learn more about:
• How Airbnb implemented Minerva while reducing the amount of data transformation it performs
• How vectorization improves database performance and makes joins at scale possible, reducing the need for data pipelines
• How to go pipeline-free with your real-time analytics and ditch denormalization and complex, time-consuming data transformation work

WRITTEN BY SIDA SHEN,
PRODUCT MARKETING MANAGER, CELERDATA
An engineer with a background in building machine learning and big data infrastructure, Sida Shen serves as CelerData's leading product expert. Sida oversees the company's market research and works closely with engineers and developers across the analytics industry to tackle challenges related to real-time analytics.

3343 Perimeter Hill Dr, Suite 100
Nashville, TN 37211
888.678.0399 | 919.678.0300

At DZone, we foster a collaborative environment that empowers developers and tech professionals to share knowledge, build skills, and solve problems through content, code, and community. We thoughtfully, and with intention, challenge the status quo and value diverse perspectives so that, as one, we can inspire positive change through technology.

Copyright © 2023 DZone. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means of electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.