
2/23/2024

Stream Processing

Class Rules

• You can do anything except make noise (chatting, singing, …).
• Feel free to interrupt me if you have questions.
• According to university policy, attendance will be taken.
• Important: you need at least 80% attendance to be able to sit for the final exam.


Course Assessment

• Tentative, depending on the situation:
• Final exam: 50%
• Assignment: 20%, done individually
• Project: 30%, 2-3 members per group; a report and a presentation are required.
• Important: cheating and plagiarism will receive no marks.

A few suggestions….

• Your final grade is based on points, not on an accumulation of grades.
• You start the class with zero points and earn your way to your final grade.
• If you have an issue or problem, communicate: send me an email.
• If you know you’re not going to meet the deadline for a quiz or assignment, email me BEFORE the deadline.


BIG DATA, ANALYTICS

Data Deluge


How do I find the relevant data?

Copyright © SAS Institute Inc. All rights reserved.

Big Data Explained

"Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away."
- George Dyson, Science Historian and TED Speaker


Big Data: What Is It?


The SAS definition of big data:
The point at which the volume, velocity, and variety of data exceed an organization’s storage or computation capacity for accurate and timely decision making.
Here are some factors associated with big data:
• data volume
• data velocity
• data variety
• data variability
• data complexity

Data Volume

Data volumes are increasing due to use of the following:
• social media (Facebook, Twitter, Instagram)
• machines talking to machines
• improvements in the manufacturing process (quality control)
• automated tracking devices
• streaming data feeds


Data Velocity

• business processes that are more automated
• mergers and acquisitions
• more use of social media
• more use of self-service applications
• integration of business applications


Data Variety

• structured data
• unstructured data
• business applications
• unstructured text documents
(articles, blogs, and so on)
• emails
• digital images
• video and audio clips
• streaming data
• stock ticker data
• RFID tag data
• sensor data


Data Variability

• The flow of data changes over time (seasonality, peak response, social media trends, and so on).
• Data values change over time. How much history do you keep?
• Data values are different across data sources.
• Data is stored in different formats.
• Data standards change across time.
• What was “valid” five years ago might not be “valid” today.


Data Complexity

• Data comes from a variety of systems in a variety of formats. This can make it difficult to merge, cleanse, and transform data in a uniform manner.


Evolution to Big Data


• Traditional to Big Data Infrastructure
• Database servers and traditional data processing tools
• Distributed data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive data sets
• Onsite and cloud computing solutions


Evolution to Big Data


Data Streaming


[Figure: Internet of Things application areas: Smart Cities and Homes, Connected Customer, Communications, Surveillance, Connected Car, Building Management, Transportation, Agriculture, Energy, Manufacturing, Finance/Insurance, Retail, Health Care]


Most IoT Data Remains Unused

• Data from sensors in manufacturing can provide information to detect conditions requiring attention.
• Sensors are pervasive: from wearables to rocket engines.
• Sensor data remains largely untapped (not being used for prediction and optimization).
• Imagine a structure that would allow sensor data to be processed as it gets produced.
• Therein lies an opportunity.


Traditional Analytics at Rest

[Diagram labels: Data, ETL, Data Storage, Deploy, Alerts - Reports, Decisioning]


Streaming Analytics
Stream – Understand – Act

[Diagram labels: Data, ETL, Data Storage, Deploy, Alerts - Reports, Decisioning, Enrich, Streaming Data Store, Streaming Model Execution]


Streaming Data

• The world is becoming more instrumented and connected.
• Digital data from hardware (e.g., sensors) and software is flooding in as large, continuous streams.
• Examples: financial markets, surveillance systems, manufacturing, smart cities, …
• We need to collect, process, and analyze these streams to extract valuable information, discover new insights in real time, and detect emerging patterns and outliers.


Real-time Data Analytics


What is Streaming Data

• Streaming data is data that keeps flowing with no discrete beginning or end.
• E.g., data from environmental sensors, body sensors, surveillance cameras, log files, transactions, …
• A streaming data source emits data records continuously rather than in batches.
• Most streaming data sources send data in small sizes (often kilobytes) continuously as the data is generated.
• Usually, the data needs to be processed on the fly.
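
To make the idea of a source that emits small records continuously more concrete, here is a minimal Python sketch. The `sensor_stream` source, its field names, and the 22.0 °C limit are all hypothetical, chosen only for illustration; a real source would be a device, log, or message broker.

```python
import itertools
import random
from typing import Iterator


def sensor_stream(seed: int = 0) -> Iterator[dict]:
    """Hypothetical unbounded source: yields one small record at a time, forever."""
    rng = random.Random(seed)
    for seq in itertools.count():
        yield {"seq": seq, "temperature_c": round(20 + rng.gauss(0, 2), 2)}


def process_on_the_fly(stream: Iterator[dict], limit_c: float) -> Iterator[dict]:
    """Handle each record as it arrives; nothing is collected into a batch."""
    for record in stream:
        if record["temperature_c"] > limit_c:
            yield record


# The stream has no end, so the consumer decides how much of it to look at.
hot = process_on_the_fly(sensor_stream(), limit_c=22.0)
first_three_hot = list(itertools.islice(hot, 3))
print(first_three_hot)
```

Because both functions are generators, each record is produced, checked, and discarded one at a time, which mirrors the "no discrete beginning or end" property described above.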


Characteristics of Data Streams

• Unbounded data
  • Conceptually an infinite, ever-growing set of data items/events
  • Practically a continuous stream of data that needs to be processed/analyzed
• Push model
  • The source controls when data is produced and delivered
  • Publish/subscribe model
• Concept of time
  • Often need to reason about when data is produced and when processed data should be output
  • Processing time, ingestion time, event time
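
The difference between event time and processing time can be illustrated with a small Python sketch. The events, the 5-second tumbling windows, and the `tumbling_sum` helper are assumptions made up for this example; stream processing engines provide their own windowing operators.

```python
from collections import defaultdict

# Each event: (event_time_s, processing_time_s, value).
# The third event was produced at t=4 but only processed at t=9 (it arrived late).
events = [
    (1, 2, 10),
    (7, 8, 20),
    (4, 9, 30),
    (12, 13, 40),
]


def tumbling_sum(events, time_index, width_s=5):
    """Sum values per fixed-size window, keyed by the window's start time."""
    sums = defaultdict(int)
    for event in events:
        window_start = (event[time_index] // width_s) * width_s
        sums[window_start] += event[2]
    return dict(sums)


by_event_time = tumbling_sum(events, time_index=0)
by_processing_time = tumbling_sum(events, time_index=1)

print(by_event_time)       # {0: 40, 5: 20, 10: 40}
print(by_processing_time)  # {0: 10, 5: 50, 10: 40}
```

The late event lands in a different window depending on which notion of time is used, which is why a pipeline has to state which one it reasons about.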


Data Value Continuum

• Data exists on a time continuum.
• The “things” we do with data are strongly correlated to its age.
• The value of data changes from the individual item to the aggregate over this timeline.


Data Value Chain


Data Streaming

• Traditionally, data is moved in batches.
• Batch processing handles large volumes of batched data with long latency.
• For many streaming data sources, batch processing cannot be used: the data is either prohibitively large to store and process in batches, or it would be stale by the time it is processed.
• Data streaming (or data stream processing, DSP) is the processing of streaming data on the fly (visualizing, summarizing, analytics, …).
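
As a rough illustration of summarizing on the fly, the Python sketch below compares a batch mean (all data stored first) with a running mean that is updated per record and keeps only two numbers of state. The `RunningMean` class and the sample readings are hypothetical.

```python
class RunningMean:
    """Incremental mean: state is just a count and the current mean."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Shift the mean toward the new value by 1/count of the difference.
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [4.0, 8.0, 6.0, 2.0]

# Batch style: the answer is only available after everything has been stored.
batch_mean = sum(readings) / len(readings)

# Streaming style: an up-to-date answer exists after every record.
running = RunningMean()
stream_means = [running.update(value) for value in readings]

print(batch_mean)    # 5.0
print(stream_means)  # [4.0, 6.0, 6.0, 5.0]
```

The final streaming value matches the batch result, but the streaming version never needs the full data set in memory.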


Benefits of Data Streaming

• Good for time series analysis
• Well suited to processing IoT data streams
• Can be used for real-time aggregation, correlation, filtering, or sampling
• Enables the analysis of data in real time to gain insights into a wide range of activities
• Can be paired with planned actions based on the results of real-time analytics
• Can feed back into improving the effectiveness of future monitoring, analytics, and actions
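
Filtering and sampling on a stream can be sketched as follows. Reservoir sampling (Algorithm R) is one standard way to keep a uniform random sample of fixed size from a stream of unknown length; it is used here only as an illustration, and the function name, seed, and input stream are assumptions.

```python
import random
from typing import Iterable, List


def reservoir_sample(stream: Iterable[int], k: int, seed: int = 0) -> List[int]:
    """Keep a uniform random sample of k items in one pass over the stream."""
    rng = random.Random(seed)
    sample: List[int] = []
    for index, item in enumerate(stream):
        if index < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, index)    # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k / (index + 1)
    return sample


# Filtering happens lazily, one record at a time, before sampling.
even_readings = (x for x in range(10_000) if x % 2 == 0)
sample = reservoir_sample(even_readings, k=5)
print(sample)
```

Memory use stays at `k` items no matter how long the stream runs, which is the property that makes this kind of sampling usable in real time.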


Patterns That Drive Most Streaming Use Cases


Static vs Streaming

• In static data computation, questions are asked of static data.
• In streaming data computation, data are continuously evaluated by static questions.
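
The contrast can be sketched in Python: in the static case the data is fixed and the question is asked once, while in the streaming case the question is fixed and each arriving record is evaluated against it. The order amounts, the `is_large` predicate, and the `standing_query` helper are hypothetical.

```python
from typing import Callable, Iterable, Iterator

stored_orders = [120, 45, 300, 80]


def is_large(amount: int) -> bool:
    return amount > 100


# Static computation: the data sits still and a question is run over it.
static_answer = [amount for amount in stored_orders if is_large(amount)]


# Streaming computation: the question sits still and the data flows past it.
def standing_query(stream: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    for item in stream:
        if predicate(item):
            yield item


incoming = iter([60, 500, 99, 101])
matches = list(standing_query(incoming, is_large))

print(static_answer)  # [120, 300]
print(matches)        # [500, 101]
```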


Batch vs. Real-time Processing


Challenges of Streaming

• Streaming data management
• May have only one chance to examine the data
• Arbitrary and interactive exploration
• Real-time analytics
• Recency matters: alerts on recent changes
• Availability
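
One possible way to alert on recent changes while seeing each value only once is a bounded sliding window. The sketch below is an assumption-laden illustration: the window size, threshold, and `recent_change_alerts` name are invented for this example, and real systems typically use more robust change detectors.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def recent_change_alerts(
    stream: Iterable[float], window: int = 3, threshold: float = 10.0
) -> Iterator[Tuple[float, float]]:
    """Yield (value, baseline) when a value departs from the mean of the last few values."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in stream:
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > threshold:
                yield (value, baseline)
        recent.append(value)


alerts = list(recent_change_alerts([20, 21, 19, 20, 45, 22, 21]))
print(alerts)  # [(45, 20.0)]
```

Each value is examined exactly once and only the last `window` values are retained, so the memory footprint does not grow with the stream.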


Challenges of DSP

• Streaming architecture and pipeline
• Streaming data ingestion and handling (adaptors, data formats, schema, cleaning, flow control, …)
• Stream processing algorithm design, testing, validation, deployment, and life-cycle management
• Scalability in volume and velocity
• Elastic processing and management of load variations
• Fault tolerance and processing guarantees
• Self-adaptation at run time to pattern shifts
• Automatic feedback and learning
• Security and privacy
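
As a small illustration of the ingestion and handling challenge (data formats, schema, cleaning), the Python sketch below parses JSON lines, drops records that are malformed or do not match an expected schema, and normalizes the rest. The schema, field names, and `ingest` function are hypothetical and stand in for whatever adaptor a real pipeline would use.

```python
import json
from typing import Iterable, Iterator

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"sensor_id": str, "value": (int, float)}


def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Parse, validate, and clean raw lines one at a time."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON: skip rather than stop the stream
        if not isinstance(record, dict):
            continue
        if all(isinstance(record.get(name), kind) for name, kind in REQUIRED_FIELDS.items()):
            yield {
                "sensor_id": record["sensor_id"].strip(),
                "value": float(record["value"]),
            }


raw = [
    '{"sensor_id": " a1 ", "value": 3}',
    "not json",
    '{"sensor_id": "b2"}',
    '{"sensor_id": "c3", "value": "x"}',
]
clean = list(ingest(raw))
print(clean)  # [{'sensor_id': 'a1', 'value': 3.0}]
```

Silently skipping bad records is only one policy; a production pipeline might instead route them to a separate error stream for inspection.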
