Bda Unit II Lecture1


This document discusses data stream mining and processing techniques. It introduces the stream model where data enters rapidly from external sources and the entire stream cannot be stored. It discusses using limited working storage and archival storage to make calculations about streams using sliding windows. Common applications include mining query streams, click streams, and network packet streams. It provides an example of calculating averages over a sliding window in a stream in constant time per input but requiring the entire window in memory.

Uploaded by Anju2

UNIT II: Mining Data Streams

The Stream Model

Sliding Windows

Data Management vs. Stream Management

• In a DBMS, input is under the control of the programming staff.
  - SQL INSERT commands or bulk loaders.
• Stream management is important when the input rate is controlled externally.
  - Example: Google search queries.

The Stream Model
• Input tuples enter at a rapid rate, at one or more input ports.
• The system cannot store the entire stream accessibly.
• How do you make critical calculations about the stream using a limited amount of (primary or secondary) memory?
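
One way to read the last question is to keep only a constant-size summary and update it as each tuple arrives. The following is a minimal Python sketch under that assumption, for a stream of numbers. The names `RunningSummary` and `summarize` are hypothetical and chosen only for illustration; the sample values are taken from the first stream in the stream-processor figure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningSummary:
    """Constant-size summary of a numeric stream (hypothetical helper)."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> None:
        # Only the summary is kept; the element itself is not stored.
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningSummary:
    """Consume the stream in one pass using O(1) memory."""
    summary = RunningSummary()
    for value in stream:
        summary.update(value)
    return summary


result = summarize(iter([1, 5, 2, 7, 0, 9, 3]))
print(result.count, result.mean)
```

A summary like this answers only the questions it was designed for (here, count and mean); other calculations would need different state.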

Two Forms of Query
1. Ad-hoc queries: normal queries asked one time about streams.
   - Example: What is the maximum value seen so far in stream S?
2. Standing queries: queries that are, in principle, asked about the stream at all times.
   - Example: Report each new maximum value ever seen in stream S.
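
The two example queries can be sketched in Python as follows. This is an illustration only: `MaxTracker` and `standing_query` are hypothetical names, and the stream is a short list of integers rather than a real input port.

```python
from typing import Iterable, Iterator, Optional


class MaxTracker:
    """Keeps the maximum seen so far in O(1) memory (hypothetical helper)."""

    def __init__(self) -> None:
        self.current_max: Optional[int] = None

    def observe(self, value: int) -> bool:
        """Record one element; return True if it is a new maximum."""
        if self.current_max is None or value > self.current_max:
            self.current_max = value
            return True
        return False


def standing_query(stream: Iterable[int], tracker: MaxTracker) -> Iterator[int]:
    """Standing query: report each new maximum as soon as it appears."""
    for value in stream:
        if tracker.observe(value):
            yield value


tracker = MaxTracker()
new_maxima = list(standing_query([1, 5, 2, 7, 0, 9, 3], tracker))
print(new_maxima)           # [1, 5, 7, 9]

# Ad-hoc query: asked once, answered from whatever state has been kept.
print(tracker.current_max)  # 9
```

Note that the ad-hoc query can be answered here only because the tracker already maintains the maximum; a question about state that was never kept could not be answered after the fact.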

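[Figure: Stream-processing model. Several streams enter a processor over time, for example ". . . 1, 5, 2, 7, 0, 9, 3", ". . . a, r, v, t, y, h, b" and ". . . 0, 0, 1, 0, 1, 1, 0". The processor receives ad-hoc queries and standing queries, produces output, and uses limited working storage and archival storage.]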

Applications
• Mining query streams.
  - Google wants to know what queries are more frequent today than yesterday.
• Mining click streams.
  - Yahoo! wants to know which of its pages are getting an unusual number of hits in the past hour.
    - Often caused by annoyed users clicking on a broken page.
• IP packets can be monitored at a switch.
  - Gather information for optimal routing.
  - Detect denial-of-service attacks.
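
As a rough illustration of the query-stream example, the sketch below compares per-query counts for two days. It assumes the distinct queries and their counts fit in memory, which the stream model does not guarantee; the function name `rising_queries` and the sample logs are invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def rising_queries(yesterday: Iterable[str], today: Iterable[str]) -> Dict[str, int]:
    """Return queries seen more often today than yesterday, with the increase."""
    before = Counter(yesterday)
    now = Counter(today)
    # Counter returns 0 for unseen queries, so brand-new queries also qualify.
    return {q: now[q] - before[q] for q in now if now[q] > before[q]}


# Made-up sample data, for illustration only.
yesterday_log = ["weather", "news", "weather", "flights"]
today_log = ["weather", "news", "news", "news", "scores"]
rising = rising_queries(yesterday_log, today_log)
print(rising)  # {'news': 2, 'scores': 1}
```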

Sliding Windows
• A useful model of stream processing is that queries are about a window of length N: the N most recent elements received.
  - Alternative: elements received within a time interval T.
• Interesting case: N is so large it cannot be stored in main memory.
  - Or, there are so many streams that windows for all do not fit in main memory.
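
The two window definitions can be sketched in Python with `collections.deque`. This is a minimal illustration that assumes the window does fit in memory and that timestamps arrive in non-decreasing order; the class names `CountWindow` and `TimeWindow` are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class CountWindow:
    """Keeps the N most recent elements."""

    def __init__(self, n: int) -> None:
        # A deque with maxlen discards the oldest element automatically.
        self.items: Deque[str] = deque(maxlen=n)

    def add(self, item: str) -> None:
        self.items.append(item)


class TimeWindow:
    """Keeps the elements received within the last t time units."""

    def __init__(self, t: float) -> None:
        self.t = t
        self.items: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, item: str) -> None:
        self.items.append((timestamp, item))
        # Evict everything that falls outside the interval ending now.
        while self.items and self.items[0][0] <= timestamp - self.t:
            self.items.popleft()


cw = CountWindow(n=3)
for ch in "qwerty":
    cw.add(ch)
print(list(cw.items))  # ['r', 't', 'y']

tw = TimeWindow(t=10.0)
for ts, ch in [(0.0, "q"), (4.0, "w"), (9.0, "e"), (12.0, "r")]:
    tw.add(ts, ch)
print([item for _, item in tw.items])  # ['w', 'e', 'r']
```

A count-based window has a fixed size, while a time-based window grows and shrinks with the arrival rate, so its memory use is harder to bound in advance.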

Reference: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
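[Figure: A sliding window moving over the stream q w e r t y u i o p a s d f g h j k l z x c v b n m, shown at two points in time, with the past on the left and the future on the right.]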

Example: Averages
• Stream of integers, window of size N.
• Standing query: what is the average of the integers in the window?
• For the first N inputs, sum and count to get the average.
• Afterward, when a new input i arrives, change the average by adding (i - j)/N, where j is the oldest integer in the window before i arrived.
• Good: O(1) time per input.
• Bad: Requires the entire window in main memory.
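
The update rule above can be sketched in Python as follows. This is an illustrative implementation, not a reference one; the class name `SlidingAverage` is hypothetical, and the deque makes the "entire window in memory" cost explicit.

```python
from collections import deque
from typing import Deque, Optional


class SlidingAverage:
    """Average of the N most recent integers, updated in O(1) per input."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.window: Deque[int] = deque()  # the entire window stays in memory
        self.total = 0
        self.average: Optional[float] = None

    def add(self, i: int) -> float:
        if len(self.window) < self.n:
            # First N inputs: sum and count to get the average.
            self.window.append(i)
            self.total += i
            self.average = self.total / len(self.window)
        else:
            # Afterward: drop the oldest integer j and adjust by (i - j) / N.
            j = self.window.popleft()
            self.window.append(i)
            self.average += (i - j) / self.n
        return self.average


avg = SlidingAverage(n=3)
results = [avg.add(x) for x in [1, 5, 2, 7, 0, 9, 3]]
print([round(r, 2) for r in results])  # [1.0, 3.0, 2.67, 4.67, 3.0, 5.33, 4.0]
```

Because the average is adjusted incrementally in floating point, rounding error can accumulate over a very long stream; keeping an exact integer running sum and dividing on demand is one way to avoid that.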

References
• Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, Second Edition, Cambridge University Press, 2014. http://mmds.org/
