Download as pdf or txt
Download as pdf or txt
You are on page 1of 35

E-COMMERCE DATA

TOPIC- 3: E-COMMERCE AND BIG DATA

DINESH MOHAN
mohandineshnew@gmail.com

mohandineshnew@gmail.com ECD 03 1
E-COMMERCE

AWARENESS CONSIDERATION SALE FULFILMENT RETENTION ADVOCACY

IMPRESSIONS INBOUND TRAFFIC CONVERSION SUPPLY CUSTOMER CUSTOMER


CHAIN LIFETIME SENTIMENT
VALUE

mohandineshnew@gmail.com ECD 03 2
E-COMMERCE - CUSTOMER TOUCHPOINTS

mohandineshnew@gmail.com ECD 03 3
E-COMMERCE - VALUE-DRIVERS

sale

Logistics

Inbound Inbound

mohandineshnew@gmail.com ECD 03 4
BUSINESS INTELLIGENCE IN E-COMMERCE
TRADITIONAL BI – PREDICTIVE
MODELLING & DATA MINING
AWARENESS CONSIDERATION SALE FULFILMENT RETENTION ADVOCACY

TEXT DATA, WEB NAVIGATION DEFINE CUSTOMER PREFERENCES

mohandineshnew@gmail.com ECD 03 5
BIG DATA

TEXT
DATA WEB
MINING &
MINING & MINING &
SENTIMENT
ANALYTICS ANALYTICS
ANALYSIS

PREDICTIVE MODELLING

PREDICTIVE MODELLING
BIG DATA

mohandineshnew@gmail.com ECD 03 6
BIG DATA
• Big Data has become a popular term to describe the exponential growth,
availability, and use of information, both structured and unstructured
• The sources of data that were earlier ignored because of technical limitations
are now treated as gold mines
• Data may come from:
• Web logs
• RFID
• GPS systems
• Sensors
• Social networks
• Internet based documents
• Internet search indexes
• Detail call records
• Photography archives
• Large scale e-commerce practices….

mohandineshnew@gmail.com ECD 03 7
BIG DATA
• Data size is getting bigger:
• Hadron Collider - 1 PB/sec
• Boeing jet - 20 TB/hr
• Facebook - 500 TB/day.
• YouTube – 1 TB/4 min.
• The proposed Square Kilometer
Array telescope (the world’s
proposed biggest telescope)
– 1 EB/day

mohandineshnew@gmail.com ECD 03 8
BIG DATA IN PERSPECTIVE

mohandineshnew@gmail.com ECD 03 9
V’s THAT DEFINE BIG DATA
• Big Data is more than just “big”
VOLUME
• The Vs that define Big Data
SCALE OF DATA
• Volume
• Variety
• Velocity
• Veracity

VERACITY BIG VELOCITY


ANALYSIS OF
UNCERTAINTY OF
DATA
DATA STREAMING
DATA

VARIETY
DIFFERENT
FORMS OF DATA

mohandineshnew@gmail.com ECD 03 10
4 V’s OF BIG DATA
• VOLUME
• Transaction data stored thru’ years
• Text data constantly streaming in from social media
• Sensor data being collected continuously thru’ RFID, GPS etc.
• Decreasing storage costs – technical and financial storage issues are irrelevant
• Key issues
• How to determine relevance amidst large volumes of data
• How to create value from it

• VARIETY
• 85% of data is semi-structured or un-structured
• Comes in all types of formats
• Traditional databases and hierarchical data stores
• OLAP systems
• Text documents/ e-mail/ XML/ audio/ video
• Sensor captured…
• Valuable, must be included in analysis for supporting decisions

mohandineshnew@gmail.com ECD 03 11
4 V’s OF BIG DATA
• VELOCITY
• How fast data is being produced
• How fast data must be processed
(captured, stored and analyzed)
DATA
• Need to deal with torrents of data in real- VALUE
time
• ‘Data-stream’ analytics can be as valuable
TIME
as ‘data-at-rest’
• Need to react in quick time

• VERACITY
• Conformity to facts
• Accuracy
• Quality
• Truthfulness
• Trustworthiness
• Tools and techniques are used to handle veracity by transforming data into
quality and trustworthy insights
mohandineshnew@gmail.com ECD 03 12
BIG DATA - VALUE PROPOSITION

• Big Data has a greater potential to contain


• More patterns - discover more business opportunities
• More interesting anomalies - uncover more business risks

mohandineshnew@gmail.com ECD 03 13
CONCEPTUAL ARCHITECTURE FOR BIG DATA
SOLUTIONS

mohandineshnew@gmail.com ECD 03 14
BIG DATA ANALYTICS

• Big Data by itself, regardless of the size, type, or speed, is worthless


• Big Data + “big” analytics = value
• With the value proposition, Big Data also brought about big challenges
• Effectively and efficiently capturing, storing, and analyzing Big Data
• New breed of technologies

mohandineshnew@gmail.com ECD 03 15
BIG DATA CONSIDERATIONS

• You can’t process the amount of data that you want to because of the
limitations of your current platform
• You can’t include new/contemporary data sources (e.g., social media, RFID,
Sensory, Web, GPS, textual data) because it does not comply with the data
schema
• You need to (or want to) integrate data as quickly as possible to be current
on your analysis
• You want to work with a schema-on-demand data storage paradigm
because the variety of data types
• The data is arriving so fast at your organization’s doorstep that your
analytics platform cannot handle it.

mohandineshnew@gmail.com ECD 03 16
BIG DATA ANALYTICS – CRITICAL SUCCESS FACTORS

mohandineshnew@gmail.com ECD 03 17
BIG DATA ANALYTICS

BIG DATA
ANALYTICS

PREDICTIVE ANALYTICS

PREDICTIVE
ANALYTICS

TEXT MINING &


SENTIMENT
DATA MINING NATURAL WEB MINING
LANGUAGE
ANALYSIS
PROCESSING

mohandineshnew@gmail.com ECD 03 18
BIG DATA ANALYTICS – CHALLENGES

• Data volume
• The ability to capture, store, and process the huge volume of data in a
timely manner
• Data integration
• The ability to combine data quickly/cost effectively
• Processing capabilities
• The ability to process the data quickly, as it is captured (i.e., stream
analytics)
• Data governance
• Security, privacy, access
• Skill availability
• Data scientists
• Solution cost
• ROI

mohandineshnew@gmail.com ECD 03 19
BIG DATA ANALYTICS – ENABLERS

• In-memory analytics
• Storing and processing the complete data set in RAM

• In-database analytics
• Placing analytic procedures close to where data is stored

• Grid computing & MPP


• Use of many machines and processors in parallel (MPP- massively parallel
processing)

• Appliances
• Combining hardware, software and storage in a single unit for performance and
scalability

mohandineshnew@gmail.com ECD 03 20
BIG DATA ANALYTICS – INTEREST FACTORS

mohandineshnew@gmail.com ECD 03 21
BIG DATA ANALYTICS – BUSINESS APPLICATION

• Process efficiency and cost reduction


• Brand management
• Revenue maximization, cross-selling/up-selling
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
• Risk management
• Regulatory compliance
• Enhanced security capabilities

mohandineshnew@gmail.com ECD 03 22
BIG DATA TECHNOLOGIES

• MapReduce …
• Hadoop …
• Hive
• Pig
• Hbase
• Flume
• Oozie
• Ambari
• Avro
• Mahout, Sqoop, Hcatalog, …

mohandineshnew@gmail.com ECD 03 23
BIG DATA TECHNOLOGIES – MAP REDUCE

• MapReduce distributes the processing of very large multi-structured data


files across a large cluster of ordinary machines/processors
• Goal - achieving high performance with “simple” computers
• Developed and popularized by Google
• Good at processing and analyzing large volumes of multi-structured data in
a timely manner
• Example tasks:
• indexing the web for search, graph analysis, text analysis, machine
learning, …

mohandineshnew@gmail.com ECD 03 24
BIG DATA TECHNOLOGIES – MAP REDUCE

• How does MapReduce work?

CLUSTERING

CLASSIFICATION 3

BATCH-BASED COLLABORATIVE FILTERING

Raw Data Map Function Reduce Function

mohandineshnew@gmail.com ECD 03 25
BIG DATA TECHNOLOGIES – HADOOP

• Hadoop is an open source framework for storing and analyzing massive


amounts of distributed, unstructured data
• Originally created by Doug Cutting at Yahoo!
• Hadoop clusters run on inexpensive commodity hardware so projects can
scale-out inexpensively
• Hadoop is now part of Apache Software Foundation
• Open source - hundreds of contributors continuously improve the core
technology
• MapReduce + Hadoop = Big Data core technology

mohandineshnew@gmail.com ECD 03 26
HADOOP

• How does Hadoop work?


• Access unstructured and semi-structured data (e.g., log files, social media
feeds, other data sources)
• Break the data up into “parts,” which are then loaded into a file system made
up of multiple nodes running on commodity hardware using HDFS
• Each “part” is replicated multiple times and loaded into the file system for
replication and failsafe processing
• Jobs are distributed to the clients, and once completed the results are collected
and aggregated using MapReduce

mohandineshnew@gmail.com ECD 03 27
HADOOP – TECHNICAL COMPONENTS

• Additionally, Hadoop ecosystem is made-up of a number of complementary


sub-projects
• HDFS – default storage layer in Hadoop, supports large-scale, batch-style historical
analysis
• NoSQL non-relational database (Hbase, MongoDB, Cassandra..) – processes and
serves large volume of multi-structured data (relational databases cannot maintain
this level of performance at Big data scale)
• Hive – Hadoop-based data warehousing framework, writes queries in HiveQL, which
are then converted to MapReduce
• MapReduce provides control for analytics, not analytics
• Hadoop is about data diversity, not just data volume
• Hadoop compliments a DW, rarely a replacement
• Hadoop enables many types of analytics, not just web analytics

mohandineshnew@gmail.com ECD 03 28
DATA SCIENTIST

• “The Most Exciting Job of the 21st


Century” - Thomas H. Davenport and
D. J. Patil, Harvard Business Review,
October 2012
• Data Scientist = Big Data guru
• One with skills to investigate Big Data

• Very high salaries, very high


expectations
• Where do Data Scientist come from?
• M.S./Ph.D. in MIS, CS, IE,… and/or
Analytics
• There is no specific degree program
for DS
• PE, PML, … DSP (Data ScIence
Professional)

mohandineshnew@gmail.com ECD 03 29
BIG DATA ANALYTICS IN POLITICS

mohandineshnew@gmail.com ECD 03 30
BIG DATA AND DATA WAREHOUSING

• What is the impact of Big Data on DW?


• Big Data and RDBMS do not go nicely together
• Will Hadoop replace data warehousing/RDBMS?

• Hadoop
• Hadoop as the repository and refinery - for storing and archiving multi-
structured data
• Hadoop as the active archive - for filtering, transforming, and/or
consolidating multi-structured data

• Data Warehousing
• DW as Information Integrator - data that provides business value
• DW as data server – serves data to interactive BI Presentation tools

mohandineshnew@gmail.com ECD 03 31
BIG DATA AND DATA WAREHOUSING

mohandineshnew@gmail.com ECD 03 32
BIG DATA VENDORS

• Cloudera - cloudera.com
• MapR – mapr.com
• Hortonworks - hortonworks.com
• IBM - Netezza, InfoSphere
• Oracle - Exadata, Exalogic
• Microsoft, Amazon, Google, …

mohandineshnew@gmail.com ECD 03 33
TOP 10 BIG DATA VENDORS (HADOOP)

mohandineshnew@gmail.com ECD 03 34
• End of session.

mohandineshnew@gmail.com ECD 03 35

You might also like