Professional Documents
Culture Documents
Topic-3: E-Commerce and Big Data
Topic-3: E-Commerce and Big Data
DINESH MOHAN
mohandineshnew@gmail.com
mohandineshnew@gmail.com ECD 03 1
E-COMMERCE
mohandineshnew@gmail.com ECD 03 2
E-COMMERCE - CUSTOMER TOUCHPOINTS
mohandineshnew@gmail.com ECD 03 3
E-COMMERCE - VALUE-DRIVERS
sale
Logistics
Inbound Inbound
mohandineshnew@gmail.com ECD 03 4
BUSINESS INTELLIGENCE IN E-COMMERCE
TRADITIONAL BI – PREDICTIVE
MODELLING & DATA MINING
AWARENESS CONSIDERATION SALE FULFILMENT RETENTION ADVOCACY
mohandineshnew@gmail.com ECD 03 5
BIG DATA
TEXT
DATA WEB
MINING &
MINING & MINING &
SENTIMENT
ANALYTICS ANALYTICS
ANALYSIS
PREDICTIVE MODELLING
PREDICTIVE MODELLING
BIG DATA
mohandineshnew@gmail.com ECD 03 6
BIG DATA
• Big Data has become a popular term to describe the exponential growth,
availability, and use of information, both structured and unstructured
• The sources of data that were earlier ignored because of technical limitations
are now treated as gold mines
• Data may come from:
• Web logs
• RFID
• GPS systems
• Sensors
• Social networks
• Internet based documents
• Internet search indexes
• Detail call records
• Photography archives
• Large scale e-commerce practices….
mohandineshnew@gmail.com ECD 03 7
BIG DATA
• Data size is getting bigger:
• Hadron Collider - 1 PB/sec
• Boeing jet - 20 TB/hr
• Facebook - 500 TB/day.
• YouTube – 1 TB/4 min.
• The proposed Square Kilometer
Array telescope (the world’s
proposed biggest telescope)
– 1 EB/day
mohandineshnew@gmail.com ECD 03 8
BIG DATA IN PERSPECTIVE
mohandineshnew@gmail.com ECD 03 9
V’s THAT DEFINE BIG DATA
• Big Data is more than just “big”
VOLUME
• The Vs that define Big Data
SCALE OF DATA
• Volume
• Variety
• Velocity
• Veracity
VARIETY
DIFFERENT
FORMS OF DATA
mohandineshnew@gmail.com ECD 03 10
4 V’s OF BIG DATA
• VOLUME
• Transaction data stored thru’ years
• Text data constantly streaming in from social media
• Sensor data being collected continuously thru’ RFID, GPS etc.
• Decreasing storage costs – technical and financial storage issues are irrelevant
• Key issues
• How to determine relevance amidst large volumes of data
• How to create value from it
• VARIETY
• 85% of data is semi-structured or un-structured
• Comes in all types of formats
• Traditional databases and hierarchical data stores
• OLAP systems
• Text documents/ e-mail/ XML/ audio/ video
• Sensor captured…
• Valuable, must be included in analysis for supporting decisions
mohandineshnew@gmail.com ECD 03 11
4 V’s OF BIG DATA
• VELOCITY
• How fast data is being produced
• How fast data must be processed
(captured, stored and analyzed)
DATA
• Need to deal with torrents of data in real- VALUE
time
• ‘Data-stream’ analytics can be as valuable
TIME
as ‘data-at-rest’
• Need to react in quick time
• VERACITY
• Conformity to facts
• Accuracy
• Quality
• Truthfulness
• Trustworthiness
• Tools and techniques are used to handle veracity by transforming data into
quality and trustworthy insights
mohandineshnew@gmail.com ECD 03 12
BIG DATA - VALUE PROPOSITION
mohandineshnew@gmail.com ECD 03 13
CONCEPTUAL ARCHITECTURE FOR BIG DATA
SOLUTIONS
mohandineshnew@gmail.com ECD 03 14
BIG DATA ANALYTICS
mohandineshnew@gmail.com ECD 03 15
BIG DATA CONSIDERATIONS
• You can’t process the amount of data that you want to because of the
limitations of your current platform
• You can’t include new/contemporary data sources (e.g., social media, RFID,
Sensory, Web, GPS, textual data) because it does not comply with the data
schema
• You need to (or want to) integrate data as quickly as possible to be current
on your analysis
• You want to work with a schema-on-demand data storage paradigm
because the variety of data types
• The data is arriving so fast at your organization’s doorstep that your
analytics platform cannot handle it.
mohandineshnew@gmail.com ECD 03 16
BIG DATA ANALYTICS – CRITICAL SUCCESS FACTORS
mohandineshnew@gmail.com ECD 03 17
BIG DATA ANALYTICS
BIG DATA
ANALYTICS
PREDICTIVE ANALYTICS
PREDICTIVE
ANALYTICS
mohandineshnew@gmail.com ECD 03 18
BIG DATA ANALYTICS – CHALLENGES
• Data volume
• The ability to capture, store, and process the huge volume of data in a
timely manner
• Data integration
• The ability to combine data quickly/cost effectively
• Processing capabilities
• The ability to process the data quickly, as it is captured (i.e., stream
analytics)
• Data governance
• Security, privacy, access
• Skill availability
• Data scientists
• Solution cost
• ROI
mohandineshnew@gmail.com ECD 03 19
BIG DATA ANALYTICS – ENABLERS
• In-memory analytics
• Storing and processing the complete data set in RAM
• In-database analytics
• Placing analytic procedures close to where data is stored
• Appliances
• Combining hardware, software and storage in a single unit for performance and
scalability
mohandineshnew@gmail.com ECD 03 20
BIG DATA ANALYTICS – INTEREST FACTORS
mohandineshnew@gmail.com ECD 03 21
BIG DATA ANALYTICS – BUSINESS APPLICATION
mohandineshnew@gmail.com ECD 03 22
BIG DATA TECHNOLOGIES
• MapReduce …
• Hadoop …
• Hive
• Pig
• Hbase
• Flume
• Oozie
• Ambari
• Avro
• Mahout, Sqoop, Hcatalog, …
mohandineshnew@gmail.com ECD 03 23
BIG DATA TECHNOLOGIES – MAP REDUCE
mohandineshnew@gmail.com ECD 03 24
BIG DATA TECHNOLOGIES – MAP REDUCE
CLUSTERING
CLASSIFICATION 3
mohandineshnew@gmail.com ECD 03 25
BIG DATA TECHNOLOGIES – HADOOP
mohandineshnew@gmail.com ECD 03 26
HADOOP
mohandineshnew@gmail.com ECD 03 27
HADOOP – TECHNICAL COMPONENTS
mohandineshnew@gmail.com ECD 03 28
DATA SCIENTIST
mohandineshnew@gmail.com ECD 03 29
BIG DATA ANALYTICS IN POLITICS
mohandineshnew@gmail.com ECD 03 30
BIG DATA AND DATA WAREHOUSING
• Hadoop
• Hadoop as the repository and refinery - for storing and archiving multi-
structured data
• Hadoop as the active archive - for filtering, transforming, and/or
consolidating multi-structured data
• Data Warehousing
• DW as Information Integrator - data that provides business value
• DW as data server – serves data to interactive BI Presentation tools
mohandineshnew@gmail.com ECD 03 31
BIG DATA AND DATA WAREHOUSING
mohandineshnew@gmail.com ECD 03 32
BIG DATA VENDORS
• Cloudera - cloudera.com
• MapR – mapr.com
• Hortonworks - hortonworks.com
• IBM - Netezza, InfoSphere
• Oracle - Exadata, Exalogic
• Microsoft, Amazon, Google, …
mohandineshnew@gmail.com ECD 03 33
TOP 10 BIG DATA VENDORS (HADOOP)
mohandineshnew@gmail.com ECD 03 34
• End of session.
mohandineshnew@gmail.com ECD 03 35