Business Intelligence, Analytics, and Data Science: A Managerial Perspective
Chapter 5
Predictive Analytics II: Text, Web, and Social Media Analytics …
Copyright © 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved
Learning Objectives (1 of 2)
5.1 Describe text mining and understand the need for text
mining
5.2 Differentiate among text analytics, text mining, and data
mining
5.3 Understand the different application areas for text
mining
5.4 Know the process of carrying out a text mining project
5.5 Appreciate the different methods to introduce structure
to text-based data
Learning Objectives (2 of 2)
Opening Vignette (1 of 3)
Opening Vignette (2 of 3)
Opening Vignette (3 of 3)
Text Analytics and Text Mining (1 of 2)
Text Analytics and Text Mining (2 of 2)
Text Mining Concepts (1 of 2)
• 85-90 percent of all corporate data is in some kind of
unstructured form (e.g., text)
• Unstructured corporate data is doubling in size every 18
months
• Tapping into these information sources is not an option, but a
need to stay competitive
• Answer: text mining
– A semi-automated process of extracting knowledge from
unstructured data sources
– a.k.a. text data mining or knowledge discovery in textual
databases
Data Mining Versus Text Mining
Text Mining Concepts (2 of 2)
• Information extraction
• Topic tracking
• Summarization
• Categorization
• Clustering
• Concept linking
• Question answering
Text Mining Terminology (1 of 2)
Text Mining Terminology (2 of 2)
• Term dictionary
• Word frequency
• Part-of-speech tagging
• Morphology
• Term-by-document matrix
– Occurrence matrix
• Singular value decomposition
– Latent semantic indexing
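As a minimal sketch of how singular value decomposition relates to a term-by-document matrix, the block below finds only the largest singular value and its two singular vectors by power iteration. The matrix, the term names, and the helper `leading_singular_triplet` are made up for illustration; this is a rank-1 approximation in plain Python, not a full SVD, and real latent semantic indexing work would normally rely on a numerical library.

```python
import math

# Toy term-by-document (occurrence) matrix: rows are terms, columns are documents.
# Terms and counts are hypothetical, chosen only for illustration.
terms = ["investment", "risk", "project", "software"]
A = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def leading_singular_triplet(M, iterations=200):
    """Power iteration on M^T M; returns (sigma, u, v) for the largest singular value."""
    Mt = transpose(M)
    v = [1.0] * len(M[0])
    for _ in range(iterations):
        w = matvec(Mt, matvec(M, v))
        n = norm(w)
        v = [x / n for x in w]
    Mv = matvec(M, v)
    sigma = norm(Mv)
    u = [x / sigma for x in Mv]
    return sigma, u, v

sigma, u, v = leading_singular_triplet(A)
print("largest singular value:", round(sigma, 3))
print("term weights:", dict(zip(terms, (round(x, 3) for x in u))))
```

The term weights in `u` and the document weights in `v` describe one latent "concept" dimension; keeping only a few such dimensions is the idea behind latent semantic indexing.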
Application Case 5.1
Natural Language Processing (NLP) (1 of 4)
• What is “Understanding”?
– Humans understand; what about computers?
– Natural language is vague, context driven
– True understanding requires extensive knowledge of
a topic
– Can (or will) computers ever understand natural
language as accurately as we do?
Natural Language Processing (NLP) (3 of 4)
• Challenges in NLP
– Part-of-speech tagging
– Text segmentation
– Word sense disambiguation
– Syntax ambiguity
– Imperfect or irregular input
– Speech acts
• Dream of AI community
– to have algorithms that are capable of automatically
reading and obtaining knowledge from text
Natural Language Processing (NLP) (4 of 4)
• WordNet
– A laboriously hand-coded database of English words,
their definitions, sets of synonyms, and various
semantic relations between synonym sets
– A major resource for NLP
– Needs automation to be completed
• Sentiment Analysis
– A technique used to detect favorable and unfavorable
opinions toward specific products and services
– SentiWordNet
Application Case 5.2 (1 of 2)
AMC Networks Is Using Analytics to Capture New Viewers, Predict
Ratings, and Add Value for Advertisers in a Multichannel World
A Web-Based Dashboard Used by AMC Networks
NLP Task Categories
• Question answering
• Automatic summarization
• Natural language generation & understanding
• Machine translation
• Foreign language reading & writing
• Speech recognition
• Text proofing
• Optical character recognition
Text Mining Applications
• Marketing applications
– Enables better CRM
• Security applications
– ECHELON, OASIS
– Deception detection (…)
• Medicine and biology
– Literature-based gene identification (…)
• Academic applications
– Research stream analysis
Application Case 5.3 (1 of 4)
• Deception detection
– A difficult problem
– If detection is limited to only text, then the problem is
even more difficult
• The study
– analyzed text-based testimonies of persons of interest
at military bases
– used only text-based features (cues)
Application Case 5.3 (2 of 4)
[Figure: text-based deception detection process. Recoverable boxes: Statements Labeled as Truthful or Deceptive by Law Enforcement; Cues Extracted & Selected; Text Processing Software Generated Quantified Cues]
Application Case 5.3 (3 of 4)
• Table 5.1 Categories and Examples of Linguistic Features
Used in Deception Detection
Application Case 5.3 (4 of 4)
Text Mining Applications (Gene/Protein Interaction Identification)
[Figure: multilevel analysis of a sentence for gene/protein interaction identification]
Sentence: ...expression of Bcl-2 is correlated with insufficient white blood cell death and activation of p53.
• Gene/Protein
• Ontology: D007962, D 016923, D 001773
• Word: 185 8 51112 9 23017 27 5874 2791 8952 1623 5632 17 8252 8 2523
• POS: NN IN NN IN VBZ IN JJ JJ NN NN NN CC NN IN NN
• Shallow Parse: NP PP NP NP PP NP NP PP NP
Application Case 5.4
Text Mining Process (1 of 7)
Text Mining Process (2 of 7)
[Figure: three-task text mining process with feedback loops, producing knowledge]
• The inputs to the process include a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
• The output of Task 1 is a collection of documents in some digitized format for computer processing
• The output of Task 2 is a flat file called a term-document matrix, where the cells are populated with the term frequencies
• The output of Task 3 is a number of problem-specific classification, association, and clustering models and visualizations
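The step from a document collection to a term-document matrix can be sketched as follows. This is an illustrative example only: the three documents, the short stop-word list, and the helper names `tokenize` and `build_term_document_matrix` are assumptions, and a real project would add further preprocessing such as stemming and synonym handling.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the digitized document collection.
documents = {
    "Document 1": "Investment risk grows with project size.",
    "Document 2": "Project management software reduces project risk.",
    "Document 3": "Software development is an investment.",
}

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"with", "is", "an", "the", "of", "and"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[name] is a row of term frequencies."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {name: [c[term] for term in vocabulary] for name, c in counts.items()}
    return vocabulary, matrix

vocabulary, tdm = build_term_document_matrix(documents)
print("Terms:", vocabulary)
for name, row in tdm.items():
    print(name, row)
```

Each row is one document and each column one term, so most cells are zero; that sparsity is why later steps often reduce the matrix before modeling.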
Text Mining Process (3 of 7)
Text Mining Process (4 of 7)
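[Table: sample term-document matrix; the term column headers and the column positions of the counts did not survive extraction. Surviving counts per row:]
Document 2: 1
Document 3: 3, 1
Document 4: 1
Document 5: 2, 1
Document 6: 1, 1
...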
Text Mining Process (5 of 7)
Text Mining Process (6 of 7)
Text Mining Process (7 of 7)
Application Case 5.5 (1 of 5)
Application Case 5.5 (2 of 5)
• MISQ, 2005
– Author(s): A. Malhotra, S. Gosain, and O.A. El Sawy
– Title: Absorptive capacity configurations in supply chains: Gearing for partner-enabled market knowledge creation
– Vol/No: 29/1
– Pages: 145-197
– Keywords: knowledge management; supply chain; absorptive capacity; interorganizational information systems; configuration approaches
– Abstract: The need for continual value innovation is driving supply chains to evolve from a pure transactional focus to leveraging interorganizational partnerships for sharing …
• ISR, 1999
– Author(s): D. Robey and M.C. Boudreau
– Title: Accounting for the contradictory organizational consequences of information technology: Theoretical directions and methodological implications
– Vol/No: 10/2
– Pages: 167-185
– Keywords: organizational transformation; impacts of technology; organization theory; research methodology; intraorganizational power; electronic communication; MIS implementation; culture; systems
– Abstract: Although much contemporary thought considers advanced information technologies as either determinants or enablers of radical organizational change, empirical studies have revealed inconsistent findings to support the deterministic logic implicit in such arguments. This paper reviews the contradictory …
Application Case 5.5 (3 of 5)
• JMIS, 2001
– Author(s): R. Aron and E.K. Clemons
– Title: Achieving the optimal balance between investment in quality and investment in self-promotion for information products
– Vol/No: 18/2
– Pages: 65-88
– Keywords: information products; internet advertising; product positioning; signaling; signaling games
– Abstract: When producers of goods (or services) are confronted by a situation in which their offerings no longer perfectly match consumer preferences, they must determine the extent to which the advertised features of …
• …
Application Case 5.5 (4 of 5)
Application Case 5.5 (5 of 5)
Sentiment Analysis
Sentiment Analysis Applications
Sentiment Analysis Process (1 of 3)
[Figure: multistep sentiment analysis process]
• Input: textual data (a statement)
• Step 1: Calculate the O–S polarity, using a lexicon, to obtain the O–S polarity measure
• Decision: Is there a sentiment? (No / Yes)
• Step 2: Calculate the N–P polarity of the sentiment, using a lexicon
• Step 3: Record the polarity, strength, and the target of the sentiment
• Step 4: Tabulate & aggregate the sentiment analysis results
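The four steps of the process can be sketched with a toy lexicon-based scorer. Everything specific here is an assumption made for illustration: the `LEXICON` scores, the `TARGETS` set, and the rule that a statement counts as subjective when it contains at least one lexicon word. It is a minimal sketch, not the method of any particular tool.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.9, "good": 0.5,
           "poor": -0.6, "terrible": -0.9, "slow": -0.4}

# Hypothetical product aspects that a sentiment can be about.
TARGETS = {"camera", "battery"}

def analyze(statement):
    """Return polarity, strength, and target, or None for an objective statement."""
    tokens = re.findall(r"[a-z]+", statement.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scores:
        return None  # Step 1: no opinion words found, treat as objective
    strength = sum(scores) / len(scores)  # Step 2: N-P polarity on a numeric scale
    if strength > 0:
        polarity = "positive"
    elif strength < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    target = next((t for t in tokens if t in TARGETS), None)  # Step 3: find the target
    return {"polarity": polarity, "strength": strength, "target": target}

def aggregate(statements):
    """Step 4: average sentiment strength per target over subjective statements."""
    by_target = defaultdict(list)
    for statement in statements:
        result = analyze(statement)
        if result is not None:
            by_target[result["target"]].append(result["strength"])
    return {target: sum(v) / len(v) for target, v in by_target.items()}

reviews = [
    "I love the camera, great pictures.",
    "The battery is terrible.",
    "The camera ships with a strap.",
    "Battery life is poor and charging is slow.",
]
print(aggregate(reviews))
```

The third review contains no lexicon words, so it is filtered out in Step 1 and does not affect the per-target averages.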
Sentiment Analysis Process (2 of 3)
P-N Polarity and S-O Polarity
Subjective (S)
S-O Polarity
Objective (O)
Web Mining Overview
Web Mining
[Figure: web mining and its related areas]
• Page Rank; Information Retrieval; Graph Mining; Social Analytics; Clickstream Analysis
• Search Engine Optimization; Social Network Analysis; Social Media Analytics; Weblog Analysis
• Marketing Attribution; Customer Analytics; 360 Customer View; Voice of the Customer
Web Content/Structure Mining
Web Usage Mining (1 of 2)
Web Usage Mining (2 of 2)
Structure of a Typical Internet Search Engine
[Figure: search engine structure built around the World Wide Web]
• Cycles: User; Responding Cycle; Development Cycle
• Components: World Wide Web; Document Matcher/Ranker; Document Indexer; Metadata; Index; Cached/Indexed Documents DB
• Flows: Rank-Ordered Pages; List of Matched Pages; Processed Pages; Unprocessed Web Pages
Anatomy of a Search Engine
1. Development Cycle
– Web Crawler
– Document Indexer
2. Response Cycle
– Query Analyzer
– Document Matcher/Ranker
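The two cycles can be illustrated with a very small in-memory example. The pages, URLs, and function names below are hypothetical, crawling is replaced by a fixed dictionary, and ranking by summed term frequency is a deliberately simple stand-in for the ranking logic of a real search engine.

```python
import re
from collections import Counter, defaultdict

# Stand-in for pages a web crawler would fetch (hypothetical URLs and text).
PAGES = {
    "example.com/a": "text mining extracts knowledge from text",
    "example.com/b": "web mining covers content structure and usage mining",
    "example.com/c": "social media analytics and text analytics",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Development cycle (document indexer): map each term to {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][url] = freq
    return index

def search(index, query):
    """Response cycle: analyze the query, match pages, and rank them by score."""
    scores = Counter()
    for term in tokenize(query):  # query analyzer
        for url, freq in index.get(term, {}).items():  # document matcher
            scores[url] += freq
    # Ranker: highest score first, URL as a tie-breaker for stable output.
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

index = build_index(PAGES)
print(search(index, "text mining"))
```

Building the index once and reusing it for every query is the reason the two cycles are separated: indexing is slow and done ahead of time, while matching and ranking must be fast.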
Search Engine Optimization
Top 15 Most Popular Search Engines (by
eBizMBA, August 2016)
Rank Name Estimated Unique Monthly Visitors
1 Google 1,600,000,000
2 Bing 400,000,000
3 Yahoo! Search 300,000,000
4 Ask 245,000,000
5 AOL Search 125,000,000
6 Wow 100,000,000
7 WebCrawler 65,000,000
8 MyWebSearch 60,000,000
9 Infospace 24,000,000
10 Info 13,500,000
11 DuckDuckGo 11,000,000
12 Contenko 10,500,000
13 Dogpile 7,500,000
14 Alhea 4,000,000
15 ixQuick 1,000,000
Web Usage Mining (Clickstream Analysis)
Web Analytics Metrics (1 of 3)
Web Analytics Metrics (2 of 3)
Web Analytics Metrics (3 of 3)
A Sample Web Analytics Dashboard
Social Analytics Social Network
Analysis (1 of 2)
Social Analytics Social Network
Analysis (2 of 2)
Discussion Questions
1. How can social media analytics be used in the
consumer products industry?
2. What do you think are the key challenges, potential
solutions, and probable results in applying social media
analytics in consumer products and services firms?
Social Analytics Social Network Analysis Metrics
• Connections
– Homophily
– Multiplexity
– Mutuality/reciprocity
– Network closure
– Propinquity
• Distribution
– Bridge
– Centrality
– Density
– Distance
– Structural holes
• Segmentation
– Cliques and social circles
– Clustering coefficient
– Cohesion
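Three of these metrics (density, centrality, and clustering coefficient) are easy to compute by hand on a small network. The sketch below uses a made-up undirected friendship network and standard textbook definitions; degree centrality is used as one simple form of centrality, and the names and edges are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected friendship network.
EDGES = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),
         ("Cat", "Dan"), ("Dan", "Eve")]

def build_adjacency(edges):
    """Map each person to the set of people they are directly tied to."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def density(adj):
    """Share of all possible ties that actually exist."""
    n = len(adj)
    ties = sum(len(neighbors) for neighbors in adj.values()) / 2
    return ties / (n * (n - 1) / 2)

def degree_centrality(adj):
    """Fraction of the other actors each actor is directly tied to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbors that are tied to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

adj = build_adjacency(EDGES)
print("density:", density(adj))
print("degree centrality:", degree_centrality(adj))
print("clustering (Cat):", clustering_coefficient(adj, "Cat"))
```

In this toy network Cat has the most direct ties, while Ann's two neighbors know each other, which gives Ann the highest clustering coefficient.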
Social Media Definitions and Concepts
Social Versus Industrial Media
• Web-based social media are different from traditional/industrial
media, such as newspapers, television, and film
• Differentiating characteristics
– Quality
– Reach
– Frequency
– Accessibility
– Usability
– Immediacy
– Updatability
How Do People Use Social Media?
[Figure: groups of social media users, in the order shown: Creators, Critics, Joiners, Collectors, Spectators, Inactives. Axes: Level of Social Media Engagement; Time]
Social Media Analytics
Best Practices in Social Media Analytics
• Questions / Comments