Practical Business Analytics Using R and Python, 2nd Edition
Umesh R. Hodeghatta
Umesha Nayak
Bangalore, Karnataka, India
Apress Standard
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been
made.
Dan Koloski
Professor of the Practice and Head of Learning Programs Roux Institute
at Northeastern University
October 2022
Dan Koloski is a professor of the practice in the analytics program
and director of professional studies at the Roux Institute at
Northeastern University.
Professor Koloski joined Northeastern after spending more than 20
years in the IT and software industry, working in both technical and
business management roles in companies large and small. This
included application development, product management and
partnerships, and helping lead a spin-out and sale from a venture-
backed company to Oracle. Most recently, Professor Koloski was vice
president of product management and business development at Oracle,
where he was responsible for worldwide direct and channel go-to-
market activities, partner integrations, product management,
marketing/branding, and mergers and acquisitions for more than $2
billion in product and cloud-services business. Before Oracle, he was
CTO and director of strategy of the web business unit at Empirix, a role
that included product management, marketing, alliances, mergers and
acquisitions, and analyst relations. He also worked as a freelance
consultant and Allaire-certified instructor, developing and deploying
database-driven web applications.
Professor Koloski earned a bachelor’s degree from Yale University
and an MBA from Harvard Business School in 2002.
Preface
Business analytics, data science, artificial intelligence (AI), and machine
learning (ML) are buzzwords in the business community right now.
Artificial intelligence and machine learning systems are enabling
organizations to make informed decisions by optimizing processes,
understanding customer behavior, maximizing customer satisfaction,
and thus accelerating overall top-line growth. AI and machine learning
help organizations perform tasks efficiently and consistently, which
further improves customer satisfaction.
In financial services, AI models are designed to help manage
customers’ loans, retirement plans, investment strategies, and other
financial decisions. In the automotive industry, AI models can help in
vehicle design, sales and marketing decisions, customer safety features
based on driving patterns of the customer, recommended vehicle type
for the customer, etc. This has helped automotive companies to predict
future manufacturing resources needed to build, for example, electric
and driverless vehicles. AI models also help them in making better
advertisement decisions.
AI can play a big role in customer relationship management (CRM),
too. Machine learning models can predict consumer behavior, start a
virtual-agent conversation, and forecast trends, all of which can improve
efficiency and response time.
Recommendation systems (a type of AI system) can learn users’ content
preferences and suggest music, books, games, or other items a customer
is likely to buy online. Recommendation systems can reduce return rates
and help deliver better-targeted content.
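To make the idea of learning preferences concrete, here is a minimal user-based collaborative-filtering sketch in Python. It assumes a small, hypothetical ratings dictionary; the user names, item names, and the helpers `cosine_similarity` and `recommend` are illustrative and do not come from any library. It measures how alike two users are with cosine similarity over the items both have rated, then predicts a score for each unseen item as a similarity-weighted average of other users' ratings. Real recommenders work with far larger data and more robust methods, so treat this only as an illustration of the principle.

```python
import math

# Hypothetical ratings (1-5) that each user gave to items they have consumed.
ratings = {
    "alice": {"jazz_album": 5, "sci_fi_book": 4, "puzzle_game": 1},
    "bob": {"jazz_album": 4, "sci_fi_book": 5, "space_movie": 5},
    "carol": {"puzzle_game": 5, "racing_game": 4, "jazz_album": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


print(recommend("alice", ratings))
```

For `alice`, the sketch ranks `space_movie` (rated highly by `bob`, whose tastes closely match hers) ahead of `racing_game`.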
Sentiment analysis using machine learning techniques can predict
the opinions and feelings users express in text. This helps companies
improve their products and services by analyzing the customers’
reviews and feedback.
These are a few sample business applications, but this can be
extended to any business problem provided you have data; for example,
an AI system can be developed for HR functions, manufacturing,
process engineering, IT infrastructure and security, software
development life cycle, and more.
There are several industries that have begun to adopt AI into their
business decision process. Investment in analytics, machine learning,
and artificial intelligence is predicted to triple in 2023, and by 2025, it
is predicted to become a $47 billion market (per International Data
Corp.). According to a recent research survey in the United States,
nearly 34 percent of businesses are currently implementing or plan to
implement AI solutions in their business decisions.
Machine learning refers to the algorithms that enable AI systems to
learn from past (historical) data and make decisions. Some of the
commonly used machine learning methods include neural networks,
decision trees, k-nearest neighbors, logistic regression, cluster analysis,
association rules, deep neural networks, hidden Markov models, and
natural language processing. Availability and abundance of data, lower
storage and processing costs, and efficient algorithms have made
machine learning and AI a reality in many organizations.
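As a rough illustration of making decisions based on past data, the following Python sketch implements k-nearest neighbors, one of the methods listed above, using only the standard library. The customer records, the two features (monthly spend and visits per month), and the `knn_predict` helper are hypothetical. A real project would also scale the features so that one of them does not dominate the distance.

```python
import math
from collections import Counter

# Hypothetical historical records: (monthly_spend, visits_per_month) -> outcome.
history = [
    ((20.0, 1.0), "churned"),
    ((25.0, 2.0), "churned"),
    ((30.0, 1.0), "churned"),
    ((80.0, 8.0), "stayed"),
    ((90.0, 10.0), "stayed"),
    ((70.0, 7.0), "stayed"),
]


def knn_predict(query, history, k=3):
    """Predict a label for `query` by majority vote of its k nearest past records."""
    # Sort past records by Euclidean distance to the new observation.
    by_distance = sorted(history, key=lambda rec: math.dist(query, rec[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# A new customer with low spend and few visits.
print(knn_predict((28.0, 2.0), history))
```

Here the three closest past customers all churned, so the sketch predicts "churned"; a customer at (75.0, 9.0) would instead be labeled "stayed".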
AI will be the biggest disruptor to the industry in the next five years.
This will no doubt have a significant impact on the workforce. Though
many say AI can replace a significant number of jobs, it can actually
enhance productivity and improve the efficiency of workers. AI systems
can help executives make better business decisions and allow
businesses to allocate resources and investments to beat the
competition. When decision-makers and business executives make
decisions based on reliable data and recommendations arrived at
through AI systems, they can make better choices for their business,
investments, and employees, thus enabling their business to stand out
from competition.
There are currently thousands of job postings on job portals in
machine learning, data science, and AI, which together form one of the
fastest-growing technology areas, according to the Kiplinger report of 2017.
Many of these jobs are going unfilled because of a shortage of qualified
engineers. Apple, IBM, Google, Facebook, Microsoft, Walmart, and
Amazon are some of the top companies hiring data scientists in
addition to other companies such as Uber, Flipkart, Citibank, Fidelity
Investments, GE, and many others including manufacturing, healthcare,
agriculture, and transportation companies. Many open job positions are
in San Jose, Boston, New York, London, Hong Kong, and many other
cities. If you have the right skills, then you can be a data scientist in one
of these companies tomorrow!
A data scientist/machine learning engineer typically needs the
following skills:
Communication skills to understand and interpret business
requirements and present the final outcome
Statistics, machine learning, and data mining skills
SQL, NoSQL, and other database knowledge
Knowledge of accessing XML data, connecting to databases, writing
SQL queries, reading JSON, reading unstructured web data, and accessing
big data stored in HDFS and in NoSQL databases such as MongoDB,
Cassandra, Redis, Riak, CouchDB, and Neo4j
Coding skills: Python, R, Java, or C++
Tools: Microsoft Azure, IBM Watson, SAS
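Several of these skills come together even in a small task. The sketch below assumes a hypothetical set of order records delivered as JSON and uses Python's standard `json` and `sqlite3` modules to parse the records, load them into a throwaway in-memory database, and summarize them with a SQL `GROUP BY` query. The table layout and the helper `total_sales_by_region` are illustrative only.

```python
import json
import sqlite3

# Hypothetical order records as they might arrive from a web API in JSON form.
raw_json = """
[
  {"customer": "C001", "region": "North", "amount": 120.0},
  {"customer": "C002", "region": "South", "amount": 75.5},
  {"customer": "C003", "region": "North", "amount": 60.0}
]
"""


def total_sales_by_region(raw):
    """Parse JSON records, load them into SQLite, and aggregate them with SQL."""
    records = json.loads(raw)  # list of dicts
    conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
        # Named placeholders are filled from each dict's keys.
        conn.executemany(
            "INSERT INTO orders VALUES (:customer, :region, :amount)", records
        )
        rows = conn.execute(
            "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()
    return rows


print(total_sales_by_region(raw_json))
```

The query returns one row per region, with North totaling 180.0 and South 75.5.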
This book aims to cover the skills required to become a data
scientist. This book enables you to gain sufficient knowledge and skills
to process data and to develop machine learning models. We have
attempted to cover the most commonly used learning algorithms and to
show how to develop models using open-source tools such as R and Python.
Practical Business Analytics Using R and Python is organized into five
parts. The first part covers the fundamental principles required to
perform analytics. It starts by defining the sometimes confusing
terminology used in analytics and the job skills, tools, and technologies
required for an analytics engineer, before describing the process
necessary to execute AI and analytics projects. The second and
subsequent chapters cover the basics of math, probability theory, and
statistics required for analytics, before delving into SQL, the business
analytics process, exploring data using graphical methods, and an in-
depth discussion of how to evaluate analytics model performance.
In Part II, we introduce supervised machine learning models. We
start with regression analysis and then introduce different
classification algorithms, including naïve Bayes, decision trees, logistic
regression, and neural networks.
Part III discusses time-series models. We cover the most commonly
used models including ARIMA.
Part IV covers unsupervised learning and text mining. In
unsupervised learning, we discuss clustering analysis and association
mining. We end this part by briefly introducing big data analytics.
In the final part, we discuss the open-source tools, R and Python,
and using them in programming for analytics. The focus here is on
developing sufficient programming skills to perform analytics.
Source Code
All the source code used in this book can be downloaded from
https://github.com/apress/practical-business-analytics-r-python.
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub
(https://github.com/Apress). For more detailed information, please
visit www.apress.com/source-code.
Table of Contents
Part I: Introduction to Analytics
Chapter 1: An Overview of Business Analytics
1.1 Introduction
1.2 Objectives of This Book
1.3 Confusing Terminology
1.4 Drivers for Business Analytics
1.4.1 Growth of Computer Packages and Applications
1.4.2 Feasibility to Consolidate Data from Various Sources
1.4.3 Growth of Infinite Storage and Computing Capability
1.4.4 Survival and Growth in the Highly Competitive World
1.4.5 Business Complexity Growing Out of Globalization
1.4.6 Easy-to-Use Programming Tools and Platforms
1.5 Applications of Business Analytics
1.5.1 Marketing and Sales
1.5.2 Human Resources
1.5.3 Product Design
1.5.4 Service Design
1.5.5 Customer Service and Support Areas
1.6 Skills Required for an Analytics Job
1.7 Process of an Analytics Project
1.8 Chapter Summary
Chapter 2: The Foundations of Business Analytics
2.1 Introduction
2.2 Population and Sample
2.2.1 Population
2.2.2 Sample
2.3 Statistical Parameters of Interest
2.3.1 Mean
2.3.2 Median
2.3.3 Mode
2.3.4 Range
2.3.5 Quantiles
2.3.6 Standard Deviation
2.3.7 Variance
2.3.8 Summary Command in R
2.4 Probability
2.4.1 Rules of Probability
2.4.2 Probability Distributions
2.4.3 Conditional Probability
2.5 Computations on Data Frames
2.6 Scatter Plot
2.7 Chapter Summary
Chapter 3: Structured Query Language Analytics
3.1 Introduction
3.2 Data Used by Us
3.3 Steps for Business Analytics
3.3.1 Initial Exploration and Understanding of the Data
3.3.2 Understanding Incorrect and Missing Data, and
Correcting Such Data
3.3.3 Further Exploration and Reporting on the Data
3.4 Chapter Summary
Chapter 4: Business Analytics Process
4.1 Business Analytics Life Cycle
4.1.1 Phase 1: Understand the Business Problem
4.1.2 Phase 2: Data Collection
4.1.3 Phase 3: Data Preprocessing and Preparation
4.1.4 Phase 4: Explore and Visualize the Data
4.1.5 Phase 5: Choose Modeling Techniques and Algorithms
4.1.6 Phase 6: Evaluate the Model
4.1.7 Phase 7: Report to Management and Review
4.1.8 Phase 8: Deploy the Model
4.2 Chapter Summary
Chapter 5: Exploratory Data Analysis
5.1 Exploring and Visualizing the Data
5.1.1 Tables
5.1.2 Describing Data: Summary Tables
5.1.3 Graphs
5.1.4 Scatter Plot Matrices
5.2 Plotting Categorical Data
5.3 Chapter Summary
Chapter 6: Evaluating Analytics Model Performance
6.1 Introduction
6.2 Regression Model Evaluation
6.2.1 Root-Mean-Square Error
6.2.2 Mean Absolute Percentage Error
6.2.3 Mean Absolute Error (MAE) or Mean Absolute
Deviation (MAD)
6.2.4 Sum of Squared Errors (SSE)
6.2.5 R2 (R-Squared)
6.2.6 Adjusted R2
6.3 Classification Model Evaluation
6.3.1 Classification Error Matrix
6.3.2 Sensitivity Analysis in Classification
6.4 ROC Chart
6.5 Overfitting and Underfitting
6.5.1 Bias and Variance
6.6 Cross-Validation
6.7 Measuring the Performance of Clustering
6.8 Chapter Summary
Part II: Supervised Learning and Predictive Analytics
Chapter 7: Simple Linear Regression
7.1 Introduction
7.2 Correlation
7.2.1 Correlation Coefficient
7.3 Hypothesis Testing
7.4 Simple Linear Regression
7.4.1 Assumptions of Regression
7.4.2 Simple Linear Regression Equation
7.4.3 Creating a Simple Regression Equation in R
7.4.4 Testing the Assumptions of Regression
7.4.5 Conclusion
7.4.6 Predicting the Response Variable
7.4.7 Additional Notes
7.5 Using Python to Generate the Model and Validating the
Assumptions
7.5.1 Load Important Packages and Import the Data
7.5.2 Generate a Simple Linear Regression Model
7.5.3 Alternative Way for Generation of the Model
7.5.4 Validation of the Significance of the Generated Model
7.5.5 Validating the Assumptions of Linear Regression
7.5.6 Predict Using the Model Generated
7.6 Chapter Summary
Chapter 8: Multiple Linear Regression
8.1 Using Multiple Linear Regression
8.1.1 The Data
8.1.2 Correlation
8.1.3 Arriving at the Model
8.1.4 Validation of the Assumptions of Regression
8.1.5 Multicollinearity
8.1.6 Stepwise Multiple Linear Regression
8.1.7 All Subsets Approach to Multiple Linear Regression
8.1.8 Multiple Linear Regression Equation
8.1.9 Conclusion
8.2 Using an Alternative Method in R
8.3 Predicting the Response Variable
8.4 Training and Testing the Model
8.5 Cross Validation
8.6 Using Python to Generate the Model and Validating the
Assumptions
8.6.1 Load the Necessary Packages and Import the Data
8.6.2 Generate Multiple Linear Regression Model
8.6.3 Alternative Way to Generate the Model
8.6.4 Validating the Assumptions of Linear Regression
8.6.5 Predict Using the Model Generated
8.7 Chapter Summary
Chapter 9: Classification
9.1 What Are Classification and Prediction?
9.1.1 K-Nearest Neighbor
9.1.2 KNN Algorithm
9.1.3 KNN Using R
9.1.4 KNN Using Python
9.2 Naïve Bayes Models for Classification
9.2.1 Naïve Bayes Classifier Model Example
9.2.2 Naïve Bayes Classifier Using R (Use Same Data Set as
KNN)
9.2.3 Advantages and Limitations of the Naïve Bayes
Classifier
9.3 Decision Trees
9.3.1 Decision Tree Algorithm
9.3.2 Building a Decision Tree
9.3.3 Classification Rules from Tree
9.4 Advantages and Disadvantages of Decision Trees
9.5 Ensemble Methods and Random Forests
9.6 Decision Tree Model Using R
9.7 Decision Tree Model Using Python
9.7.1 Creating the Decision Tree Model
9.7.2 Making Predictions
9.7.3 Measuring the Accuracy of the Model
9.7.4 Creating a Pruned Tree
9.8 Chapter Summary
Chapter 10: Neural Networks
10.1 What Is an Artificial Neural Network?
10.2 Concept and Structure of Neural Networks
10.2.1 Perceptrons
10.2.2 The Architecture of Neural Networks
10.3 Learning Algorithms
10.3.1 Predicting Attrition Using a Neural Network
10.3.2 Classification and Prediction Using a Neural Network
10.3.3 Training the Model
10.3.4 Backpropagation
10.4 Activation Functions
10.4.1 Linear Function
10.4.2 Sigmoid Activation Function
10.4.3 Tanh Function
10.4.4 ReLU Activation Function
10.4.5 Softmax Activation Function
10.4.6 Selecting an Activation Function
10.5 Practical Example of Predicting Using a Neural Network
10.5.1 Implementing a Neural Network Model Using R
10.6 Implementation of a Neural Network Model Using Python
10.7 Strengths and Weaknesses of Neural Network Models
10.8 Deep Learning and Neural Networks
10.9 Chapter Summary
Chapter 11: Logistic Regression
11.1 Logistic Regression
11.1.1 The Data
11.1.2 Creating the Model
11.1.3 Model Fit Verification
11.1.4 General Words of Caution
11.1.5 Multicollinearity
11.1.6 Dispersion
11.1.7 Conclusion for Logistic Regression
11.2 Training and Testing the Model
11.2.1 Example of Prediction
11.2.2 Validating the Logistic Regression Model on Test Data
11.3 Multinomial Logistic Regression
11.4 Regularization
11.5 Using Python to Generate Logistic Regression
11.5.1 Loading the Required Packages and Importing the
Data
11.5.2 Understanding the Dataframe
11.5.3 Getting the Data Ready for the Generation of the
Logistic Regression Model
11.5.4 Splitting the Data into Training Data and Test Data
11.5.5 Generating the Logistic Regression Model
11.5.6 Predicting the Test Data
11.5.7 Fine-Tuning the Logistic Regression Model
11.5.8 Logistic Regression Model Using the statsmodels Library
11.6 Chapter Summary
Part III: Time-Series Models
Chapter 12: Time Series: Forecasting
12.1 Introduction
12.2 Characteristics of Time-Series Data
12.3 Decomposition of a Time Series
12.4 Important Forecasting Models
12.4.1 Exponential Forecasting Models
12.4.2 ARMA and ARIMA Forecasting Models
12.4.3 Assumptions for ARMA and ARIMA
12.5 Forecasting in Python
12.5.1 Loading the Base Packages
12.5.2 Reading the Time-Series Data and Creating a
Dataframe
12.5.3 Trying to Understand the Data in More Detail
12.5.4 Decomposition of the Time Series
12.5.5 Test Whether the Time Series Is “Stationary”
12.5.6 The Process of “Differencing”
12.5.7 Model Generation
12.5.8 ACF and PACF Plots to Check the Model
Hyperparameters and the Residuals
12.5.9 Forecasting
12.6 Chapter Summary
Part IV: Unsupervised Models and Text Mining
Chapter 13: Cluster Analysis
13.1 Overview of Clustering
13.1.1 Distance Measure
13.1.2 Euclidean Distance
13.1.3 Manhattan Distance
13.1.4 Distance Measures for Categorical Variables
13.2 Distance Between Two Clusters
13.3 Types of Clustering
13.3.1 Hierarchical Clustering
13.3.2 Dendrograms
13.3.3 Nonhierarchical Method
13.3.4 K-Means Algorithm
13.3.5 Other Clustering Methods
13.3.6 Evaluating Clustering
13.4 Limitations of Clustering
13.5 Clustering Using R
13.5.1 Hierarchical Clustering Using R
13.6 Clustering Using Python sklearn
13.7 Chapter Summary
Chapter 14: Relationship Data Mining
14.1 Introduction
14.2 Metrics to Measure Association: Support, Confidence, and Lift
14.2.1 Support
14.2.2 Confidence
14.2.3 Lift
14.3 Generating Association Rules
14.4 Association Rule (Market Basket Analysis) Using R
14.5 Association Rule (Market Basket Analysis) Using Python
14.6 Chapter Summary
Chapter 15: Introduction to Natural Language Processing
15.1 Overview
15.2 Applications of NLP
15.2.1 Chatbots
15.2.2 Sentiment Analysis
15.2.3 Machine Translation
15.3 What Is Language?
15.3.1 Phonemes
15.3.2 Lexeme
15.3.3 Morpheme
15.3.4 Syntax
15.3.5 Context
15.4 What Is Natural Language Processing?
15.4.1 Why Is NLP Challenging?
15.5 Approaches to NLP
15.5.1 WordNet Corpus
15.5.2 Brown Corpus
15.5.3 Reuters Corpus
15.5.4 Processing Text Using Regular Expressions
15.6 Important NLP Python Libraries
15.7 Important NLP R Libraries
15.8 NLP Tasks Using Python
15.8.1 Text Normalization
15.8.2 Tokenization
15.8.3 Lemmatization
15.8.4 Stemming
15.8.5 Stop Word Removal
15.8.6 Part-of-Speech Tagging
15.8.7 Probabilistic Language Model
15.8.8 N-gram Language Model
15.9 Representing Words as Vectors
15.9.1 Bag-of-Words Modeling
15.9.2 TF-IDF Vectors
15.9.3 Term Frequency
15.9.4 Inverse Document Frequency
15.9.5 TF-IDF
15.10 Text Classifications
15.11 Word2vec Models
15.12 Text Analytics and NLP
15.13 Deep Learning and NLP
15.14 Case Study: Building a Chatbot
15.15 Chapter Summary
Chapter 16: Big Data Analytics and Future Trends
16.1 Introduction
16.2 Big Data Ecosystem
16.3 Future Trends in Big Data Analytics
16.3.1 Growth of Social Media
16.3.2 Creation of Data Lakes
16.3.3 Visualization Tools at the Hands of Business Users
16.3.4 Prescriptive Analytics
16.3.5 Internet of Things
16.3.6 Artificial Intelligence
16.3.7 Whole Data Processing
16.3.8 Vertical and Horizontal Applications
16.3.9 Real-Time Analytics
16.4 Putting the Analytics in the Hands of Business Users
16.5 Migration of Solutions from One Tool to Another
16.6 Cloud Analytics
16.7 In-Database Analytics
16.8 In-Memory Analytics
16.9 Autonomous Services for Machine Learning
16.10 Addressing Security and Compliance
16.11 Big Data Applications
16.12 Chapter Summary
Part V: Business Analytics Tools
Chapter 17: R for Analytics
17.1 Data Analytics Tools
17.2 Data Wrangling and Data Preprocessing Using R
17.2.1 Handling NAs and NULL Values in the Data Set
17.2.2 apply() Functions in R
17.2.3 lapply()
17.2.4 sapply()
17.3 Removing Duplicate Records in the Data Set
17.4 split()
17.5 Writing Your Own Functions in R
17.6 Chapter Summary
Chapter 18: Python Programming for Analytics
18.1 Introduction
18.2 pandas for Data Analytics
18.2.1 Data Slicing Using pandas
18.2.2 Statistical Data Analysis Using pandas
18.2.3 pandas Database Functions
18.2.4 Data Preprocessing Using pandas
18.2.5 Handling Data Types
18.2.6 Handling Date Variables
18.2.7 Feature Engineering
18.2.8 Data Preprocessing Using the apply() Function
18.2.9 Plots Using pandas
18.3 NumPy for Data Analytics
18.3.1 Creating NumPy Arrays with Zeros and Ones
18.3.2 Random Number Generation and Statistical Analysis
18.3.3 Indexing, Slicing, and Iterating
18.3.4 Stacking Two Arrays
18.4 Chapter Summary
References
Index
About the Authors
Dr. Umesh R Hodeghatta
is an engineer, a scientist, and an educator.
He is currently a faculty member at
Northeastern University, specializing in
data analytics, AI, machine learning,
deep learning, natural language
processing (NLP), and cybersecurity. He
has more than 25 years of work
experience in technical and senior
management positions at AT&T Bell
Laboratories, Cisco Systems, McAfee, and
Wipro. He was also a faculty member at
Kent State University in Kent, Ohio, and
Xavier Institute of Management in
Bhubaneswar, India. He earned a
master’s degree in electrical and computer engineering (ECE) from
Oklahoma State University and a doctorate degree from the Indian
Institute of Technology (IIT). His research interest is applying
AI/machine learning to strengthen an organization’s information
security based on his expertise in information security and machine
learning. As a chief data scientist, he helps business leaders make
informed decisions and recommendations linked to the
organization’s strategy and financial goals, reflecting an awareness of
external dynamics based on a data-driven approach.
He has published many articles in international journals and
conference proceedings. In addition, he has authored books titled
Business Analytics Using R: A Practical Approach and The InfoSec
Handbook: An Introduction to Information Security, published by
Springer Apress. Furthermore, Dr. Hodeghatta has contributed his
services to many professional organizations and regulatory bodies. He
was an executive committee member of the IEEE Computer Society
(India); academic advisory member for the Information Systems Audit
and Control Association (ISACA); IT advisor for the government of India;
technical advisory member of the International Neural Network Society
(INNS) India; and advisory member of the Task Force on Business
Intelligence & Knowledge Management. He was listed in “Who’s Who in
the World” for the years 2012, 2013, 2014, 2015, and 2016. He is also a
senior member of the IEEE (USA).
Umesha Nayak
is a director and principal consultant of
MUSA Software Engineering Pvt. Ltd.,
which focuses on
systems/process/management
consulting. He is also the chief executive
officer of N-U Sigma U-Square Analytics
Lab, which specializes in high-end
consulting on artificial intelligence and
machine learning. He has 41 years’
experience, of which 19 years are in
providing consulting to
IT/manufacturing and other
organizations from across the globe. He
has a master’s degree in software
systems and a master’s degree in
economics; he is a CAIIB, holds a PGDFM, and is a Certified Information
Systems Auditor (CISA) and Certified in Risk and Information Systems
Control (CRISC) professional from ISACA, a certified lead auditor for
many of the ISO standards, and a certified coach, among others. He has
worked extensively in banking, software
development, product design and development, project management,
program management, information technology audits, information
application audits, quality assurance, coaching, product reliability,
human resource management and culture development, and
management consultancy, including consultancy in artificial
intelligence and machine learning. He was a vice president and
corporate executive council member at Polaris Software Lab, Chennai,
prior to his current assignment. He has also held various roles such as
head of quality, head of SEPG, and head of Strategic Practice Unit –
Risks & Treasury at Polaris Software Lab. He started his journey with
computers in 1981 with ICL mainframes and continued with minis and
PCs. He was one of the founding members of information systems
auditing in the banking industry in India. He has effectively guided
many organizations through successful ISO 9001/ISO 27001/CMMI and
other certifications and process/product improvements and solved
problems through artificial intelligence and machine learning. He
coauthored the book The InfoSec Handbook: An Introduction to
Information Security, published by Apress.
Part I
Introduction to Analytics
© Umesh R. Hodeghatta, Ph.D and Umesha Nayak 2023
U. R. Hodeghatta, U. Nayak, Practical Business Analytics Using R and Python
https://doi.org/10.1007/978-1-4842-8754-5_1
1.1 Introduction
Today’s world is data-driven and knowledge-based. In the past,
knowledge was gained mostly through observation; now, knowledge is
secured not only through observation but also by analyzing data that is
available in abundance. In the 21st century, knowledge is acquired and
applied by analyzing data available through various applications, social
media sites, blogs, and much more. The advancement of computer
systems complements knowledge of statistics, mathematics,
algorithms, and programming. Enormous storage and extensive computing
capabilities have ensured that knowledge can be quickly derived from
huge amounts of data and be used for many other purposes. The
following examples demonstrate how seemingly obscure or
unimportant data can be used to make better business decisions:
A hotel in Switzerland welcomes you with your favorite drink and
dish; you are so delighted!
You are offered a stay at a significantly discounted rate at your
favorite hotel on your birthday or marriage anniversary when
traveling to your destination.
Based on your daily activities and/or food habits, you are warned
about a high probability of developing diabetes so you can take the
right steps to avoid it.
You enter a grocery store and find that your regular monthly
purchases are already selected and set aside for you. The only
decision you have to make is whether you require all of them or want
to remove some from the list. How happy you are!
There are many such scenarios that are made possible by analyzing
data about you and your activities that is collected through various
means—including mobile phones, your Google searches, visits to
various websites, your comments on social media sites, your activities
using various computer applications, and more. The use of data
analytics in these scenarios has focused on your individual perspective.
Now, let’s look at scenarios from a business perspective.
As a hotel business owner, you are able to provide competitive yet
profitable rates to your prospective customers. At the same time, you
can ensure that your hotel is completely occupied all the time by
providing additional benefits, including discounts on local travel and
local sightseeing offers tied to other local vendors.
As a taxi business owner, you are able to repeatedly attract the same
customers based on their travel history and preferences of taxi type
and driver.
As a fast-food business owner, you are able to offer discounted rates
to attract customers on slow days. These discounts enable you to
ensure full occupancy on those days also.
You are in the human resources (HR) department of an organization
and are bogged down by high attrition. But now you are able to
understand the types of people you should focus on recruiting based
on the characteristics of those who perform well and who are more
loyal and committed to the organization.
You are in the business of designing, manufacturing, and selling
medical equipment used by hospitals. You are able to understand the
possibility of equipment failure well before the equipment actually
fails, by carrying out analysis of the errors or warnings captured in
the equipment logs.
All these scenarios are possible by analyzing data that the
businesses and others collect from various sources. There are many
such possible scenarios. The application of data analytics to the field of
business is called business analytics.
You have most likely observed the following scenarios:
For the past few days, you’ve been searching Google for
adventurous places to visit. You’ve also tried to find various travel
packages that might be available. You suddenly find that when you
are on Facebook, Twitter, or other websites, they show a specific
advertisement of what you are looking for, usually at a discounted
rate.
You’ve been searching for a specific item to purchase on Amazon (or
any other site). Suddenly, on other sites you visit, you find
advertisements related to what you are looking for or find
customized mail landing in your mailbox, offering discounts along
with other items you might be interested in.
You’ve also seen recommendations that Netflix, Amazon,
Walmart.com, etc., make based on your searches, your wish list, or
previous purchases or movies you have watched. Many times you’ve
also likely observed these sites offering you discounts or promoting
new products based on the huge amount of customer data these
companies have collected.
All of these possibilities are now a reality because of data analytics
specifically used by businesses.
Now let’s discuss each of these drivers for business analytics in more
detail.
Data analytics and data mining techniques use many statistical and
mathematical concepts on which various algorithms, measures, and
computations are based. A good knowledge of statistical and
mathematical concepts is essential for applying these concepts properly
to depict, analyze, and present the data and the results of the analysis.
Otherwise, a wrongly applied technique or a misinterpreted result can
produce wrong interpretations, wrong models, and wrong theories that
lead others in the wrong direction.
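A small example shows how a misapplied statistic can mislead. In the hypothetical monthly incomes below (in thousands), a single extreme value drags the mean far above what most customers earn, while the median stays representative. Reporting the mean as the "typical" income here would be exactly the kind of wrong interpretation described above. The sketch uses Python's standard `statistics` module; the data are invented for illustration.

```python
import statistics

# Hypothetical monthly incomes (in thousands) for ten customers; one is an outlier.
incomes = [32, 35, 36, 38, 40, 41, 43, 45, 47, 900]

mean_income = statistics.mean(incomes)  # sensitive to the outlier
median_income = statistics.median(incomes)  # middle value, robust to the outlier

print(f"mean   = {mean_income:.1f}")
print(f"median = {median_income:.1f}")
```

Running it prints a mean of 125.7 against a median of 40.5, even though nine of the ten customers earn less than 50.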
Statistics contributes significantly to effective data analysis.
Similarly, the knowledge discovery enablers such as machine learning
have contributed significantly to the application of business analytics.
Another area that has given impetus to business analytics is the growth
of database systems, from SQL-oriented ones to NoSQL ones. All these
combined, along with easy data visualization and reporting capabilities,
have made it much easier to understand what the data tells us. This
has led to the wide application of
business and data analytics to solve problems faced by organizations
and to drive a competitive edge in business through the application of
this understanding.
There are numerous tools available to support each piece of the
business analytics framework. Figure 1-1 presents some of these tools,
along with details of the typical analytics framework.
Figure 1-1 Business analytics framework
ever on guard, ready to attack, should the enemy give an opening by
quitting the shelter of his ports, they frustrated his efforts at a
combination of his squadrons by which alone he could hope to
reverse conditions. All this was defensive; but the same operation cut
the sinews of the enemy’s power by depriving him of sea-borne
commerce, and promoted the reduction of his colonies. Both these
were measures of offense; and both, it may be added, were directed
upon the national communications, the sources of national well-
being. The means was one, the effect twofold....
[It is shown that, in the case of insular states, offense and defense
are often closely combined, home security depending on control of
the sea assured by offensive action of the national fleet.—Editor.]
An insular state, which alone can be purely maritime, therefore
contemplates war from a position of antecedent probable superiority
from the twofold concentration of its policy; defense and offense
being closely identified, and energy, if exerted judiciously, being
fixed upon the increase of naval force to the clear subordination of
that more narrowly styled military. The conditions tend to minimize
the division of effort between offensive and defensive purpose, and,
by greater comparative development of the fleet, to supply a larger
margin of disposable numbers in order to constitute a mobile
superiority at a particular point of the general field. Such a decisive
local superiority at the critical point of action is the chief end of the
military art, alike in tactics and strategy. Hence it is clear that an
insular state, if attentive to the conditions that should dictate its
policy, is inevitably led to possess a superiority in that particular kind
of force, the mobility of which enables it most readily to project its
power to the more distant quarters of the earth, and also to change
its point of application at will with unequalled rapidity.
The general considerations that have been advanced concern all
the great European nations, in so far as they look outside their own
continent, and to maritime expansion, for the extension of national
influence and power; but the effect upon the action of each differs
necessarily according to their several conditions. The problem of sea-
defense, for instance, relates primarily to the protection of the
national commerce everywhere, and specifically as it draws near the
home ports; serious attack upon the coast, or upon the ports
themselves, being a secondary consideration, because little likely to
befall a nation able to extend its power far enough to sea to protect
its merchant ships. From this point of view the position of Germany
is embarrassed at once by the fact that she has, as regards the world
at large, but one coast-line. To and from this all her sea commerce
must go; either passing the English Channel, flanked for three
hundred miles by France on the one side and England on the other,
or else going north about by the Orkneys, a most inconvenient
circuit, and obtaining but imperfect shelter from recourse to this
deflected route. Holland, in her ancient wars with England, when the
two were fairly matched in point of numbers, had dire experience of
this false position, though her navy was little inferior in numbers to
that of her opponent. This is another exemplification of the truth that
distance is a factor equivalent to a certain number of ships. Sea-
defense for Germany, in case of war with France or England, means
established naval predominance at least in the North Sea; nor can it
be considered complete unless extended through the Channel and as
far as Great Britain will have to project hers into the Atlantic. This is
Germany’s initial disadvantage of position, to be overcome only by
adequate superiority of numbers; and it receives little compensation
from the security of her Baltic trade, and the facility for closing that
sea to her enemies. In fact, Great Britain, whose North Sea trade is
but one-fourth of her total, lies to Germany as Ireland does to Great
Britain, flanking both routes to the Atlantic; but the great
development of the British sea-coast, its numerous ports and ample
internal communications, strengthen that element of sea-defense
which consists in abundant access to harbors of refuge.
For the Baltic Powers, which comprise all the maritime States east
of Germany, the commercial drawback of the Orkney route is a little
less than for Hamburg and Bremen, in that the exit from the Baltic is
nearly equidistant from the north and south extremities of England;
nevertheless the excess in distance over the Channel route remains
very considerable. The initial naval disadvantage is in no wise
diminished. For all the communities east of the Straits of Dover it
remains true that in war commerce is paralyzed, and all the resultant
consequences of impaired national strength entailed, unless decisive
control of the North Sea is established. That effected, there is
security for commerce by the northern passage; but this alone is
mere defense. Offense, exerted anywhere on the globe, requires a
surplusage of force, over that required to hold the North Sea,
sufficient to extend and maintain itself west of the British Islands. In
case of war with either of the Channel Powers, this means, as
between the two opponents, that the eastern belligerent has to guard
a long line of communications, and maintain distant positions,
against an antagonist resting on a central position, with interior
lines, able to strike at choice at either wing of the enemy’s extended
front. The relation which the English Channel, with its branch the
Irish Sea, bears to the North Sea and the Atlantic—that of an interior
position—is the same which the Mediterranean bears to the Atlantic
and the Indian Sea; nor is it merely fanciful to trace in the passage
round the north of Scotland an analogy to that by the Cape of Good
Hope. It is a reproduction in miniature. The conditions are similar,
the scale different. What the one is to a war whose scene is the north
of Europe, the other is to operations by European Powers in Eastern
Asia.
To protract such a situation is intolerable to the purse and morale
of the belligerent who has the disadvantage of position. This of
course leads us straight back to the fundamental principles of all
naval war, namely, that defense is ensured only by offense, and that
the one decisive objective of the offensive is the enemy’s organized
force, his battle fleet. Therefore, in the event of a war between one of
the Channel Powers and one or more of those to the eastward, the
control of the North Sea must be at once decided. For the eastern
State it is a matter of obvious immediate necessity, of commercial
self-preservation. For the western State the offensive motive is
equally imperative; but for Great Britain there is defensive need as
well. Her Empire imposes such a development of naval force as
makes it economically impracticable to maintain an army as large as
those of the Continent. Security against invasion depends therefore
upon the fleet. Postponing more distant interests, she must here
concentrate an indisputable superiority. It is, however, inconceivable
that against any one Power Great Britain should not be able here to
exert from the first a preponderance which would effectually cover
all her remoter possessions. Only an economical decadence, which
would of itself destroy her position among nations, could bring her
so to forego the initial advantage she has, in the fact that for her
offense and defense meet and are fulfilled in one factor, the
command of the sea. History has conclusively demonstrated the
inability of a state with even a single continental frontier to compete
in naval development with one that is insular, although of smaller
population and resources. A coalition of Powers may indeed affect
the balance. As a rule, however, a single state against a coalition
holds the interior position, the concentrated force; and while
calculation should rightly take account of possibilities, it should
beware of permitting imagination too free sway in presenting its
pictures. Were the eastern Powers to combine they might prevent
Great Britain’s use of the North Sea for the safe passage of her
merchant shipping; but even so she would but lose commercially the
whole of a trade, the greater part of which disappears by the mere
fact of war. Invasion is not possible, unless her fleet can be wholly
disabled from appearing in that sea. From her geographical position,
she still holds her gates open to the outer world, which maintains
three-fourths of her commerce in peace.
37. Bearing of Political Developments on Naval Policy and Strategy[115]
The external activities of Europe, noted a dozen years ago and before,
have now to a certain extent been again superseded by rivalries
within Europe itself. Those rivalries, however, are the result of their
previous external activities, and in the last analysis they depend
upon German commercial development. This has stimulated the
German Empire to a prodigious naval programme, which affects the
whole of Europe and may affect the United States. In 1897 I summed
up two conspicuous European conditions as being the equilibrium
then existing between France and Germany, with their respective
allies, and the withdrawal of Great Britain from active association
with the affairs of the Continent. At that date the Triple Alliance,
Austria, Germany, Italy, stood against the Dual Alliance, France and
Russia; Great Britain apart from both, but with elements of
antagonism against Russia and France, and not against the German
monarchies or Italy. These antagonisms arose wholly from
conditions external to Europe,—in India against Russia, and in Africa
against France. Later, the paralysis of Russia, through her defeat by
Japan, and through her internal troubles, left France alone for a
time; during which Germany, thus assured against land attack, was
better able to devote much money to the fleet, as the protector of her
growing commerce. The results have been a projected huge German
navy, and a German altercation with France relative to Moroccan
affairs; incidents which have aroused Great Britain to a sense of
naval danger, and have propelled her to the understandings—
whatever they amount to—with France and Russia, which we now
know as the Triple Entente. In short, Great Britain has abandoned
the isolation of twenty years ago and stands joined to the Dual
Alliance, which thereby becomes a Triple Entente.
To the United States this means that Great Britain, once our chief
opponent in matters covered by the Monroe Doctrine, but later by
the logic of events drawn to recede from that opposition, so that she
practically backed us against Europe in 1898, and subsequently
conceded the Panama arrangement known as the Hay-Pauncefote
Treaty, cannot at present count for as much as she did in naval
questions throughout the world. It means to the United States and to
Japan that Great Britain has too much at stake at home to side with
the one or the other, granting she so wished, except as bound by
treaty, which implies reciprocal obligations. Between her and Japan
such specific obligations exist. They do not in the case of the United
States; and the questions whether the two countries are disposed to
support one another, and, if so, to what extent, and what the attitude
of Great Britain would be in case of difficulty between Japan and the
United States, are questions directly affecting naval strategy.[116]
Great Britain does indeed for the moment hold Germany so far in
check that the German Empire also can do no more than look after
its European interests; but should a naval disaster befall Great
Britain, leaving Germany master of the naval situation, the world
would see again a predominant fleet backed by a predominant army,
and that in the hands, not of a state satiated with colonial
possessions, as Great Britain is, but of one whose late entry into
world conditions leaves her without any such possessions at all of
any great value. The habit of mind is narrow which fails to see that a
navy such as Germany is now building will be efficacious for other
ends than those immediately proposed. The existence of such a fleet
is a constant factor in contemporary politics; the part which it shall
play depending upon circumstances not always to be foreseen.
Although the colonial ambitions of Germany are held in abeyance for
the moment, the wish cannot but exist to expand her territory by
foreign acquisitions, to establish external bases for the support of
commercial or political interests, to build up such kindred
communities as now help to constitute the British Empire, homes for
emigrants, markets for industries, sources of supplies of raw
materials, needed by those industries.
All such conditions and ambitions are incidents with which
Strategy, comprehensively considered, has to deal. By the successive
enunciations of the Monroe Doctrine the United States stands
committed to the position that no particle of American soil shall pass
into the hands of a non-American State other than the present
possessor. No successful war between foreign states, no purchase, no
exchange, no merger, such as the not impossible one of Holland with
Germany, is allowed as valid cause for such transfer. This is a very
large contract; the only guarantee of which is an adequate navy,
however the term “adequate” be defined. Adequacy often depends
not only upon existing balances of power, such, for instance, as that
by which the British and German navies now affect one another,
which for the moment secures the observance of the Doctrine.
Account must be taken also of evident policies which threaten to
disturb such balances, such as the official announcement by
Germany of her purpose to create a “fleet of such strength that, even
for the mightiest naval power, a war with Germany would involve
such risks as to jeopardize its own supremacy.” This means, at least,
that Great Britain hereafter shall not venture, as in 1898, to back the
United States against European interference; nor to support France
in Morocco; nor to carry out as against Germany her alliance with
Japan. It is a matter of very distinct consequence in naval strategy
that Great Britain, after years of contention with the United States,
essentially opposed to the claims of the Monroe Doctrine, should at
last have come to substantial coincidence with the American point of
view, even though she is not committed to a formal announcement to
that effect.[117] Such relations between states are primarily the
concern of the statesman, a matter of international policies; but they
are also among the data which the strategist, naval as well as land,
has to consider, because they are among the elements which
determine the constitution and size of the national fleet.
I here quote with approval a statement of the French Captain
Darrieus:
“Among the complex problems to which the idea of strategy gives
rise there is none more important than that of the constitution of the
fleet; and every project which takes no account of the foreign
relations of a great nation, nor of the material limit fixed by its
resources, rests upon a weak and unstable base.”
I repeat also the quotation from Von der Goltz: “We must have a
national strategy, a national tactics.” I cannot too entirely repudiate
any casual word of mine, reflecting the tone which once was so
traditional in the navy that it might be called professional,—that
“political questions belong rather to the statesman than to the
military man.” I find these words in my old lectures, but I very soon
learned better, from my best military friend, Jomini; and I believe
that no printed book of mine endorses the opinion that external
politics are of no professional concern to military men.
It was in accordance with this changed opinion that in 1895, and
again in 1897, I summed up European conditions as I conceived
them to be; pointing out that the distinguishing feature at that time
was substantial equilibrium on the Continent, constituting what is
called the Balance of Power; and, in connection with the calm thus
resulting, an immense colonizing movement, in which substantially
all the great Powers were concerned. This I indicated as worthy of
the notice of naval strategists, because there were parts of the
American continents which for various reasons might attract upon
themselves this movement, in disregard of the Monroe Doctrine.
Since then the scene has shifted greatly, the distinctive feature of
the change being the growth of Germany in industrial, commercial,
and naval power,—all three; while at the same time maintaining her
military pre-eminence, although that has been somewhat qualified
by the improvement of the French army, just as the growth of the
German navy has qualified British superiority at sea. Coincident with
this German development has been the decline of Russia, owing to
causes generally understood; the stationariness of France in
population, while Germany has increased fifty per cent; and the very
close drawing together of Germany and Austria, for reasons of much
more controlling power than the mere treaty which binds them. The
result is that to-day central Europe, that is, Austria and Germany,
form a substantially united body, extending from water to water,
from North Sea to Adriatic, wielding a military power against which,
on the land, no combination in Europe can stand. The Balance of
Power no longer exists; that is, if my estimate is correct of the
conditions and dispersion which characterize the other nations
relatively to this central mass.
This situation, coinciding with British trade jealousies of the new
German industries, and with the German naval programme, has
forced Great Britain out of the isolation which the Balance of Power
permitted her. Her ententes are an attempt to correct the
disturbance of the balance; but, while they tend in that direction,
they are not adequate to the full result desired. The balance remains
uneven; and consequently European attention is concentrated upon
European conditions, instead of upon the colonizing movements of
twenty years ago. Germany even has formally disavowed such
colonizing ambitions, by the mouth of her ambassador to the United
States, confirmed by her minister of foreign affairs, although a dozen
years ago they were conspicuous. Concerning these colonizing
movements, indeed, it might be said that they have reached a
moment of quiet, of equilibrium, while internally Europe is
essentially disquieted, as various incidents have shown.
The important point to us here is the growing power of the
German Empire, in which the efficiency of the State as an organic
body is so greatly superior to that of Great Britain, and may prove to
be to that of the United States. The two English-speaking countries
have wealth vastly superior, each separately, to that of Germany;
much more if acting together. But in neither is the efficiency of the
Government for handling the resources comparable to that of
Germany; and there is no apparent chance or recognized inducement
for them to work together, as Germany and Austria now work in
Europe. The consequence is that Germany may deal with each in
succession much more effectively than either is now willing to
consider; Europe being powerless to affect the issue so long as
Austria stands by Germany, as she thoroughly understands that she
has every motive to do.
It is this line of reasoning which shows the power of the German
navy to be a matter of prime importance to the United States. The
power to control Germany does not exist in Europe, except in the
British navy; and if social and political conditions in Great Britain
develop as they now promise, the British navy will probably decline
in relative strength, so that it will not venture to withstand the
German on any broad lines of policy, but only in the narrowest sense
of immediate British interests. Even this condition may disappear,
for it seems as if the national life of Great Britain were waning at the
same time that that of Germany is waxing. The truth is, Germany, by
traditions of two centuries, inherits now a system of state control,
not only highly developed but with a people accustomed to it,—a
great element of force; and this at the time when control of the
individual by the community—that is, by the state—is increasingly
the note of the times. Germany has in this matter a large start. Japan
has much the same.
When it is remembered that the United States, like Great Britain
and like Japan, can be approached only by sea, we can scarcely fail to
see that upon the sea primarily must be found our power to secure
our own borders and to sustain our external policy, of which at the
present moment there are two principal elements; namely, the
Monroe Doctrine and the Open Door. Of the Monroe Doctrine
President Taft, in his first message to Congress, has said that it has
advanced sensibly towards general acceptance; and that
maintenance of its positions in the future need cause less anxiety
than it has in the past. Admitting this, and disregarding the fact that
the respect conceded to it by Europe depends in part at least upon
European rivalries modifying European ability to intervene,—a
condition which may change as suddenly as has the power of Russia
within the decade,—it remains obvious that the policy of the Open
Door requires naval power quite as really and little less directly than
the Monroe Doctrine. For the scene of the Open Door contention is
the Pacific; the gateway to the Pacific for the United States is the
Isthmus; the communications to the Isthmus are by way of the Gulf
of Mexico and the Caribbean Sea. The interest of that maritime
region therefore is even greater now than it was when I first
undertook the strategic study of it, over twenty years ago. Its
importance to the Monroe Doctrine and to general commercial
interests remains, even if modified.
At the date of my first attempt to make this study of the Caribbean,
and to formulate certain principles relative to Naval Strategy, there
scarcely could be said to exist any defined public consciousness of
European and American interest in sea power, and in the methods of
its application which form the study of Strategy. The most striking
illustration of this insensibility to the sea was to be found in
Bismarck, who in a constructive sense was the greatest European
statesman of that day. After the war with France and the acquisition
of Alsace and Lorraine, he spoke of Germany as a state satiated with
territorial expansion. In the matter of external policy she had
reached the limits of his ambitions for her; and his mind thenceforth
was set on internal development, which should harmonize the body
politic and ensure Germany the unity and power which he had won
for her. His scheme of external relations did not stretch beyond
Europe. He was then too old to change to different conceptions,
although he did not neglect to follow the demand of the people as
their industry and commerce developed.
The contrast between the condition of indifference to the sea
which he illustrated and that which now exists is striking; and the
German Empire, which owes to him above all men its modern
greatness, offers the most conspicuous illustration of the change. The
new great navies of the world since 1887 are the German, the
Japanese, and the American. Every state in Europe is now awake to
the fact that the immediate coming interests of the world, which are
therefore its own national interest, must be in the other continents.
Europe in its relatively settled conditions offers really the base of
operations for enterprises and decisive events, the scene of which
will be in countries where political or economical backwardness must
give place to advances which will be almost revolutionary in kind.
This can scarcely be accomplished without unsettlements, the
composing of which will depend upon force. Such force by a
European state—with the single exception of Russia, and possibly, in
a less degree, of Austria—can be exerted only through a navy.