
Data Science, Machine Learning, BI Explained in an Amazing Few Pictures
Posted by Vincent Granville on February 27, 2017 at 2:30pm


http://www.datasciencecentral.com/profiles/blogs/business-intelligence-and-data-science-fuzzy-borders

Guest blog post by Rubens Zimbres, PhD.

This article brings together images from my modeling work with Mathematica, my experience as a
Business Analyst and lessons from my doctorate. For me, the borders between properly executed
Business Intelligence and Data Science (with substantive knowledge in Management) are fuzzy. See the
picture below:

What is a Data Scientist? In my understanding, someone can be a data scientist according to their
domain expertise: business management, physics, computer science, etc.

DATA SCIENCE AND BUSINESS INTELLIGENCE PHASES

1) UNDERSTAND PROCESSES

First of all, really understand the context and processes of the business: become familiar with its
technology, employees and daily routine.

2) FINANCIAL ANALYSIS

Second, establish business needs (among them, $$$).

- Sales/Revenue

- Net Worth

- Gross margin

- Net profit

- Losses

- Indexes: ROI, ROA, ROE, EBITDA, inventory turnover, liquidity, financial leverage, debt, assets and
liabilities (short term and long term), horizontal and vertical balance analysis
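A few of the indexes listed above can be computed directly from summary figures. The sketch below is only an illustration: the `Financials` fields and all the numbers are hypothetical, and the formulas are the common textbook definitions, which a given company may adjust (for example, by using average rather than year-end assets).

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical yearly figures, all in the same currency unit."""
    revenue: float
    cost_of_goods_sold: float
    net_profit: float
    total_assets: float
    equity: float
    investment: float         # amount put into the project being evaluated
    investment_gain: float    # amount returned by that project
    average_inventory: float


def gross_margin(f: Financials) -> float:
    # Share of revenue left after direct costs.
    return (f.revenue - f.cost_of_goods_sold) / f.revenue


def roi(f: Financials) -> float:
    # Return on investment: net gain relative to the amount invested.
    return (f.investment_gain - f.investment) / f.investment


def roa(f: Financials) -> float:
    # Return on assets.
    return f.net_profit / f.total_assets


def roe(f: Financials) -> float:
    # Return on equity.
    return f.net_profit / f.equity


def inventory_turnover(f: Financials) -> float:
    # How many times the average inventory was sold during the period.
    return f.cost_of_goods_sold / f.average_inventory


# Made-up example company.
example = Financials(
    revenue=500_000, cost_of_goods_sold=300_000, net_profit=50_000,
    total_assets=400_000, equity=250_000,
    investment=20_000, investment_gain=26_000, average_inventory=60_000,
)

for name, fn in [("Gross margin", gross_margin), ("ROI", roi), ("ROA", roa),
                 ("ROE", roe), ("Inventory turnover", inventory_turnover)]:
    print(f"{name}: {fn(example):.3f}")
```

Tracking the same ratios period after period is what makes the horizontal (over time) analysis mentioned above possible.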

3) DEFINE DATABASE ARCHITECTURE AND METHODOLOGY OF DATA COLLECTION AND EXTRACTION

Third: a) Define a database architecture that provides functionality, reliability, security and the
ability to deliver valuable data for decision making.

b) Establish a methodology for data collection, sampling and market research, and define the sources
of data and the KPIs, so that the resulting data analysis is reliable and valid.

4) COLLECT DATA

From different sources:

a) Customized market research

b) CRM Database: sales, clients, suppliers and processes

c) Website

d) Online Advertising

e) Employees

f) Big Data

- Facial recognition

- Speech recognition

- Unstructured data

- Structured data

- Images

- Social Media

5) ANALYZE DATA

You can use Excel, R, SAS, Mathematica, SPSS or Python.

5.0. Data preparation: work on missing values, outliers (I usually take a close look at individuals
with values more than 3 standard deviations from the mean), normality of data, skewness (the 1/N
trick), kurtosis (the log trick) and sampling. Prepare the data properly so that the analysis is
reliable.
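As a rough illustration of these preparation steps, the sketch below flags values more than 3 standard deviations from the mean and measures skewness and excess kurtosis before and after a log transform. The data is made up, the function names are hypothetical, and the moment-based formulas are the plain population versions; statistical packages often apply small-sample corrections, so their numbers may differ slightly.

```python
import math
import statistics


def z_scores(values):
    # Standardize each value using the population mean and standard deviation.
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def flag_outliers(values, threshold=3.0):
    # Values lying more than `threshold` standard deviations from the mean.
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


def skewness(values):
    # Third standardized moment: 0 for a symmetric distribution.
    return statistics.fmean([z ** 3 for z in z_scores(values)])


def excess_kurtosis(values):
    # Fourth standardized moment minus 3: 0 for a normal distribution.
    return statistics.fmean([z ** 4 for z in z_scores(values)]) - 3.0


# Made-up monthly purchases: twenty ordinary customers and one extreme one.
purchases = [float(v) for v in range(90, 110)] + [1000.0]
print("Outliers to inspect:", flag_outliers(purchases))

# Made-up right-skewed variable (for example, order sizes).
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21, 55]
logged = [math.log(v) for v in raw]

print(f"raw:    skewness={skewness(raw):.2f}, excess kurtosis={excess_kurtosis(raw):.2f}")
print(f"logged: skewness={skewness(logged):.2f}, excess kurtosis={excess_kurtosis(logged):.2f}")
```

Flagged individuals are candidates for closer inspection, not automatic deletion: an extreme value may be a data entry error or the most valuable client in the base.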

5.1. Descriptive statistics:

a) Market Research and Database: quality perception, source of clients, demographics, sales, profit,
repurchase intentions, profitable clients, profitability per sales channel, losses, evolution of KPIs over
time, sales per state/neighborhood, efficiency of employees and sales force, employee performance

b) Social Media: popularity, sentiment analysis, references, associations, conversions, mentions,
influencers. You can use Python for unstructured data analysis (text).

c) Website: visits, paths, time spent, clients' demographics, OS, entry pages, exit pages, contact
forms filled, popularity, page rank

d) Online advertising: bids, keywords, conversion rate, effective contacts, ROI, clients'
demographics, competition strategy

5.2. Multivariate statistics: correlations, factor analysis, linear regression: identify niches,
causes for profit / loss / sales / satisfaction / quality perception / popularity, most relevant
variables, customer demographics and groups; do market segmentation and sentiment analysis;
guide sales strategy, refine KPIs and customize the business offer to clients' needs.
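To make the idea concrete, here is a minimal sketch of a Pearson correlation and a one-variable least-squares regression. The advertising and sales figures are invented for illustration, and the function names are hypothetical; a real analysis would use many variables and a statistics package.

```python
import math
import statistics


def pearson(xs, ys):
    # Pearson correlation coefficient between two equally long series.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Made-up data: monthly advertising spend and sales, in thousands.
ad_spend = [1, 2, 3, 4, 5]
sales = [11, 15, 15, 19, 20]

r = pearson(ad_spend, sales)
slope, intercept = fit_line(ad_spend, sales)
print(f"correlation r = {r:.3f}")
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
```

A high correlation on its own does not establish that advertising causes sales; it only tells you which variables deserve a closer look.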

5.3. Classification algorithms in predictive analysis (naive Bayes, random forest, linear and
logistic regression, and K nearest neighbors): identify niches and the causes behind customer
sentiment, do market segmentation, customize the business offer, define the marketing mix, identify
purchase patterns, guide the sales team, identify social groups and predict future business outcomes.

(Figure: K Nearest Neighbors)
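A K nearest neighbors classifier is simple enough to sketch in a few lines. The example below is an assumption-laden illustration: the customer features (visits per month, average ticket), the labels and the function name are all hypothetical, and the features are used unscaled, which in practice would let the variable with the largest range dominate the distance.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k closest training points.

    `train` is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up customers: (visits per month, average ticket) -> purchase pattern.
training_set = [
    ((1, 20), "one-off"),
    ((2, 25), "one-off"),
    ((1, 30), "one-off"),
    ((8, 60), "repeat"),
    ((9, 55), "repeat"),
    ((7, 70), "repeat"),
]

print(knn_predict(training_set, (8, 65)))
print(knn_predict(training_set, (2, 22)))
```

The choice of k is a tuning decision: a small k follows local noise, while a large k blurs the boundary between groups.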
5.4. Optimization algorithms (linear and non-linear programming, genetic algorithms and
neural networks): identify the most efficient and profitable marketing mix, account for seasonality
of demand and improvements in processes, enhance internal processes, and optimize sales
strategy and R&D efforts.
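As one possible illustration, the sketch below uses a small genetic algorithm to split a fixed budget across three channels. Everything in it is assumed for the example: the square-root response curve (diminishing returns), the channel effectiveness values, the budget and the algorithm settings. It is a toy, not a recommendation for how to model a real marketing mix.

```python
import math
import random

RESPONSE = [3.0, 2.0, 1.0]   # hypothetical effectiveness of each channel
BUDGET = 100.0               # hypothetical total budget


def expected_sales(allocation):
    # Assumed diminishing returns: sales grow with the square root of spend.
    return sum(a * math.sqrt(x) for a, x in zip(RESPONSE, allocation))


def random_allocation(rng):
    # Random split of the whole budget across the channels.
    weights = [rng.random() for _ in RESPONSE]
    total = sum(weights)
    return [BUDGET * w / total for w in weights]


def crossover(a, b):
    # Averaging two allocations keeps the total budget unchanged.
    return [(x + y) / 2 for x, y in zip(a, b)]


def mutate(allocation, rng):
    # Move up to 20% of one channel's money to another channel.
    child = list(allocation)
    src, dst = rng.sample(range(len(child)), 2)
    amount = rng.uniform(0, child[src]) * 0.2
    child[src] -= amount
    child[dst] += amount
    return child


def optimize(generations=200, population_size=40, seed=0):
    rng = random.Random(seed)
    population = [random_allocation(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: the better half survives unchanged (elitism).
        population.sort(key=expected_sales, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = parents + children
    return max(population, key=expected_sales)


best = optimize()
print("allocation:", [round(x, 1) for x in best])
print("expected sales:", round(expected_sales(best), 2))
```

For this particular response curve the best split can also be derived by hand (spend proportional to the square of each channel's effectiveness), which gives a convenient check on what the algorithm finds.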

5.5. Clustering algorithms: K means and hierarchical clustering: to identify niches,
customize business offering, identify social traits and guide sales team.
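A bare-bones K means can be sketched as follows. The customer data is invented, the function names are hypothetical, and the starting centroids are picked with a simple farthest-point rule so that the example is deterministic; libraries usually rely on randomized schemes such as k-means++ instead.

```python
import math


def initial_centroids(points, k):
    # Assumption for this sketch: start from the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def nearest(point, centroids):
    # Index of the centroid closest to `point`.
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    centroids = initial_centroids(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                updated.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                updated.append(centroids[i])   # keep an empty cluster's centroid
        if updated == centroids:
            break   # converged
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up customers: (visits per month, average ticket).
customers = [(1, 20), (2, 25), (1, 30), (8, 60), (9, 55), (7, 70)]
centroids, labels = kmeans(customers, k=2)
print("centroids:", centroids)
print("labels:", labels)
```

Each resulting group is a candidate niche; whether it is a meaningful one still has to be judged with business knowledge.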

5.6. Semantic understanding of the context: the links between data and customer actions,
interactions and social network dynamics. This is obtained through analysis of all the sources of
data mentioned above. Graphic visualizations and simulations help a lot in understanding the
dynamics of a group of people. Below you can see my Mathematica models. Read the full
post on social networks here:

https://www.linkedin.com/pulse/contagion-social-network-rubens-zimbres

And here: https://www.linkedin.com/pulse/social-network-analysis-based-callse...

6) DEVELOP SIMULATION MODELS

- Simulation (Markov chains, cellular automata and agent-based modeling): dynamically simulate
market conditions and customer behavior to predict future outcomes, and gain a semantic
(graphic) understanding of customers' social networks, online behavior, interactions,
patterns of purchase and the evolution of opinions over time. The image below
shows a cellular automaton model evolving over time. Each color is a different cell state.
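As a small illustration of the Markov chain approach, the sketch below follows how a customer base moves between three states over time. The states and the transition probabilities are invented for the example; in practice they would be estimated from CRM or market research data.

```python
STATES = ["prospect", "customer", "lost"]

# Hypothetical probabilities of moving from one state (row) to another
# (column) in one period. Each row sums to 1; "lost" is an absorbing state.
TRANSITIONS = {
    "prospect": {"prospect": 0.7, "customer": 0.2, "lost": 0.1},
    "customer": {"prospect": 0.0, "customer": 0.9, "lost": 0.1},
    "lost":     {"prospect": 0.0, "customer": 0.0, "lost": 1.0},
}


def step(distribution):
    # Share of people in each state after one more period.
    following = {state: 0.0 for state in STATES}
    for source, share in distribution.items():
        for target, probability in TRANSITIONS[source].items():
            following[target] += share * probability
    return following


def simulate(distribution, periods):
    # Distributions for period 0 (the start) through `periods`.
    history = [distribution]
    for _ in range(periods):
        history.append(step(history[-1]))
    return history


start = {"prospect": 1.0, "customer": 0.0, "lost": 0.0}
history = simulate(start, periods=12)
for period, shares in enumerate(history):
    summary = ", ".join(f"{state}={shares[state]:.2f}" for state in STATES)
    print(f"period {period}: {summary}")
```

Changing a single transition probability (for example, lowering the customer loss rate) and rerunning the simulation shows how sensitive future outcomes are to that one behavior.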

- Machine Learning: supervised (to establish a training set based on data from the past and
predict future outcomes, like purchase intentions and sales, as well as face recognition and
sentiment analysis based on images) and unsupervised (to simulate customers' interactions and the
emergence of complex consumption patterns). Read the full post on Facial Recognition here:

https://www.linkedin.com/pulse/facial-recognition-wolfram-mathemati...


- Validity: data and simulation models must be assessed for their validity: nomological,
internal and external validity, content and construct validity, as well as their ergodicity and
homoscedasticity.

7) MAKE STRATEGIC DECISIONS TO GET SUSTAINABLE COMPETITIVE ADVANTAGE

- Value

- Service / Product quality

- Cost

- Price

- Payment methods

- Profit

- Sales strategy

- Marketing mix allocation

- Scope of business

- Economic variables

- Advertising efforts

- Innovation and R&D

- Efficiency

- Employee performance and training

- Resource allocation priorities (advertising, short term liabilities, long term liabilities, salaries,
investments, etc)

- Online strategy

- Search Engine Optimization

- Enhancement of processes

- Brand repositioning

- Client enlightenment regarding his/her role in the business process

- Enhancement in physical and online structure and strategy

- Further enhancements in data collection strategy and KPIs

8) REPORTING AND GETTING FEEDBACK FROM EMPLOYEES, CUSTOMERS AND DATA

P.S. One of the reservations I have about Big Data is that 80% of it is unstructured data. Any good
academic researcher in the Management field knows there is no well established theory in the academic
literature to properly measure unstructured data, even with content analysis in qualitative research.
It would take more than 5 years to have a reliable method and an established theory to analyze
unstructured data, because the academic literature lacks consensus regarding measurement and analysis.

Cognitive bias always exists and is unavoidable. It is even worse in an automated algorithm. So, if we
take this epistemological critique into account, the foundations of Big Data admiration will be shaken.
It will take some time to properly analyze such an amount of data. Second, in order to analyze Big Data,
one has to be very skilled in analyzing ordinary data in order to gain valuable insights, because what
really matters is the quality, not the volume, of the data. What matters is not the complexity of the
data but the complexity of the data analyst's mind. We can work miracles with small amounts of data,
properly analyzed.

Probably the biggest difference between Data Science and Business Intelligence is Machine Learning;