
Datadidak

Introduction to
Data Analytics

What is Data Analytics?


“Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns towards effective decision making.” (Wikipedia)

● Posing a Question -- Start with problem statements or hypotheses
● Getting Data -- Retrieve/Clean/Wrangle your data into a format you can use
● Exploring Data -- Finding patterns in it, and building your intuition
● Making Conclusions -- Drawing conclusions and/or making predictions about it
● Delivering Insights -- Communicating your findings

Examples of real-world applications:
● Marketing/Digital Analytics
● People/HR Analytics
● Risk Analytics
● News Analytics

Analytics Flows & Tools


01 Data Logging
● Internal web-app logging (Node.js, Python, Java)
● Pub/Sub (GCP, AWS)
● Firebase (app analytics)

02 Data Persistence
● Files
● NoSQL (MongoDB, CouchDB, HBase)
● RDBMS (MariaDB, PostgreSQL, SQL Server)

03 Data Retrieval
● SQL*
● BigQuery (GCP)

04 Data Processing
● Excel
● R
● Python

05 Data “Storytelling”
● PowerPoint
● Tableau
● Google Data Studio

SAMPLE CONCEPTS

Concept -- Cohort Retention (New Customers)



Concept -- Cohort Retention (ROI)



Concept -- Customer Segmentation

● WHAT to offer?
● WHEN to offer?
● HOW to offer?
● WHOM to offer?
a.k.a. Segmentation

source: smartinsights.com

Concept -- Customer Segmentation


Quantitative & Qualitative

01 Demographic Segmentation
- Age, Gender, Location
- Marital status, Education, Religion, Income

02 Behavioral Segmentation
- Habits, Patterns, Preferences
- RFM (Recency, Frequency, Monetary)

03 Psychographic Segmentation
- Activities, Interests
- Values, Attitudes

Concept -- Customer Segmentation (RFM)



Concept -- Customer Segmentation (RFM)

64 groups
● *AA -- Champions
○ AAA -- Loyal Champions
○ CAA -- Hibernating Champions
● AAD -- Low-spender
● DDD -- Lost
● (and so on)
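
The three-letter codes above can be produced by grading each customer on Recency, Frequency, and Monetary value. The sketch below is one possible way to do it, under assumptions that are not stated on the slide: each dimension is graded A to D by rank quartile (which yields 4 x 4 x 4 = 64 groups), the code is read in R-F-M order, and the customer names, dates, and amounts are made up for illustration. The function names are hypothetical.

```python
from datetime import date


def grade_by_quartile(values, higher_is_better=True):
    """Grade each value A-D by rank quartile (A = best quarter). Assumed scheme."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    grades = [None] * len(values)
    for rank, i in enumerate(order):
        grades[i] = "ABCD"[rank * 4 // len(values)]
    return grades


def rfm_codes(customers, today):
    """customers: name -> (last_purchase_date, order_count, total_spend)."""
    names = list(customers)
    recency = [(today - customers[n][0]).days for n in names]
    frequency = [customers[n][1] for n in names]
    monetary = [customers[n][2] for n in names]
    r = grade_by_quartile(recency, higher_is_better=False)  # fewer days is better
    f = grade_by_quartile(frequency)
    m = grade_by_quartile(monetary)
    return {n: r[i] + f[i] + m[i] for i, n in enumerate(names)}


# Hypothetical sample data, not taken from the slides.
customers = {
    "ani": (date(2020, 2, 20), 10, 800),
    "budi": (date(2019, 11, 1), 12, 900),
    "citra": (date(2020, 1, 15), 5, 300),
    "dewi": (date(2019, 6, 10), 1, 20),
}
codes = rfm_codes(customers, today=date(2020, 2, 23))
print(codes)
```

With this sample data, "budi" buys often and spends a lot but has not purchased recently, so the code comes out as CAA, the "Hibernating Champions" pattern, while "dewi" ranks last on all three dimensions and lands in DDD.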

Concept -- A/B Testing


A/B tests consist of a randomized
experiment with two variants, applying
statistical hypothesis testing

Focus on ONE thing
- Pick one variable
- Identify a goal

Create/Split Sample groups
- Control vs. Variant
- Equally & randomly

Determine Sample size
- How long (duration)?
- What’s your significance threshold?

Run the test
- Test both variants simultaneously
- Don’t run other tests concurrently

Evaluate
- Is there a winner?
- So what?
- Plan for next test(s)
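
The "Create/Split Sample groups" step asks for visitors to be divided equally between control and variant. A minimal sketch of one simple rule, assuming the split is made on the parity (even/odd) of the visitor's IPv4 address; the function name `assign_variant` is hypothetical, and parity is only a stand-in for true random assignment.

```python
import ipaddress


def assign_variant(ip: str) -> str:
    """Return 'A' for an even address value and 'B' for an odd one (assumed rule)."""
    value = int(ipaddress.ip_address(ip))  # parity equals that of the last octet
    return "A" if value % 2 == 0 else "B"


print(assign_variant("202.124.32.16"))
print(assign_variant("202.124.32.17"))
```

A rule like this is deterministic, so a returning visitor keeps seeing the same variant as long as the address does not change.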

Concept -- A/B Testing


Variant A: 50% visitors
Variant B: 50% visitors

Sample split method:
Even/Odd by IP address : 202.124.32.16
Even/Odd by Timestamp : 2020-02-23T13:23:27

Let’s try starting with 2000 visitors for each variant
The chi-square statistic is 5.3249. The p-value is .021023. Significant at p < 0.05

Let’s add another 1000 for each ...
The chi-square statistic is 2.7238. The p-value is .098865. Not significant at p < 0.05

… so, how many unique visitors (each variant) should we run this test against?
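
The chi-square results quoted above can be checked with a few lines of code. The sketch below computes a Pearson chi-square statistic for a 2x2 table of conversions versus non-conversions and converts a statistic with 1 degree of freedom into a p-value. The slides do not give the underlying conversion counts, so the counts in the example are hypothetical, and leaving out a continuity correction is an assumption.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: 200 of 2000 convert on A, 240 of 2000 convert on B.
stat = chi_square_2x2(200, 2000, 240, 2000)
print(round(stat, 4), round(p_value_df1(stat), 4))

# The two statistics quoted on the slide map to the quoted p-values.
print(round(p_value_df1(5.3249), 6), round(p_value_df1(2.7238), 6))
```

The flip from significant to not significant after adding visitors is why the sample size should be fixed before the test starts, instead of checking repeatedly and stopping when the result looks good.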

Concept -- A/B Testing


Parameters when calculating our example A/B test sample size, based on statistical hypothesis testing
(i.e. reject or fail to reject the null hypothesis that the difference between A & B equals zero)

● Conversion rate (CR) of Variant A and B, minimum difference between A & B
● chosen Confidence Level
○ 95% -- most common
● chosen Statistical Power
○ 80% -- most common
● One- or Two-tailed Test
○ One-tailed test -- in this example, we only care about a positive diff, not a negative diff.

We need at least 6,130 unique visitors per variant to detect the minimum difference as a statistically significant result
(with 95% confidence level & 80% statistical power)
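
The parameters listed above plug into a sample-size formula. The sketch below uses one common normal-approximation formula for comparing two proportions; it is not necessarily the calculator behind the 6,130 figure, and the conversion rates in the example (10% versus 12%) are hypothetical because the slide text does not state them. The function name is also hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(cr_a, cr_b, confidence=0.95, power=0.80,
                            one_tailed=True):
    """Visitors needed per variant to detect the difference cr_b - cr_a."""
    alpha = 1 - confidence
    # A one-tailed test puts all of alpha in one tail; two-tailed splits it.
    z_alpha = NormalDist().inv_cdf(1 - alpha if one_tailed else 1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = cr_a * (1 - cr_a) + cr_b * (1 - cr_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (cr_b - cr_a) ** 2)


# Hypothetical conversion rates: 10% for variant A, 12% for variant B.
one_tailed_n = sample_size_per_variant(0.10, 0.12)
two_tailed_n = sample_size_per_variant(0.10, 0.12, one_tailed=False)
print(one_tailed_n, two_tailed_n)
```

Two patterns are worth noting: a two-tailed test needs more visitors than a one-tailed test at the same confidence level, and halving the minimum difference roughly quadruples the required sample size because the difference is squared in the denominator.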

</ Intro >



<hands-on>
● Excel Formulas in Google Sheets
● Interactive Visual in Google Data Studio

</hands-on>
