
As you work through this notebook, follow along in the classroom and answer the quiz
questions that correspond to each question here. The label of the relevant classroom
concept is provided with each question. This will help ensure you are on the right track as
you work through the project, and you can feel more confident that your final submission
meets the criteria. As a final check, ensure you meet all the criteria on the RUBRIC.

Part I - Probability

To get started, let's import our libraries.


In [2]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
random.seed(100)
%matplotlib inline

1. Now, read in the ab_data.csv data. Store it in df. Use your dataframe to answer the
questions in Quiz 1 of the classroom.

a. Read in the dataset and take a look at the top few rows here:
In [3]:
# read the data and take a look at the first five rows
df = pd.read_csv('/home/kesci/input/ab_testing_data9749/ab_data.csv')
df.head()
Out[3]:

   user_id                   timestamp      group landing_page  converted
0   851104  2017-01-21 22:11:48.556739    control     old_page          0
1   804228  2017-01-12 08:01:45.159739    control     old_page          0
2   661590  2017-01-11 16:55:06.154213  treatment     new_page          0
3   853541  2017-01-08 18:28:03.143765  treatment     new_page          0
4   864975  2017-01-21 01:52:26.210827    control     old_page          1

b. Use the cell below to find the number of rows in the dataset.
In [4]:
# take a look at the shape of the data
df.shape
Out[4]:
(294478, 5)

In [5]:
# take a look at the data types of the columns
df.dtypes
Out[5]:
user_id int64
timestamp object
group object
landing_page object
converted int64
dtype: object
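
The dtypes output shows that timestamp is stored as object, that is, as plain strings. If date arithmetic is needed later, the column can be parsed first. A minimal sketch, assuming a small hand-made frame named toy (hypothetical, not part of the notebook) built from two timestamps in the head() output, rather than the full ab_data.csv:

```python
import pandas as pd

# Toy stand-in for df (hypothetical); the two rows come from the head() output
toy = pd.DataFrame({
    "user_id": [851104, 804228],
    "timestamp": ["2017-01-21 22:11:48.556739", "2017-01-12 08:01:45.159739"],
})

# Parse the string timestamps into datetime64 values
toy["timestamp"] = pd.to_datetime(toy["timestamp"])

# The column dtype is now datetime64, so min/max and differences work as dates
print(toy.dtypes)
print(toy["timestamp"].max() - toy["timestamp"].min())
```

The same call on df['timestamp'] should work the same way, assuming every value in the file follows this format.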

c. The number of unique users in the dataset.


In [6]:
# number of unique users
# len(df['user_id'].unique())
df['user_id'].nunique()
Out[6]:
290584
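
The dataset has 294478 rows but only 290584 unique users, so 3894 rows belong to a user_id that has already appeared. One way to inspect such rows is sketched below on a toy frame (the name toy and its values are made up for illustration):

```python
import pandas as pd

# Toy data (hypothetical): user 2 appears twice, user 3 three times
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "converted": [0, 1, 1, 0, 0, 1],
})

n_rows = toy.shape[0]
n_users = toy["user_id"].nunique()
extra_rows = n_rows - n_users  # rows beyond the first one per user

# keep=False marks every row whose user_id occurs more than once
repeated = toy[toy["user_id"].duplicated(keep=False)]

print(n_rows, n_users, extra_rows)
print(repeated)
```

On the real data, the same duplicated(keep=False) filter applied to df would list every row of the repeated users.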

In [7]:
# take a look at the description of the data
# df.describe()

d. The proportion of users converted.


In [8]:
# proportion of users who converted (unique converted users / unique users)
df[df['converted'] == 1]['user_id'].nunique() / df['user_id'].nunique()
Out[8]:
0.12104245244060237
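
The expression above works at the user level: it divides the number of unique users with at least one converted row by the number of unique users. A common alternative is the row-level mean of the converted column, which can give a slightly different value when some users appear more than once. The sketch below contrasts the two on a toy frame (toy and its values are hypothetical, chosen only to make the difference visible):

```python
import pandas as pd

# Toy data (hypothetical): user 2 has two rows, only one of them converted
toy = pd.DataFrame({
    "user_id":   [1, 2, 2, 3, 4],
    "converted": [0, 1, 0, 0, 1],
})

# User level: unique users with a converted row / all unique users
user_level = (
    toy[toy["converted"] == 1]["user_id"].nunique() / toy["user_id"].nunique()
)

# Row level: mean of the 0/1 column over all rows
row_level = toy["converted"].mean()

print(user_level, row_level)
```

Which definition the quiz expects is not stated here, so it is worth checking both against the classroom question.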
