REPORT Data-Science


PROJECT REPORT

Subject: MAS202
Teacher:
Group member:

I. PART A
Part I. Introduction & Methodology
a. What is the topic of your group project?
The topic is the salaries of 300 data scientists working in the U.S. in 2023.
b. What are the main issues you plan to address? What questions do you have
about your project?
Our analysis examines salaries in data science, a career in high demand in 2023.
Our main question is whether the mean salaries of data analysts, data scientists,
and data engineers differ. We also want to find out how experience level and
company size affect salary in this field.
c. What do experts think about your research issues? Provide background
information on the research topic with in-text reliable references.
Experts may be interested in this topic because salary is one of the important
factors affecting a student's choice of major and career.
d. Identify the continuous variables (independent and dependent) between which
you would like to find the relationship. Explain why you are choosing these
variables.
Salary is a continuous variable and is the dependent variable in this study. The
independent variables are experience level and company size. We chose these two
variables because we assume they are related to salary.
e. Identify the population in your research about which you’ll be making inference.
The population is Data Science Salaries across the world in 2023
f. Identify the samples and the sampling method that you will use to collect the
data.
Our data is a sample of 300 data scientists working in the U.S. in 2023. The
data was collected using a random sampling method.
g. Submit the designed questionnaire which you’ll be using to collect the data if
you use survey in your data collecting step.
Our data is secondary data from Kaggle.
h. Identify the survey errors that might have occurred in your research while
collecting data.
Sampling error is likely to occur because the sample size is small compared to
the population size.
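
The question in item (b), whether mean salary differs across the three job titles, is commonly approached with a one-way ANOVA. The report does not state which test the group used, so the following is only a minimal sketch under that assumption. The function name `one_way_anova_f` and the salary figures are hypothetical and are not taken from the report's data set.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical salaries ($/year), for illustration only.
salaries = {
    "data analyst": [100_000, 110_000, 120_000],
    "data scientist": [150_000, 160_000, 170_000],
    "data engineer": [140_000, 150_000, 160_000],
}

f_stat = one_way_anova_f(list(salaries.values()))
print(f"F = {f_stat:.2f}")
```

A large F value relative to the F distribution with (k - 1, N - k) degrees of freedom would suggest that at least one group mean differs from the others.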

Part II. Descriptive Statistics Results
a. Demographics information
The sample consists of 300 data scientists working in the U.S. in 2023.
b. Descriptive statistics
i. A table of the measures of Central Tendency &
ii. A table with the measures of Variation

Statistic            salary
Mean                 164132.3133
Standard Error       3414.468727
Median               155000
Mode                 145000
Standard Deviation   59140.33315
Sample Variance      3497579005
Kurtosis             0.0470
Skewness             0.4623
Range                317310
Minimum              25500
Maximum              342810
Sum                  49239694
Count                300
Table 1: Descriptive statistics of Salary (Unit: $/year)
Key Findings
In the sample, the mean salary is $164,132.31/year, with a median of $155,000 and
a mode of $145,000. Salaries range from $25,500 to $342,810, with a standard
deviation of $59,140.33.
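
Several entries in Table 1 can be derived from the others, which gives a quick consistency check. The sketch below uses only values copied from the table and assumes the usual definitions (mean = sum / count, standard error = standard deviation / sqrt(count), variance = standard deviation squared, range = maximum - minimum). The variable names are illustrative.

```python
import math

# Values copied from Table 1 (salary, $/year).
total = 49_239_694
count = 300
std_dev = 59_140.33315
minimum = 25_500
maximum = 342_810

# Derived statistics, recomputed from the values above.
mean = total / count
std_error = std_dev / math.sqrt(count)
variance = std_dev ** 2
value_range = maximum - minimum

print(f"mean            = {mean:.4f}")
print(f"standard error  = {std_error:.4f}")
print(f"sample variance = {variance:.0f}")
print(f"range           = {value_range}")
```

The recomputed values can be compared against the Mean, Standard Error, Sample Variance, and Range rows of the table. Small differences in the last digits are expected because the standard deviation is reported in rounded form.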

iii. The Box-and-Whisker Plot / Histogram or other graphs if necessary

Table 2: Bar chart of Experience level (count of job_title by experience_level)
Bar values: EN 19, EX 12, MI 28, SE 241
Key Findings
In the sample, 12 people are at the executive level (EX), 241 are seniors (SE),
28 are at the mid level (MI), and 19 are at the entry level (EN).

Table 3: Bar chart of Company size (count of company_size)
Bar values: L 28, M 270, S 2
Key Findings
In the sample, 28 people work in large companies (L), 270 work in medium
companies (M), and only 2 work in small companies (S).
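
The counts in the two bar charts are easier to compare as shares of the 300-person sample. The sketch below converts the counts read off the charts into percentages. The helper name `shares` is illustrative and not part of the original analysis.

```python
# Counts read from the two bar charts.
experience_counts = {"EN": 19, "EX": 12, "MI": 28, "SE": 241}
company_size_counts = {"L": 28, "M": 270, "S": 2}


def shares(counts):
    """Convert category counts into percentages of the total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


experience_shares = shares(experience_counts)
company_size_shares = shares(company_size_counts)

for name, result in [("experience_level", experience_shares),
                     ("company_size", company_size_shares)]:
    print(name)
    for label, pct in result.items():
        print(f"  {label}: {pct:.1f}%")
```

Both sets of counts sum to 300, matching the sample size. The sample is dominated by senior-level staff and medium-sized companies, so comparisons involving the smaller categories rest on very few observations.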
