
Table of Contents

Part A
Data sources in Business and Economics
Data collection methods in Business and Economics
Part B
B.1: Missing value
B.2: Summary statistics, tables and charts to explore each variable and make comment
B.3: Chart, table and correlation calculation to explore the relationship
B.4: Age, Ticket, SibSp, Parch and Fare classified by Survival, make comment and evaluation on the results
Part C
C.1: T-test to compare quantitative variables Age, Fare classified by qualitative variable Survival
C.2: Fare may depend on Age, SibSp, Parch and Sex
References

Part A:
Data sources in Business and Economics:
In research work, people often use two data sources: primary data and secondary data.
Primary data is data collected directly by the researcher, for example through surveys. Depending on the scope of the survey, primary data can be divided into two types: full (exhaustive) surveys and sample surveys. An exhaustive survey collects information on every member of the study population. A sample survey selects a number of representative elements for study and then infers results for the whole population using statistical methods. (Driscoll and Brizee, 2017)
Secondary data is existing data that has already been collected, aggregated and processed. Sources include internal company data, newspapers, magazines, government data and information from research organizations. (Driscoll and Brizee, 2017)
Data collection methods in Business and Economics
To collect data and information, we can use the following methods:
Interview: face-to-face or by phone. This method collects information quickly. However, it also has limitations: the amount of information collected is limited, and it is costly in terms of time and money. (Ainsworth, 2021)
Observation: the simplest and most effective way to collect data, by observing the actions and expressions of the subjects. The disadvantage of this method is the observer's subjectivity: observers may add their own judgment to the collected data, so there is a risk of bias in the information and data. (Ainsworth, 2021)
Questionnaire and survey: this is a very popular method today because it costs little money and time. It can be carried out directly via the internet, mail or telephone, and it makes it easy to collect accurate and truthful information. On the downside, responses may arrive late, and respondents' answers may be unclear. (Ainsworth, 2021)

Part B:
B.1: Missing value

B.2: Summary statistics, tables and charts to explore each variable and make comment
Qualitative: Survival

The table above shows that 63.6% of the 220 passengers (140 people) died in the Titanic disaster, while only 36.4% (80 people) survived. The number of passengers who died is therefore far greater than the number who survived.
Qualitative: Passenger class

The statistics above show that more than half of the passengers on the Titanic (122 people, 55.5%) travelled in 3rd class. Next, 55 passengers (25%) travelled in 1st class. 2nd class was the smallest group, with only 43 of the 220 passengers (19.5%).
Qualitative: Sex

The gender statistics above show that there were more male than female passengers. There were 158 male passengers, accounting for 71.8% of the 220 passengers on the Titanic. Female passengers made up only 28.2% (62 people) of all passengers.
Qualitative: Embarked

The table above shows that most passengers boarded the Titanic at Southampton: 163 people, or 74.1% of all passengers. Next, 37 people (16.8%) boarded at Cherbourg, and 19 people (8.6%) boarded at Queenstown. Of the 220 passengers, one (passenger 830) has no recorded port of embarkation, accounting for the remaining 0.5%.
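Each percentage in the frequency tables above is a category count divided by the 220 passengers. As a minimal sketch (the variable and category labels are our own, not taken from the data file), the Python below recomputes the shares from the counts reported in this section, rounding half up to one decimal place. The 140/80 split for Survival is taken from the survival cross-tabulations in B.3.

```python
import math

# Category counts reported in B.2 (220 passengers in total).
# "Unknown" is the single passenger with no recorded port of embarkation.
FREQUENCIES = {
    "Survival": {"Died": 140, "Survived": 80},
    "Pclass": {"1st": 55, "2nd": 43, "3rd": 122},
    "Sex": {"Male": 158, "Female": 62},
    "Embarked": {"Southampton": 163, "Cherbourg": 37, "Queenstown": 19, "Unknown": 1},
}

def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded half up to one decimal place."""
    return math.floor(1000 * part / whole + 0.5) / 10

def percent_table(counts: dict[str, int]) -> dict[str, float]:
    """Share of each category in the variable's total, in percent."""
    total = sum(counts.values())
    return {category: pct(n, total) for category, n in counts.items()}

for variable, counts in FREQUENCIES.items():
    # Every variable should account for all 220 passengers.
    assert sum(counts.values()) == 220, f"{variable} counts do not add up to 220"
    print(variable, percent_table(counts))
```

For passenger class, for example, the 43 second-class passengers come out as 19.5% of 220.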
Quantitative: Age

According to the age statistics table, the oldest passenger on board was 80 years old and the youngest was less than 1 year old (only 4 months). There is therefore a very large gap between the oldest and the youngest passengers. The average age of the 220 passengers was 29 years, and most passengers were between 16 and 42 years old.
Quantitative: Passenger Fare Price

Passenger fares also varied enormously: some passengers paid nothing to board, while the highest fare was $512. The average fare was $35 per person, with most fares between $0 and $97. The bar chart also shows that more than 125 passengers fall in the lowest fare group.

Quantitative: Number of Siblings/Spouses

The data table and chart above show that the number of siblings/spouses aboard ranges from 0 to 8. More than 150 passengers travelled with no siblings or spouse aboard; the remaining passengers had one or more siblings/spouses with them.
Quantitative: Number of Parents/Children

According to the statistics above, the number of parents/children aboard ranges from 0 to 6 among the 220 passengers. Passengers with no parents/children aboard vastly outnumber those with six: the chart shows that more than 150 passengers travelled without any parents or children.

B.3: Chart, table and correlation calculation to explore the relationship:


Qualitative: Survival and Pclass

Based on the cross-tabulation of survival by passenger class, 3rd class had the most deaths: 95 people, or 77.9% of all 3rd class passengers. 2nd class followed with 27 deaths, 62.8% of 2nd class passengers. 1st class had the fewest deaths, 18 people (32.7% of 1st class passengers).
Looking at survivors, 1st class had the most (37), followed by 3rd class (27) and 2nd class (16). In conclusion, passengers in higher classes were more likely to survive: the survival rate was 67.3% in 1st class, 37.2% in 2nd class and 22.1% in 3rd class, and only in 1st class did survivors outnumber the dead.
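The survival rates by class are row percentages of the class-by-survival table: each cell is divided by its class total. A small sketch using the counts reported above (the labels are ours; rounding is half up to one decimal place):

```python
import math

# Deaths and survivors per passenger class, as reported in B.3.
CLASS_TABLE = {
    "1st": {"Died": 18, "Survived": 37},
    "2nd": {"Died": 27, "Survived": 16},
    "3rd": {"Died": 95, "Survived": 27},
}

def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded half up to one decimal place."""
    return math.floor(1000 * part / whole + 0.5) / 10

def row_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Express every cell as a percentage of its row (class) total."""
    return {
        row: {col: pct(n, sum(cells.values())) for col, n in cells.items()}
        for row, cells in table.items()
    }

for pclass, shares in row_percentages(CLASS_TABLE).items():
    print(f"{pclass} class: {shares['Survived']}% survived, {shares['Died']}% died")
```

Row percentages answer "what share of this class survived?", which is the comparison the conclusion above relies on; raw counts alone would be distorted by 3rd class being more than twice the size of 1st class.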

Qualitative: Survival and Sex

Of the 220 passengers on the Titanic, 140 died (17 women and 123 men) and 80 survived (45 women and 35 men). Women made up 56.3% of the survivors but only 12.1% of the deaths, whereas men made up 87.9% of the deaths and 43.8% of the survivors. In other words, more men died than survived, while more women survived than died: 72.6% of female passengers survived, compared with only 22.2% of male passengers.
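The percentages in this comparison use two different bases: 56.3% and 12.1% are column percentages (shares of all survivors or of all deaths), whereas a survival rate divides by the number of passengers of each sex. A short sketch that computes both from the reported counts (labels are ours; rounding is half up to one decimal place):

```python
import math

# Deaths and survivors by sex, as reported in B.3.
SEX_TABLE = {
    "Female": {"Died": 17, "Survived": 45},
    "Male": {"Died": 123, "Survived": 35},
}

def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded half up to one decimal place."""
    return math.floor(1000 * part / whole + 0.5) / 10

def column_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Each cell as a percentage of its column total (e.g. share of all deaths)."""
    columns = ("Died", "Survived")
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: pct(row[c], col_totals[c]) for c in columns} for r, row in table.items()}

def survival_rates(table: dict[str, dict[str, int]]) -> dict[str, float]:
    """Share of each row (here: each sex) that survived, in percent."""
    return {r: pct(row["Survived"], sum(row.values())) for r, row in table.items()}

print("Share of deaths / survivors:", column_percentages(SEX_TABLE))
print("Survival rate by sex:", survival_rates(SEX_TABLE))
```

Half-up rounding is used because Python's built-in `round` rounds exact halves to even, which would turn 45/80 = 56.25% into 56.2% instead of the 56.3% shown in the table.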

Qualitative: Survival and Embarked

Of the 220 passengers, most boarded at Southampton, which also had the highest death toll: 110 people, or 78.6% of all deaths. Cherbourg and Queenstown each had 15 deaths, 10.7% of all deaths. Southampton also had the most survivors: 53 people, or 66.3% of all survivors. The two smaller ports differ sharply in survivors: 22 passengers from Cherbourg survived, 18 more than the 4 survivors from Queenstown. Finally, the one passenger with no recorded port of embarkation survived.
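Because the cross-tabulation splits the same 220 passengers, its row totals should match the port counts from B.2, and its column shares reproduce the percentages quoted above. A sketch of that consistency check, using the counts as reported (labels are ours; rounding is half up):

```python
import math

def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded half up to one decimal place."""
    return math.floor(1000 * part / whole + 0.5) / 10

# Deaths and survivors by port of embarkation, as reported in B.3.
EMBARKED_TABLE = {
    "Southampton": {"Died": 110, "Survived": 53},
    "Cherbourg": {"Died": 15, "Survived": 22},
    "Queenstown": {"Died": 15, "Survived": 4},
    "Unknown": {"Died": 0, "Survived": 1},
}
# Port totals from the one-way frequency table in B.2.
PORT_TOTALS = {"Southampton": 163, "Cherbourg": 37, "Queenstown": 19, "Unknown": 1}

total_died = sum(row["Died"] for row in EMBARKED_TABLE.values())
total_survived = sum(row["Survived"] for row in EMBARKED_TABLE.values())

for port, row in EMBARKED_TABLE.items():
    # Each port's deaths plus survivors must equal its B.2 count.
    assert row["Died"] + row["Survived"] == PORT_TOTALS[port], port
    print(f"{port}: {pct(row['Died'], total_died)}% of deaths, "
          f"{pct(row['Survived'], total_survived)}% of survivors")
```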

Quantitative: Fare and Age

The chart shows no clear relationship between passenger fare and passenger age: the two variables do not tend to increase or decrease together, so the fare a passenger paid does not appear to be related to that passenger's age.
Quantitative: Fare and SibSp

The chart suggests that passenger fare and the number of siblings/spouses might be related: passengers with fewer siblings/spouses aboard appear to pay higher fares than passengers with many siblings/spouses. The correlation coefficient calculated below, however, is weak and positive.
Correlations

Fare and Age


Corr(fare, age) = +0.107
The relationship between these two variables is positive. Since 0.107 lies between 0.0 and 0.3, the correlation is negligible.
Sig = 0.114 > 0.05
The correlation between fare and age is not statistically significant at the 5% level.
Fare and SibSp
Corr(fare, sibsp) = +0.113
The relationship between these two variables is positive. Since 0.113 lies between 0.0 and 0.3, the correlation is negligible.
Sig = 0.094 > 0.05
The correlation between fare and sibsp is not statistically significant at the 5% level.
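The Sig values above can be reproduced from r alone: the usual test of H0 (population correlation = 0) uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A standard-library sketch, assuming n = 220 (no missing values in these variables) and integrating the Student t density numerically with Simpson's rule instead of calling a statistics package:

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * area

def pearson_significance(r: float, n: int) -> tuple[float, float]:
    """t statistic and two-sided p-value for H0: population correlation = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, t_two_sided_p(t, df)

for label, r in [("Fare vs Age", 0.107), ("Fare vs SibSp", 0.113)]:
    t, p = pearson_significance(r, n=220)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: r = {r:+.3f}, t = {t:.3f}, Sig = {p:.3f} -> {verdict} at 5%")
```

This gives t of about 1.59 and 1.68, with two-sided p-values close to the reported 0.114 and 0.094 (small differences come from r being rounded to three decimals).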

B.4: Age, Ticket, SibSp, Parch and Fare classified by Survival, make comment and evaluation on the results:

Based on the above table of statistics, we have the following information:


First, consider the ages of surviving and deceased passengers. The average age of survivors is 30 years, higher than that of the deceased (28 years). The oldest survivor was 80 years old, while the oldest deceased passenger was 70. The youngest survivor was 4 months old, younger than the youngest deceased passenger (2 years old). The remaining age measures (range, standard deviation and variance) are all larger for survivors than for the deceased.
Second, in terms of fares, deceased passengers paid $18 on average, while survivors paid $63, a huge difference between the two groups. Even the highest fare paid by a deceased passenger ($113) is far below the highest fare paid by a survivor ($512). As with age, the remaining fare measures (range, standard deviation and variance) are smaller for the deceased than for survivors.
Third, in terms of the number of parents/children, the maximum and the range for deceased passengers are both 6, while for surviving passengers both are 3. The maximum and the range are therefore larger for the deceased than for survivors. The standard deviation of the number of parents/children of surviving passengers equals the standard deviation of the number of siblings/spouses of deceased passengers. As for variance, the number of parents/children varies less among survivors than among the deceased.
Finally, for deceased passengers, the maximum number of siblings/spouses and the range are both 8, while for survivors both are 3. The maximum and the range are therefore larger for the deceased than for survivors. As with the number of parents/children, the standard deviation and variance of the number of siblings/spouses follow the same pattern.

Part C:
C.1: T-test to compare quantitative variables Age, Fare classified by qualitative
variable Survival

Sig(F) = 0.000 < 0.05 => Equal variances not assumed


Test: H0: μ1 = μ2 (μ1, μ2: means of the two Survival groups)
H1: μ1 ≠ μ2
T = -4.225
Sig(t) = 0.000 < 0.05
=> Reject H0
=> μ1 ≠ μ2

Sig(F) = 0.048 < 0.05 => Equal variances not assumed
Test: H0: μ1 = μ2
H1: μ1 ≠ μ2
T = 0.973
Sig(t) = 0.332 > 0.05
=> Do not reject H0: no significant difference between μ1 and μ2
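The two-step reading used in C.1, first the variance-equality F test (Levene's test in the usual independent-samples output) and then the t-test, amounts to a simple decision rule at the 5% level. A sketch that applies it to the Sig values reported above; the function names are illustrative:

```python
ALPHA = 0.05  # significance level used throughout the report

def variance_row(levene_sig: float, alpha: float = ALPHA) -> str:
    """Pick the t-test row: a small Sig(F) means the group variances differ."""
    if levene_sig < alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"

def t_test_decision(t_sig: float, alpha: float = ALPHA) -> str:
    """Two-sided decision for H0: mu1 = mu2 against H1: mu1 != mu2."""
    if t_sig < alpha:
        return "Reject H0 (mu1 != mu2)"
    return "Do not reject H0"

# (Sig(F), Sig(t)) pairs as reported for the two tests in C.1.
REPORTED = [(0.000, 0.000), (0.048, 0.332)]

for sig_f, sig_t in REPORTED:
    print(f"Sig(F) = {sig_f:.3f}: {variance_row(sig_f)}; "
          f"Sig(t) = {sig_t:.3f}: {t_test_decision(sig_t)}")
```

Note that "Do not reject H0" is weaker than "μ1 = μ2": a large Sig(t) means the data show no significant difference, not that the means are proven equal.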

C.2: Fare may depend on Age, SibSp, Parch and Sex

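Age: 0.646 > 0 -> holding the other variables constant, each additional year of age raises the predicted fare by $0.646
Sig(age) = 0.037 < 0.05 -> significant at 5%
Number of siblings/spouses: 4.742 > 0 -> each additional sibling/spouse raises the predicted fare by $4.742
Sig(number of siblings/spouses) = 0.316 > 0.05 -> not significant at 5%
Number of parents/children: 7.865 > 0 -> each additional parent/child raises the predicted fare by $7.865
Sig(number of parents/children) = 0.180 > 0.05 -> not significant at 5%
Sex: -19.696 < 0 -> Sex is a 0/1 dummy variable, so passengers in the category coded 1 pay on average $19.696 less than those coded 0, other variables held constant
Sig(sex) = 0.041 < 0.05 -> significant at 5%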
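Because only the slope coefficients are listed above (the intercept is not shown), the model can still be used to compare two passengers: the intercept cancels in the difference of their predicted fares. A minimal sketch with two hypothetical passengers; which sex category is coded 1 is not stated in the source, so `sex` is treated as a generic 0/1 dummy:

```python
# Slope coefficients reported in C.2. The intercept is not reported, so the
# model is only used here for differences between two passengers.
COEFFICIENTS = {"age": 0.646, "sibsp": 4.742, "parch": 7.865, "sex": -19.696}

def fare_difference(passenger_a: dict[str, float], passenger_b: dict[str, float]) -> float:
    """Predicted fare of A minus predicted fare of B (the intercept cancels out)."""
    return sum(
        coef * (passenger_a[name] - passenger_b[name])
        for name, coef in COEFFICIENTS.items()
    )

# Hypothetical passengers who differ only in age; sex is a generic 0/1 dummy.
older = {"age": 40, "sibsp": 0, "parch": 0, "sex": 0}
younger = {"age": 30, "sibsp": 0, "parch": 0, "sex": 0}

print(f"Predicted fare difference: ${fare_difference(older, younger):.2f}")
```

Here the ten-year age gap alone predicts a fare $6.46 higher for the older passenger (10 × 0.646). Keep in mind that only the age and sex coefficients are significant at 5%, so differences driven by sibsp or parch should be read with caution.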

References:
Driscoll and Brizee, 2017. What is Primary Research? [online] Purdue Online Writing Lab. Available at: <https://owl.english.purdue.edu/owl/resource/559/01/> [Accessed 24 June 2017].
BYU FHSS Research Support Center, n.d. Data Types and Sources. [online] Available at: <https://fhssrsc.byu.edu/Pages/Data.aspx> [Accessed 24 June 2018].
Yin, R., 2017. Case Study Research and Applications: Design and Methods. 6th ed. SAGE Publications.
Formplus Blog, 2021. Primary vs Secondary Data: 15 Key Differences & Similarities. [online] Formpl.us. Available at: <https://www.formpl.us/blog/primary-secondary-data> [Accessed 17 May 2021].
Ainsworth, Q., 2021. Data Collection Methods. [online] JotForm. Available at: <https://www.jotform.com/data-collection-methods/> [Accessed 17 May 2021].
