Assignment 1 - TITANIC
To study the correlations among all the variables, a new data set was created in which every
categorical variable was converted to a numerical one.
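The conversion described above can be sketched as follows. This is a minimal illustration in Python (the report's own analysis appears to have been done in R), and the rows, the `encode` helper and the `pearson` helper are all assumptions made for the example, not the report's data or code. Integer codes impose an arbitrary order on nominal categories such as the embarkation port, so correlations computed on them should be read with care.

```python
from math import sqrt

# Hypothetical rows for illustration only; not taken from the report's data set.
rows = [
    {"Sex": "male", "Embarked": "S", "Pclass": 3, "Fare": 7.25},
    {"Sex": "female", "Embarked": "C", "Pclass": 1, "Fare": 71.28},
    {"Sex": "female", "Embarked": "S", "Pclass": 3, "Fare": 7.92},
    {"Sex": "male", "Embarked": "Q", "Pclass": 2, "Fare": 13.00},
]


def encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


sex_codes = encode([r["Sex"] for r in rows])            # female -> 0, male -> 1
embarked_codes = encode([r["Embarked"] for r in rows])  # C -> 0, Q -> 1, S -> 2
r_fare_class = pearson([r["Fare"] for r in rows], [r["Pclass"] for r in rows])
```

With these made-up rows, `r_fare_class` is negative, because a lower class number goes with a higher fare.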
2. QUESTIONS:
2.1. What was the price difference between the classes?
As on any passenger ship, travellers could stay in different classes; on the Titanic there were
three. A high correlation between the variables Fare and Pclass was observed, so we created one
subset per class in order to establish the difference between them in terms of fare.
As observed below, class 1 is the most expensive, with a mean fare of 88.69, followed by class 2 with
a mean of 22.20; the third class is the cheapest, with a mean of 13.49.
The following plots show how the fare depends on the class.
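The per-class means can be computed by grouping the fares by class. Below is a minimal sketch in Python; the `(Pclass, Fare)` pairs are hypothetical and the function name `mean_fare_by_class` is an assumption, so the output does not reproduce the figures quoted above.

```python
from collections import defaultdict

# Hypothetical (Pclass, Fare) pairs for illustration only.
passengers = [(1, 80.0), (1, 100.0), (2, 20.0), (2, 24.0), (3, 8.0), (3, 12.0)]


def mean_fare_by_class(records):
    """Return {class: mean fare}, computed from (class, fare) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pclass, fare in records:
        totals[pclass] += fare
        counts[pclass] += 1
    return {pclass: totals[pclass] / counts[pclass] for pclass in sorted(totals)}


means = mean_fare_by_class(passengers)
for pclass, mean_fare in means.items():
    print(f"class {pclass}: mean fare {mean_fare:.2f}")
```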
When studying the variable Fare, it was discovered that some passengers had paid nothing for their
passage on the Titanic. These were thought to be workers. A new dataset called ‘workers’ was created
containing the passengers whose Fare value was 0.
This dataset contained 7 people. They were between 28 and 49 years old, they were all men and they
all embarked in Southampton. None of them travelled with family members, and none of them
survived.
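Selecting the zero-fare passengers and summarising them can be sketched as below. This is a Python illustration under stated assumptions: the records are hypothetical, and using `SibSp` and `Parch` as the family-size columns is an assumption based on the usual layout of Titanic data sets, not something the report states.

```python
# Hypothetical passenger records for illustration only.
passengers = [
    {"Fare": 0.0, "Sex": "male", "Age": 36, "Embarked": "S", "SibSp": 0, "Parch": 0, "Survived": 0},
    {"Fare": 7.25, "Sex": "female", "Age": 22, "Embarked": "S", "SibSp": 1, "Parch": 0, "Survived": 1},
    {"Fare": 0.0, "Sex": "male", "Age": 40, "Embarked": "S", "SibSp": 0, "Parch": 0, "Survived": 0},
    {"Fare": 30.0, "Sex": "male", "Age": None, "Embarked": "C", "SibSp": 0, "Parch": 0, "Survived": 1},
]

# Subset of passengers who paid nothing.
workers = [p for p in passengers if p["Fare"] == 0]

# Ages can be missing, so ignore None when computing the range.
ages = [p["Age"] for p in workers if p["Age"] is not None]

summary = {
    "count": len(workers),
    "age_range": (min(ages), max(ages)),
    "all_male": all(p["Sex"] == "male" for p in workers),
    "all_southampton": all(p["Embarked"] == "S" for p in workers),
    "no_relatives": all(p["SibSp"] + p["Parch"] == 0 for p in workers),
    "survivors": sum(p["Survived"] for p in workers),
}
```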
2.3. Where did the passengers stay (Pclass) depending on where they embarked?
A high correlation was observed between the variables Pclass and Embarked, so the relationship
between them was studied.
Using the class subsets created previously, the class in which passengers stayed was evaluated
against the place where they embarked. Tables and prop.tables showed large differences between the
embarkation cities: in Cherbourg most people stayed in class 1, the most expensive one, whilst in
Southampton and Queenstown class 3 was the most common. These conclusions are derived from the
following geom_bar() plots, one per class and a general one.
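The tables mentioned above are a contingency table of port by class and its row-wise proportions. A minimal Python sketch of that computation follows; the `(Embarked, Pclass)` pairs are hypothetical and `row_proportions` is an illustrative helper, intended to mirror what a count table and a row-normalised proportion table express rather than the report's own code.

```python
from collections import Counter

# Hypothetical (Embarked, Pclass) pairs for illustration only.
pairs = [
    ("C", 1), ("C", 1), ("C", 3),
    ("S", 3), ("S", 3), ("S", 1),
    ("Q", 3), ("Q", 3),
]

# Contingency table: number of passengers per (port, class) cell.
counts = Counter(pairs)


def row_proportions(cell_counts):
    """Divide each cell by the total of its row (its embarkation port)."""
    row_totals = Counter()
    for (port, _), n in cell_counts.items():
        row_totals[port] += n
    return {(port, pclass): n / row_totals[port]
            for (port, pclass), n in cell_counts.items()}


props = row_proportions(counts)
for (port, pclass), share in sorted(props.items()):
    print(f"{port} class {pclass}: {share:.0%}")
```

Normalising by row answers "of those who embarked at this port, what share travelled in each class", which is the comparison the paragraph draws.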
2.4. How much money did the passengers spend depending on where they embarked?
Given this information, it is noticeable that passengers who embarked in Cherbourg spent more money
than those who embarked in Queenstown, who paid significantly less: as previously discovered, most of
the passengers who embarked in Queenstown were placed in the third class, which section 2.1 showed
to be the cheapest. In order to analyze this information more clearly, a boxplot of this relationship
was made.
2.5. Did class level have anything to do with surviving the Titanic?
The initial hypothesis was that the older the passengers were, the lower their survival rate (as
ageing implies a gradual physical decline). To test this, the mean age of all passengers was obtained
(29.17 years).
Then the mean age of those who survived (28.048 years) and of those who did not (29.88 years) was
obtained. The mean age of the survivors was lower than the overall mean, whilst the mean age of
those who died was higher.
Nonetheless, as these results were not conclusive, the survival percentages of passengers older and
younger than the mean age were computed: 41.56% of the passengers older than the mean age survived,
versus 36.96% of the younger ones. These percentages contradicted the initial hypothesis. A histogram
was therefore plotted in order to analyze the data further:
This graph does not show observable dependence patterns between the variables ‘Survived’ and ‘Age’.
To check this lack of relationship, the percentage of survivors within four age intervals was
computed. These ranged from 0.42 years (minimum age) to 74 years (maximum age) in intervals of
18.395 years.
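The interval width follows from the stated range: (74 - 0.42) / 4 = 18.395 years. A minimal Python sketch of the binning and the per-interval survival percentage is given below; the `(Age, Survived)` pairs are hypothetical, and the helper names are assumptions made for the example.

```python
MIN_AGE, MAX_AGE, N_BINS = 0.42, 74.0, 4
width = (MAX_AGE - MIN_AGE) / N_BINS  # 18.395 years per interval


def bin_index(age):
    """Index (0..3) of the age interval; the maximum age goes in the last one."""
    return min(int((age - MIN_AGE) / width), N_BINS - 1)


def survival_rate_by_bin(records):
    """Percentage of survivors per interval, or None for an empty interval."""
    survived = [0] * N_BINS
    total = [0] * N_BINS
    for age, did_survive in records:
        i = bin_index(age)
        total[i] += 1
        survived[i] += did_survive
    return [100 * s / t if t else None for s, t in zip(survived, total)]


# Hypothetical (Age, Survived) pairs for illustration only.
sample = [(2, 1), (10, 0), (25, 1), (30, 0), (33, 0), (45, 1), (60, 0), (74, 1)]
rates = survival_rate_by_bin(sample)
```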
The percentages obtained were the following:
2.8. Did having relatives on board the Titanic condition the passengers’ survival?
To evaluate this relationship, the following percentages were obtained: 29.38% of the passengers who
travelled alone survived, whilst 50.04% of those who had family on board survived. The hypothesis was
that being accompanied improved a passenger's chance of survival.
To test it, a two-level variable called ‘Family’ was created and bound to the initial data frame: the
value 1 represents the passengers with relatives on the Titanic and 0 those without them.
Then, a bar plot, a table and a prop.table were computed in order to evaluate this dependence:
Lastly, the correlation between the variables ‘Family’ and ‘Survived’ was computed: 0.23. This is a
positive, though modest, association, and it is consistent with the percentages above: passengers who
had family on board (value 1) had a higher survival rate.
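The construction of the indicator and its correlation with survival can be sketched as follows. This is a Python illustration under assumptions: the report does not say how ‘Family’ was derived, so building it from the `SibSp` and `Parch` counts is a guess based on the usual Titanic columns, and the sample rows are hypothetical. For two 0/1 variables the Pearson coefficient computed here is the same quantity as the phi coefficient.

```python
from math import sqrt

# Hypothetical (SibSp, Parch, Survived) rows for illustration only.
sample = [
    (0, 0, 0), (1, 0, 1), (0, 0, 0), (0, 2, 1),
    (0, 0, 1), (1, 1, 0), (0, 0, 0), (2, 0, 1),
]

# Assumed rule: Family = 1 if the passenger has any relative on board, else 0.
family = [1 if sibsp + parch > 0 else 0 for sibsp, parch, _ in sample]
survived = [s for _, _, s in sample]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def survival_pct(flag):
    """Percentage of survivors among passengers whose Family value equals flag."""
    group = [s for f, s in zip(family, survived) if f == flag]
    return 100 * sum(group) / len(group)


r_family_survived = pearson(family, survived)
pct_alone, pct_with_family = survival_pct(0), survival_pct(1)
```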