
Business Analytics coursework (BN1116)

A Study of the Housing Market in Sheffield, United Kingdom


Candidate Number:
Submission Date: 15th December 2014



Summary

This report, written from the position of a junior business analytics consultant, presents the
manager with an overview of the housing market in my hometown (Sheffield), focusing on the price,
type, and size of the houses for sale there. The study uses a set of techniques, namely
visualisation techniques, descriptive statistics, confidence intervals, and hypothesis testing,
each of which is explained later in this report. The report also identifies a problem with the
sample data, which is discussed in the conclusion. The sample data in this report was collected
from a reliable source (www.rightmove.co.uk).
Sampling Technique
The sample of houses for sale was collected from the population using stratified sampling.
Stratified sampling divides the population (the collection of all data to be considered) into
subgroups, called strata, and then randomly selects a proportion from each subgroup
(Thompson, 2012). This method makes it possible to highlight and observe the relationship
between two or more subgroups within the population. Unlike other techniques, it gives
control over which subgroups are included in the sample, so it offers good coverage of the
population. The result is a representative sample, because no subgroup of the population
is excluded.
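The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the listings, the 20% sampling fraction, and the function name `stratified_sample` are hypothetical and are not taken from the Rightmove data used in the report.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for members in strata.values():
        # Take at least one listing so that no stratum is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 40 terraced houses, 20 flats, 10 bungalows.
listings = (
    [{"type": "Terraced", "id": i} for i in range(40)]
    + [{"type": "Flat", "id": i} for i in range(20)]
    + [{"type": "Bungalow", "id": i} for i in range(10)]
)

sample = stratified_sample(listings, key="type", fraction=0.2)
counts = {}
for row in sample:
    counts[row["type"]] = counts.get(row["type"], 0) + 1
print(counts)
```

Because each stratum is sampled separately, every property type appears in the sample in proportion to its size in the population.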
Visualisation Technique
1. Pie Chart
A pie chart uses slices to represent the percentage that each category contributes to the
overall total. It is used here because the number of categories (the number of bedrooms, in
this case) is small (a maximum of five), so the eye can easily distinguish between the
categories and the data is easy to interpret. It is also a better way to present the
categories to the manager than a small table, as it makes the data easier to understand and
visualise. The chart shows that three-bedroom houses make up the largest share of the
properties for sale in Sheffield, at 43% of the total, while five-bedroom properties make up
the smallest share, at only 5%. Two-bedroom and four-bedroom properties each account for 20%
of the total, roughly half the share of three-bedroom properties. Finally, one-bedroom
properties account for 12% of the total. The chart is therefore helpful for comparing the
different categories of houses available for sale.

2. Bar Chart
A bar chart is a type of graph used to present and compare counts or other measures, such as
the mean, across categories. The mean is a measure of central location that can be used as a
single value to represent the data. One advantage of the bar chart for the manager is that it
is easy to interpret. Like a pie chart, a bar chart is useful for comparing groups of data.
The chart shows that as the number of bedrooms increases, the average (mean) price also
increases. The average price of the five-bedroom properties for sale is the highest, at
£457000. Conversely, the one-bedroom properties, which indicate a smaller house size, have
the lowest average price, at roughly £83742. The difference between the average prices of
the five-bedroom and four-bedroom properties is roughly £168733. The three-bedroom average
price sits in the middle of the categories, at approximately £204763. Finally, the difference
between the average prices of the one-bedroom and two-bedroom properties is roughly £45245.
The chart therefore helps the manager to visualise the relationship between the number of
bedrooms and the price, which is clearly positive (as one increases, the other also increases).
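The four-bedroom and two-bedroom averages are not quoted directly, but they can be recovered from the differences given above. The short sketch below does that arithmetic and checks that the averages rise with the number of bedrooms; since the quoted figures are described as approximate, the derived values are approximate too.

```python
# Mean prices quoted in the text, keyed by number of bedrooms.
mean_price = {1: 83742, 3: 204763, 5: 457000}

# Derived from the quoted differences (approximate, like the inputs).
mean_price[4] = mean_price[5] - 168733  # five-bedroom minus the 5-vs-4 gap
mean_price[2] = mean_price[1] + 45245   # one-bedroom plus the 1-vs-2 gap

ordered = [mean_price[bedrooms] for bedrooms in sorted(mean_price)]
is_increasing = all(low < high for low, high in zip(ordered, ordered[1:]))

for bedrooms in sorted(mean_price):
    print(bedrooms, mean_price[bedrooms])
print("Price rises with bedrooms:", is_increasing)
```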
3. Histogram
A histogram is a vertical bar chart that depicts the distribution of a set of data. The
frequency on the Y-axis shows how many observations in the sample fall within each interval,
so the area of each bar represents the frequency. Histograms are used for continuous
numerical (quantitative) data. One issue encountered in SPSS was a gap in the chart, which
indicates an interval with no observations; the solution taken was to change the scale. From
the chart, the most frequent prices are roughly between 80000 and 120000 pounds, while the
least frequent are roughly between 38000 and 70000 pounds. The histogram is skewed to the
right, which tells us that most properties are concentrated at the lower prices, with a tail
of fewer, more expensive properties. Histograms are valuable for the manager because they
show the shape of the data and are easier to interpret than a table.
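The way a histogram groups prices into intervals, and how a gap appears when an interval is empty, can be illustrated with a minimal sketch. The prices, the starting point of 40000, and the interval width of 40000 are all hypothetical and chosen only for illustration; they are not the SPSS settings used for the chart.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count how many values fall in each interval [edge, edge + width).

    Assumes every value is at least `start`.
    """
    counts = Counter((value - start) // width for value in values)
    last = max(counts)
    # Empty intervals are kept with a count of 0; these are the gaps.
    return {start + i * width: counts.get(i, 0) for i in range(last + 1)}

# Hypothetical asking prices in pounds.
prices = [45000, 85000, 95000, 99000, 110000, 115000,
          130000, 150000, 210000, 425000]

histogram = bin_counts(prices, start=40000, width=40000)
for edge, count in histogram.items():
    print(f"{edge:>6} to {edge + 40000:>6}: {'#' * count}")
```

Widening the interval merges neighbouring bins, which is one way a change of scale can close such gaps.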
Descriptive Statistics
Measures/type        Bungalow    Detached House   Flat        Semi-Detached House   Terraced
Mean                 312843      334104           89834       212998                127984
Standard Deviation   90412.42    118706.48        32297.78    87492.73              41560.40
Maximum              425000.00   650000.00        150000.00   550000.00             284950.00
Minimum              195000.00   170000.00        110000.00   92500.00              90000.00

Descriptive statistics are numbers that are used to summarise and describe the data
(Mann, 1995). They are divided into two main categories: measures of central tendency and
measures of variability (Mann, 1995). A measure of central tendency indicates the central
location of the data, while a measure of variability shows how much the values in the data
differ from one another.
The table shows the centre of each category, given by the mean. The mean represents the data
with a single value, which also makes it possible to compare the categories (Selkirk, 1978).
As illustrated, the categories with the highest mean prices are the Detached, the Bungalow,
and the Semi-Detached houses (334104, 312843, and 212998 respectively), whereas the lowest
are the Terraced houses, with a mean of 127984, and the Flats, with a mean of 89834.
The measure of variability that appears in the table is the standard deviation. There is
another measure called the range, but I have not included it in the table because, unlike
the standard deviation, it depends only on the maximum and the minimum values, so a single
extreme price can distort it (Selkirk, 1978). The standard deviation shows how concentrated
the data is around the mean: the smaller the standard deviation, the more concentrated the
data is around the mean (Selkirk, 1978). From the table, the categories that show the most
variation are the Detached, the Bungalow, and the Semi-Detached houses (118706.48, 90412.42,
and 87492.73 respectively). A large standard deviation is not a bad thing in itself; it
simply shows how varied the prices are. The other two categories vary less, with 32297.78
for the Flats and 41560.40 for the Terraced houses. For the manager, this is helpful as it
shows which category has the most varied set of data.
The other measure that can be calculated from the table is the range, which is obtained by
subtracting the minimum value from the maximum value. For the manager, it is helpful to see
the maximum and the minimum prices of each category. Both are already listed in the table,
so the manager can use it to compare the highest and the lowest values for each type of house.
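The three measures discussed in this section can be computed with Python's standard `statistics` module, as in the minimal sketch below. The five prices are hypothetical, chosen only to keep the arithmetic easy to follow, and the helper name `describe` is illustrative.

```python
import statistics

def describe(prices):
    """Return the mean, sample standard deviation, and range of a list."""
    return {
        "mean": statistics.mean(prices),
        "sd": statistics.stdev(prices),       # sample SD, divides by n - 1
        "range": max(prices) - min(prices),   # maximum minus minimum
    }

# Hypothetical flat prices in pounds.
flats = [80000, 85000, 90000, 95000, 100000]
summary = describe(flats)
print(summary)
```

Replacing the highest price with a much larger one changes the range by the full amount of the increase, which is why the range is sensitive to extreme values.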

Confidence Intervals
Confidence Intervals for Mean / Type of property

Level   Limit         Bungalow     Detached House   Flat         Semi-Detached House   Terraced
95%     Lower limit   229225.317   283978.815       74266.918    176882.775            110828.512
95%     Upper limit   396460.398   384229.518       105400.977   248113.225            145139.088
99%     Lower limit   186150.098   266080.011       68505.803    164055.623            104735.418
99%     Upper limit   439535.616   402128.322       111162.091   261940.377            151232.182

In any study, the results from the sample will not perfectly match those of the overall
population (Cox and Hinkley, 1974). A confidence interval is the range of values within
which the true population value is expected to fall, based on the sample (Cox and Hinkley,
1974). Confidence intervals are helpful because they give a better idea of what the results
for the population are likely to be. From the table, one can be 95% confident that the
population mean price of each type of house lies between the lower and the upper limit,
leaving a 5% chance that it lies outside the interval. The same interpretation applies to
the 99% intervals, but with a 1% chance that the mean lies outside the limits.
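One way to read the confidence interval table is to recover the sample mean and the margin of error from each pair of limits. The sketch below assumes the intervals are symmetric about the sample mean (as a t-based interval for a mean would be) and that the digits split across lines in the table join as written here; it uses the Detached House and Flat columns as examples.

```python
# Lower and upper limits copied from the confidence interval table.
limits = {
    ("Detached House", 95): (283978.815, 384229.518),
    ("Detached House", 99): (266080.011, 402128.322),
    ("Flat", 95): (74266.918, 105400.977),
    ("Flat", 99): (68505.803, 111162.091),
}

def midpoint_and_margin(lower, upper):
    """Midpoint estimates the sample mean; half the width is the margin."""
    return (lower + upper) / 2, (upper - lower) / 2

summary = {key: midpoint_and_margin(*pair) for key, pair in limits.items()}
for (house_type, level), (mid, margin) in summary.items():
    print(f"{house_type} {level}%: mean about {mid:.0f}, margin about {margin:.0f}")
```

The midpoints agree with the means in the descriptive statistics table, and the 99% margins are wider than the 95% margins, reflecting the extra confidence required.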

Hypothesis Testing
To check whether the average prices of the different property types in the sample are in
line with the population average, a hypothesis test was carried out. A statistical
hypothesis is an assumption about a population parameter, here the population mean (Lehmann
and Romano, 2005). A hypothesis test is carried out to decide whether to reject the
statistical hypothesis: when the sample data are not consistent with the hypothesised
population mean, the hypothesis is rejected.
To assess this, a one-sample t-test was carried out, which is a test used to assess whether
the sample comes from a population with a particular mean (Lehmann and Romano, 2005). The
results showed that only the terraced houses and the flats were in line with the average
prices of the population in Sheffield. One limitation of this method is that using the
population average price from a different date (2013 or 2014) can give different results.
Another is the high level of variance (a measure of the spread between the numbers in the
data set), which yields a less precise estimate of the population mean.
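The test statistic behind a one-sample t-test is the difference between the sample mean and the hypothesised population mean, divided by the standard error of the sample mean. The sketch below computes it with the standard library only. The prices and the population mean of 92000 are hypothetical, and the critical value 2.776 is the standard two-sided 5% value for 4 degrees of freedom from t tables; the report's own test was run in SPSS on the real sample.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - population_mean) / standard_error
    return t, n - 1

# Hypothetical flat prices and hypothesised population mean, in pounds.
prices = [80000, 85000, 90000, 95000, 100000]
t_stat, df = one_sample_t(prices, population_mean=92000)

CRITICAL_T = 2.776  # two-sided, 5% level, 4 degrees of freedom
reject = abs(t_stat) > CRITICAL_T
print(f"t = {t_stat:.3f}, df = {df}, reject hypothesis: {reject}")
```

A larger standard deviation increases the standard error and shrinks the t statistic, which is how a large spread in the sample makes the estimate of the population mean less precise.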

Conclusion
This report used various methods to give the manager an overview of the housing market in
Sheffield. Stratified sampling was used to collect a sample that would be representative of
the population. However, the hypothesis test showed that not all of the subgroups' average
prices were in line with the population average for Sheffield, which requires further
investigation. Visualisation techniques were used to present the manager with graphs of the
different data sets, because a picture can summarise a thousand words. A descriptive
statistics table was then used to summarise the data, together with an explanation of the
different descriptive measures. Confidence intervals were provided to help the manager form
an idea of the results for the population, based on the sample. Finally, the problem found
in the hypothesis test could be due to the date on which the population average price was
collected, or to a large spread between the numbers in the sample.
References
(Sampling Technique)
Thompson, S. (2012). Sampling. (3rd ed.). New Jersey: Wiley & Sons Inc.
(Histogram)
Doane, D.P. (1976). Aesthetic Frequency Classification. American Statistician,
30(4), pp. 181-183.
(Bar Chart)
Kelley, W. M.; Donnelly, R. A. (2009) The Humongous Book of Statistics Problems.
New York, NY: Alpha Books
(Pie Chart)
Walkenbach, J. (2013) Excel 2013 Bible. Indianapolis: Wiley.
(Descriptive Statistics)
Selkirk, K. E. (1978). Descriptive Statistics. Nottingham: University of Nottingham
School for Education.
Mann, Prem S. (1995). Introductory Statistics (2nd ed.). New Jersey: Wiley & Sons
Inc.
(Confidence Intervals)
Cox D.R., Hinkley D.V. (1974) Theoretical Statistics, Chapman & Hall, p49, p209
Field, Andy (2013). Discovering statistics using SPSS. London: SAGE.
Kendall, M.G. and Stuart, D.G. (1973) The Advanced Theory of Statistics. Vol 2:
Inference and Relationship, Griffin, London. Section 20.4
Neyman, J. (1937). "Outline of a Theory of Statistical Estimation Based on the
Classical Theory of Probability". Philosophical Transactions of the Royal Society A
236: 333-380.

(Hypotheses Testing)
Lehmann, E.L.; Romano, Joseph P. (2005). Testing Statistical Hypotheses (3rd ed.).
New York: Springer.
