SFM A1


Part A:

1. Information about commerce and the economy taken from a published source
• Analyzing the data
- Descriptive data
Descriptive statistics are brief informative coefficients that summarize a particular
data set, which may be a sample of a population or the full population. They comprise
measures of central tendency and measures of variability (spread). The mean, median, and
mode are measures of central tendency, while the standard deviation, variance, minimum
and maximum values, kurtosis, and skewness are measures of variability (Hayes, 2022).
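As a rough illustration of the measures listed above, the following sketch computes them with Python's standard library. The area values are made up for the example and are not the report's data set; the function name `describe` is hypothetical, and the skewness uses the adjusted sample formula that many statistics packages report.

```python
import statistics

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted Fisher-Pearson sample skewness.
    cubed = sum((x - mean) ** 3 for x in values)
    skewness = n / ((n - 1) * (n - 2)) * cubed / sd ** 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": sd,
        "variance": statistics.variance(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "skewness": skewness,
    }

# Hypothetical house areas in m2 (illustrative only).
areas_m2 = [35, 90, 95, 109, 121, 125, 125, 155, 224]
summary = describe(areas_m2)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A positive skewness indicates a longer right tail, a negative one a longer left tail.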
- Inferential statistics
Inferential statistics uses samples drawn from a population to build a sound
understanding of the data. It applies a range of analytical techniques to make
generalizations about the population. Various sampling methods are used to select random
samples that accurately reflect the population; simple random sampling, stratified
sampling, cluster sampling, and systematic sampling are among the most important.
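Three of the sampling methods named above can be sketched as follows. This is a minimal illustration under stated assumptions: the `homes` list is invented, the function names are hypothetical, and a fixed seed is used only so the example is repeatable.

```python
import random

def simple_random_sample(population, k, rng):
    """Every item has the same chance of being selected."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start, then take every step-th item."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, key, fraction, rng):
    """Sample the same fraction from each stratum defined by key(item)."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)  # fixed seed, for a repeatable illustration
# Hypothetical population of 100 homes, half of them furnished.
homes = [{"id": i, "furnished": i % 2 == 0} for i in range(100)]

srs = simple_random_sample(homes, 10, rng)
systematic = systematic_sample(homes, 10, rng)
stratified = stratified_sample(homes, lambda h: h["furnished"], 0.1, rng)
print(len(srs), len(systematic), len(stratified))
```

Stratified sampling guarantees that each subgroup (here furnished and unfurnished homes) is represented in proportion to its size.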
- Descriptive and inferential statistics and their distinctions

                       Descriptive statistics              Inferential statistics
Role                   Logically group, examine, and       Data comparison, testing, and
                       present the information.            prediction.
Form of final result   Table, graph, and chart.            Probability.
Usage                  To explain a circumstance.          To explain the probability that
                                                           an event will take place.
Function               Describes and sums up the sample    Attempts to extrapolate conclusions
                       data that is already known.         about the population from the
                                                           available data.
2. Data
• Data source:
Two main categories of data sources are:
- Internal source: Most of the information in this type of source comes from the
company's own database and includes details on its personnel, production, sales, and
other areas.
- External source: Information that is held outside the company must be obtained from
external sources. Collecting it therefore often involves making purchases or signing
contracts with third parties, and the information can only be viewed with both parties'
permission.

Statistical data is generated to satisfy the needs of companies and other
organizations that demand more specialized and focused data. The two types of statistical
data are experimental data and observational data.
- Observational data: A business observes a specific event or time period, gathers
data on the relevant variables, and then performs a statistical analysis on the data.
Public opinion polls and surveys may be used to gather observational data.
- Experimental data: Experimental data differs from observational data in that it is
gathered in a controlled setting. The information provided by experimental outcomes is
therefore more thorough and reliable.
Existing data and statistical data compare as follows:
- Existing data: This information can be found from a wide variety of reliable sources.
Despite this, there is no way to ensure the veracity of the information provided. There
could be typographical errors in publications, as well as some articles and resources
from the past century that haven't been updated. Because of this, employing this data
type is not recommended, but if no other choices are available, it is still a viable
option.
- Statistical data: To ensure the accuracy of the data, this method uses the most recent
subject-specific figures and attributes. The disadvantage is that getting there will
take a long time and cost a lot of money. The eventual product will be very different
from what is currently offered if the company can allocate more resources to this
strategy.

- Techniques for collecting data: There are a variety of techniques for collecting data,
such as observation, recording and documentation, surveys, questionnaires, interviews,
and studies of specific populations. Once obtained, data can be divided into two
categories: qualitative and quantitative (numbers and counts).
Qualitative:
- Interview: A conversation in which one party asks questions and the other party
responds in order to obtain information.
- Observation: In order to get the necessary information, researchers must keep an eye
on the participants' behavior.
- Focus group study: This also includes observation, questionnaires, and interviews,
because the participants must be a group or team with similar characteristics. To
acquire data, it is important to have a thorough grasp of the group.
Quantitative:
- Surveys and questionnaires: Respondents are asked a variety of questions. The
researchers can then compile the information the individuals submitted into a database.
- Records and documents: Researchers look for records and materials pertaining to the
study participants to make sure they have all the data they need.
Part B:
• Missing values

Statistics
               The type of   Is the property      In what floor the
               property      Furnished or not     property is?
N   Valid      501           501                  501
    Missing    0             0                    0

There are no missing values for the qualitative variables in my data set of 501
observations on home prices.

Statistics
               The price of   Number of   Number of   The Area of the
               property       bedrooms    bathrooms   property by m2
N   Valid      501            501         501         500
    Missing    0              0           0           1

Among the quantitative variables (price, number of bedrooms, number of bathrooms, and
area), one value is missing for the area. Because only one observation had a missing
value, I removed it rather than keeping it. As a result, I have 500 observations instead
of 501.

• Outliers
The normal distribution plays a crucial role in statistical analysis, especially when
trying to find unusual values. Standardization transforms the data to a common scale
with a mean of zero and a standard deviation of one, so that observations lying far from
the mean stand out. Boxplots illustrate the distribution of the data, allowing us to
assess the dispersion of the data points, the symmetry, how wide or narrow the
distribution is, the minimum, the maximum, and any exceptional points, all of which are
used to identify outliers.
Statistics
                          The price of         Number of   Number of   The Area of the
                          property             bedrooms    bathrooms   property by m2
N             Valid       500                  500         500         500
              Missing     0                    0           0           0
Mean                      3318944,172          2,62        2,14        142,73
Median                    2500000,000          3,00        2,00        121,00
Mode                      3100000,000          3           2           125
Std. Deviation            4271883,362          1,028       1,023       91,608
Variance                  18248987457011,690   1,058       1,046       8391,939
Skewness                  3,758                1,134       1,555       3,454
Std. Error of Skewness    ,109                 ,109        ,109        ,109
Range                     34967000,000         6           6           815
Minimum                   33000,000            1           1           35
Maximum                   35000000,000         7           7           850
Percentiles   25          800000,000           2,00        2,00        95,00
              50          2500000,000          3,00        2,00        121,00
              75          3840000,000          3,00        2,00        155,00
[Figure: boxplot showing the outliers of the number of bedrooms]
[Figure: boxplot showing the outliers of the area of the house]
[Figure: boxplot showing the outliers of the number of bathrooms]
The outliers in my observations were found after I obtained the data sheet and used
the Boxplot tool to check the variables price, number of bedrooms, number of bathrooms,
and area. To ensure that the findings were not distorted, I removed the outliers using
the IQR technique. After removing the extreme cases, my original 500 observations
dropped to 369.
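The IQR technique mentioned above can be sketched as below. This is only an illustration: the report does not state the multiplier it used, so the conventional 1.5 x IQR fences are an assumption, the sample values are made up, and quartiles computed by `statistics.quantiles` may differ slightly from those of other statistical packages.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that lie within the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

# Hypothetical house areas in m2; 850 is an extreme value.
areas_m2 = [35, 90, 95, 109, 121, 125, 125, 155, 224, 850]
kept = remove_outliers(areas_m2)
print(iqr_bounds(areas_m2))
print(kept)
```

Applying the same rule to each quantitative variable in turn and dropping any row flagged by at least one of them is one way a data set could shrink as described, though the exact procedure used in the report is not specified.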

- The variable "Level" was recoded into two groups in a new variable named "Level_Group":
  - Houses on the ground, 1st, or 2nd level are coded as "1" (group 1).
  - Houses with any other level value are coded as "2" (group 2).
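The recoding rule above can be sketched as a small function. The level labels are taken from the frequency table of the floor variable; the function name `level_group` is hypothetical, and treating "Unknown" as part of group 2 follows the rule that every remaining value goes to group 2.

```python
# Level labels that belong to group 1 (ground, 1st and 2nd levels).
GROUP_1_LEVELS = {"Ground", "1", "2"}

def level_group(level):
    """Return 1 for ground/1st/2nd level, 2 for every other value."""
    return 1 if level.strip() in GROUP_1_LEVELS else 2

levels = ["Ground", "1", "2", "3", "7", "8", "10+", "Highest", "Unknown"]
groups = [level_group(level) for level in levels]
print(dict(zip(levels, groups)))
```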

• Summary statistics, charts and tables

The type of property
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Chalet               335         90,8      90,8            90,8
        Duplex               3           ,8        ,8              91,6
        Penthouse            7           1,9       1,9             93,5
        Standalone Villa     1           ,3        ,3              93,8
        Studio               11          3,0       3,0             96,7
        Town House           8           2,2       2,2             98,9
        Twin house           4           1,1       1,1             100,0
        Total                369         100,0     100,0

Based on the frequency table of property types, Chalet is by far the most common
type with 335 houses, accounting for nearly 91% of the total. The other types each
account for a very small proportion. Studio ranks second with 11 units (3.0%), while
Standalone Villa has only 1 unit and accounts for just 0.3%.

Statistics
The price of property
N             Valid      369
              Missing    0
Mean                     1991043,027
Median                   1950000,000
Mode                     330000,000a
Std. Deviation           1465532,695
Variance                 2147786081264,261
Skewness                 ,727
Std. Error of Skewness   ,127
Range                    7867000,000
Minimum                  33000,000
Maximum                  7900000,000
Sum                      734694877,000
Percentiles   25         600000,000
              50         1950000,000
              75         3000000,000
a. Multiple modes exist. The smallest value is shown
The average price of the 369 houses in my observation is $1991043,027. 50% of
properties have a value less than $1950000 (the median). Several modes exist; the
smallest of them, $330000, is the one reported. The house with the highest price costs
$7900000 and the house with the lowest price costs $33000. With a skewness coefficient
of 0.727, the distribution is right-skewed.

Statistics
Number of bedrooms
N Valid 369
Missing 0
Mean 2,25
Median 2,00
Mode 2
Std. Deviation ,697
Variance ,485
Skewness -,183
Std. Error of Skewness ,127
Range 3
Minimum 1
Maximum 4
Sum 829
Percentiles 25 2,00
50 2,00
75 3,00

For the bedrooms variable, the largest number of bedrooms is 4 and the smallest is
1. The average number of bedrooms across the 369 houses is 2.25. The range is quite small
(3), and the skewness is -0.183 (standard error 0.127), so the distribution tends to be
symmetrical.

Statistics
Number of bathrooms
N Valid 369
Missing 0
Mean 1,68
Median 2,00
Mode 2
Std. Deviation ,468
Variance ,219
Skewness -,763
Std. Error of Skewness ,127
Range 1
Minimum 1
Maximum 2
Sum 619
Percentiles 25 1,00
50 2,00
75 2,00

For the bathrooms variable, the smallest number of bathrooms is 1 and the largest is
2, so every house has at least 1 bathroom. The most common number of bathrooms is 2, and
the average per house is 1.68. The range is quite small (1), and the skewness is -0.763
(standard error 0.127), so the distribution is left-skewed.

Statistics
The Area of the property by m2
N Valid 369
Missing 0
Mean 108,42
Median 109,00
Mode 125
Std. Deviation 32,144
Variance 1033,260
Skewness ,259
Std. Error of Skewness ,127
Range 189
Minimum 35
Maximum 224
Sum 40006
Percentiles 25 90,00
50 109,00
75 125,00
According to the statistics table, the smallest house area is 35 m2 and the largest
is 224 m2, a range of 189 m2. The average area of the 369 houses is 108.42 m2. In my
observation data, 50% of houses have an area greater than 109 m2, and the most common
area is 125 m2. The skewness is 0.259 (standard error 0.127), so the distribution is
close to symmetrical with a slight right skew.

Is the property Furnished or not
                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   No      186         50,4      50,4            50,4
        Yes     183         49,6      49,6            100,0
        Total   369         100,0     100,0
Of the 369 houses observed, 186 are unfurnished, accounting for 50.4%.
Meanwhile, 183 houses are furnished, accounting for 49.6%.

In what floor the property is?
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1         119         32,2      32,2            32,2
        10+       2           ,5        ,5              32,8
        2         58          15,7      15,7            48,5
        3         8           2,2       2,2             50,7
        7         1           ,3        ,3              50,9
        8         1           ,3        ,3              51,2
        Ground    124         33,6      33,6            84,8
        Highest   3           ,8        ,8              85,6
        Unknown   53          14,4      14,4            100,0
        Total     369         100,0     100,0
In my observation, houses on the ground floor account for the highest proportion,
nearly 34%. Houses on the 1st and 2nd floors account for 32.2% and 15.7%, respectively.
Houses on higher floors or in the highest position account for very little: 2.2% for the
3rd floor and less than 1% for each of the others. In addition, houses whose level is
unknown make up a sizeable share, 53 units or 14.4%.
• Correlation coefficients to explore the dependence of price on the remaining variables.
Correlations
                                                        The price of property
Number of bedrooms               Pearson Correlation    ,136**
                                 Sig. (2-tailed)        ,009
Number of bathrooms              Pearson Correlation    ,145**
                                 Sig. (2-tailed)        ,005
The Area of the property by m2   Pearson Correlation    ,166**
                                 Sig. (2-tailed)        ,001
**. Correlation is significant at the 0.01 level (2-tailed).

Based on the correlation table for price and the other quantitative variables, the
correlation between price and the number of bedrooms is 0.136. Likewise, the correlations
between price and the number of bathrooms and between price and area are 0.145 and 0.166,
respectively. All three relationships are positive and significant at the 0.01 level,
although weak. This means that as the number of bedrooms, the number of bathrooms, or the
area of the house increases, the price of the house also tends to increase.
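For reference, the Pearson correlation and the t statistic behind its significance test can be sketched as follows. The function names are hypothetical and the small data lists are invented; only the values r = 0.166 and n = 369 come from the report.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Invented areas (m2) and prices, only to show the call.
areas = [60, 80, 95, 110, 125, 150]
prices = [900000, 1500000, 1200000, 2500000, 1900000, 3100000]
print(round(pearson_r(areas, prices), 3))

# t statistic for the reported price/area correlation (r = 0.166, n = 369).
t_area = correlation_t(0.166, 369)
print(round(t_area, 2))
```

With 369 observations, even a weak correlation such as 0.166 gives a t statistic above 3, which is why it can be statistically significant while explaining little of the variation in price.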

The price of property, by group
                                          Mean          Standard Deviation
Is the property Furnished or not   No     2048750,027   1413226,847
                                   Yes    1932390,011   1518480,398
Level_Group                        1,00   2070461,701   1456915,918
                                   2,00   1639498,603   1462475,405

The coefficient of variation (CV) is used to compare price variability between
property categories because the mean prices of furnished and unfurnished properties,
and of properties on different floors, differ.

CV = (Std Deviation / Mean) x 100

Applying this formula gives the CV table of the price variable for the furnished and
level-group variables.

                                      Std Deviation   Mean         Coefficient of Variation
Is the property furnished or not?
                               No     1413226,84      2048750,02   68,97 %
                               Yes    1518480,39      1932390,01   78,58 %
Level-group                    1,00   1456915,91      2070461,70   70,36 %
                               2,00   1462475,40      1639498,60   89,20 %


Comparing the CV statistics across furnished and unfurnished homes reveals that
furnished homes have a higher CV than unfurnished homes, indicating that the prices of
furnished homes fluctuate more.

Since the CV data of level group 1 is lower than that of level group 2, homes with
levels in group 1 (the ground, first, and second floors) have less price variation than
homes in group 2 (the other levels).
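The CV comparison above can be reproduced with a few lines of code. The standard deviations and means are the ones reported in the CV table; the function name and group labels are only illustrative. Computed values may differ from the table in the last digit because of rounding versus truncation.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) x 100."""
    return std_dev / mean * 100

# (standard deviation, mean) of price per group, as reported in the table.
groups = {
    "Unfurnished": (1413226.84, 2048750.02),
    "Furnished": (1518480.39, 1932390.01),
    "Level group 1": (1456915.91, 2070461.70),
    "Level group 2": (1462475.40, 1639498.60),
}

cv = {name: coefficient_of_variation(sd, mean) for name, (sd, mean) in groups.items()}
for name, value in cv.items():
    print(f"{name}: {value:.2f} %")
```

Because the CV divides by the mean, group 2 shows much higher relative variability than group 1 even though the two standard deviations are almost equal.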

The types of property   Std Deviation   Mean          Coefficient of Variation
Chalet                  1399413,376     2016786,182   69,38 %
Duplex                  346999,0394     701666,6667   49,45 %
Penthouse               2539121,36      2930643,714   86,64 %
Studio                  1790676,119     1148818,182   155,87 %
Town house              2024817,3       1621875       124,84 %
Twin house              2311485,8       2085000       110,86 %

Since different property types have distinct means, the coefficient of variation
(CV) is needed to compare price fluctuations between them. Studio has the highest CV
(155,87 %), indicating that studio prices fluctuate more than those of the other
property types, while Duplex has the lowest (49,45 %).

• Method of data analysis

Data outliers can be identified using box plots. Summary statistics and histograms
are used to examine quantitative variables, whereas frequency tables and pie or bar
charts are used to examine qualitative ones. The histogram shows the spread of numerical
data and the direction of its skewness. Frequency tables, pie charts, and bar charts
allow readers to easily compare the prevalence and distribution of categories. The
relationship between price and the other quantitative variables was examined using
correlation coefficients and scatter plots. The mean, standard deviation, and coefficient
of variation were calculated and then compared to analyze the relationship between price
and the qualitative features.

Continuous quantitative data can be visualized using histograms, in which the length
of each bar represents the proportion of observations in that interval. In bar charts,
bars of varying heights represent multiple data categories. A pie chart is used to
visualize how several categories contribute to the whole. Qualitative values can easily
be compared using either a pie chart or a bar chart.

Part C

Furnished variable T-Test

Independent Samples Test (The price of property)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                 95% Confidence Interval
                                                                       Sig.         Mean           Std. Error    of the Difference
                              F       Sig.            t       df       (2-tailed)   Difference     Difference    Lower          Upper
Equal variances assumed       ,982    ,322            -,762   367      ,446         -116360,016    152677,188    -416591,911    183871,879
Equal variances not assumed                           -,762   364,182  ,447         -116360,016    152766,470    -416775,172    184055,140
Test for equality of variances

Two hypotheses are conceivable:

H0: σ²(furnished) = σ²(unfurnished)

H1: σ²(furnished) ≠ σ²(unfurnished)

The test's P value is 0.322, which is higher than α = 0,05, hence H0 is not
rejected. As a result, the row with equal variances assumed will be used in the test for
equality of means.

Test for equality of means

Two hypotheses are conceivable:

H0: μ(furnished) = μ(unfurnished)

H1: μ(furnished) ≠ μ(unfurnished)

Because the P value of the test is 0.446, which is higher than α = 0,05, H0 is not
rejected: there is insufficient evidence of a difference in the average prices of
furnished and unfurnished properties.

Level-group T-Test

Independent Samples Test (The price of property)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                 95% Confidence Interval
                                                                       Sig.         Mean           Std. Error    of the Difference
                              F       Sig.            t       df       (2-tailed)   Difference     Difference    Lower          Upper
Equal variances assumed       ,080    ,778            2,202   367      ,028         430963,098     195755,008    46020,870      815905,326
Equal variances not assumed                           2,196   99,296   ,030         430963,098     196227,639    41619,239      820306,957
Test for equality of variances

Two hypotheses are conceivable:

H0: σ²(Level_group1) = σ²(Level_group2)

H1: σ²(Level_group1) ≠ σ²(Level_group2)

Do not reject H0, since the P value of the test is 0.778, which is larger than
α = 0,05. The row with equal variances assumed in the preceding table will be used for
the test for equality of means.

Test for equality of means

Two hypotheses are conceivable:

H0: μ(Level_group1) = μ(Level_group2)

H1: μ(Level_group1) ≠ μ(Level_group2)

The P value of the test is 0.028, which is smaller than α = 0,05, therefore we
reject H0. We conclude that the average price of houses on the ground, first, and second
floors differs from that of houses on other levels.
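The equal-variance t statistics in the two tables can be recomputed from the group means and standard deviations reported earlier. The sketch below rests on a few assumptions: the group sizes (183 furnished, 186 unfurnished; 301 in level group 1, 68 in level group 2) are derived from the frequency tables, the function name is hypothetical, and the P value uses a normal approximation to the t distribution, which is reasonable for 367 degrees of freedom but not exact.

```python
import math
from statistics import NormalDist

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t test with equal variances assumed, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_error
    # Normal approximation to the two-tailed P value (assumption: large df).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, std_error, p_approx

# Furnished (Yes) versus unfurnished (No).
t_f, df_f, se_f, p_f = pooled_t_test(1932390.011, 1518480.398, 183,
                                     2048750.027, 1413226.847, 186)
# Level group 1 versus level group 2.
t_l, df_l, se_l, p_l = pooled_t_test(2070461.701, 1456915.918, 301,
                                     1639498.603, 1462475.405, 68)

print(f"Furnished:   t = {t_f:.3f}, df = {df_f}, SE = {se_f:.3f}, p ~ {p_f:.3f}")
print(f"Level group: t = {t_l:.3f}, df = {df_l}, SE = {se_l:.3f}, p ~ {p_l:.3f}")
```

The decision rule is the same in both cases: H0 is rejected only when the P value falls below the chosen α.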

Model Summary
                                Adjusted    Std. Error of
Model   R       R Square        R Square    the Estimate
1       ,220a   ,048            ,035        1439537,360
a. Predictors: (Constant), Furnished_Dummy, Level_Group, Number of bathrooms, The Area
of the property by m2, Number of bedrooms

With an R Square value of 0.048, the variables furnished, bedrooms, level group,
area, and bathrooms together explain 4,8% of the variance in the price variable.

ANOVAa
Model            Sum of Squares        df    Mean Square         F       Sig.
1   Regression   38152062878045,875    5     7630412575609,175   3,682   ,003b
    Residual     752233215027201,000   363   2072267809992,289
    Total        790385277905246,900   368
a. Dependent Variable: The price of property
b. Predictors: (Constant), Furnished_Dummy, Level_Group, Number of bathrooms, The Area
of the property by m2, Number of bedrooms

Test for overall significance

Two hypotheses are conceivable:

H0: R² = 0

H1: R² > 0

The model with its five predictors is significant overall: the P value of the test
is 0.003, which is less than α = 0,05, so H0 is rejected.
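As a check on how these figures relate, R Square, adjusted R Square, and the F statistic can be recomputed from the sums of squares in the ANOVA table. The variable names below are illustrative; the numbers are the ones reported in the table.

```python
# Sums of squares and sizes from the ANOVA table.
ss_regression = 38152062878045.875
ss_residual = 752233215027201.0
k = 5      # number of predictors
n = 369    # number of observations

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R Square          = {r_squared:.3f}")
print(f"Adjusted R Square = {adj_r_squared:.3f}")
print(f"F                 = {f_stat:.3f}")
```

The adjusted value is lower than R Square because it penalizes the model for the number of predictors it uses.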

Coefficientsa
                                                                  Standardized
                                     Unstandardized Coefficients  Coefficients
Model                                B             Std. Error     Beta           t        Sig.
1   (Constant)                       1517634,695   395007,401                    3,842    ,000
    Number of bedrooms               -74239,165    185530,937     -,035          -,400    ,689
    Number of bathrooms              276822,145    198570,951     ,088           1,394    ,164
    The Area of the property by m2   6958,531      3753,247       ,153           1,854    ,065
    Level_Group                      -452414,877   193518,848     -,120          -2,338   ,020
    Furnished_Dummy                  -86331,475    150458,004     -,029          -,574    ,566
a. Dependent Variable: The price of property

Predicted Price = 1517634,695 + 276822,145*Bathrooms - 74239,165*Bedrooms + 6958,531*Area
- 452414,877*Level_Group - 86331,475*Furnished

Holding the other variables constant, the price increases by $276822,145 on average for
each additional bathroom. When the number of bedrooms increases by one, the price drops by
an average of $74239,165. The price increases by $6958,531 for every additional 1 m2 of
property size. Compared with properties on the ground, first, or second level (Level
group 1), properties on other levels (Level group 2) cost about $452414,877 less.
Furnished properties cost about $86331,475 less than unfurnished ones.
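The fitted equation can be written as a small function to see how each coefficient moves the prediction. This is a sketch: the function name and the example property are hypothetical, and it assumes Furnished is coded 1 for furnished and 0 for unfurnished and Level_Group is coded 1 or 2, as in the recoding described earlier.

```python
# Unstandardized coefficients from the Coefficients table.
INTERCEPT = 1517634.695
B_BATHROOMS = 276822.145
B_BEDROOMS = -74239.165
B_AREA = 6958.531
B_LEVEL_GROUP = -452414.877
B_FURNISHED = -86331.475

def predict_price(bathrooms, bedrooms, area_m2, level_group, furnished):
    """Predicted price from the fitted regression equation."""
    return (INTERCEPT
            + B_BATHROOMS * bathrooms
            + B_BEDROOMS * bedrooms
            + B_AREA * area_m2
            + B_LEVEL_GROUP * level_group
            + B_FURNISHED * furnished)

# Hypothetical property: 2 bathrooms, 2 bedrooms, 100 m2, level group 1, unfurnished.
base = predict_price(2, 2, 100, 1, 0)
one_more_bathroom = predict_price(3, 2, 100, 1, 0)
print(f"Base prediction: {base:.3f}")
print(f"Effect of one extra bathroom: {one_more_bathroom - base:.3f}")
```

Since the model explains only a small share of the variance in price, such predictions should be read as rough averages rather than precise estimates.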

Test for coefficients

H0: βi = 0

H1: βi ≠ 0

- Bathrooms: P value = 0,164 > α = 0,1 => Do not reject H0
- Bedrooms: P value = 0,689 > α = 0,1 => Do not reject H0
- Area: P value = 0,065 < α = 0,1 => Reject H0
- Level_group: P value = 0,020 < α = 0,1 => Reject H0
- Furnished: P value = 0,566 > α = 0,1 => Do not reject H0

At α = 0,1, the figures above show that H0 is rejected for the level-group and area
variables, indicating that the level of a home and the size of the property do have an
effect on its price. For the number of bedrooms, the number of bathrooms, and the
furnished variable, the P values are larger than α = 0,1, so there is insufficient
evidence to reject H0.
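The decisions above follow a simple rule that can be sketched in code: each t statistic is the coefficient divided by its standard error, and H0 is rejected when the reported P value is below α. The coefficients, standard errors, and P values are taken from the Coefficients table; the dictionary layout is only illustrative.

```python
ALPHA = 0.10  # significance level used for the coefficient tests

# name: (coefficient B, standard error, reported two-tailed P value)
coefficients = {
    "Bedrooms": (-74239.165, 185530.937, 0.689),
    "Bathrooms": (276822.145, 198570.951, 0.164),
    "Area": (6958.531, 3753.247, 0.065),
    "Level_Group": (-452414.877, 193518.848, 0.020),
    "Furnished": (-86331.475, 150458.004, 0.566),
}

results = {}
for name, (b, std_error, p_value) in coefficients.items():
    results[name] = {
        "t": b / std_error,             # t statistic = B / Std. Error
        "reject_h0": p_value < ALPHA,   # decision at the chosen alpha
    }

for name, res in results.items():
    decision = "Reject H0" if res["reject_h0"] else "Do not reject H0"
    print(f"{name}: t = {res['t']:.3f} => {decision}")
```

Note that at α = 0,05 only Level_Group would remain significant, so the conclusion about Area depends on the choice of the 0,1 level.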
