Statistics Spreadsheet Intermediate
SPREADSHEET
A journey from data preparation and cleaning to
Exploratory Descriptive Analytics (EDA) involving statistics
of a property listing dataset in Kuala Lumpur, Malaysia
LILIEK DARMAWAN TH
TABLE OF CONTENTS
DATA OVERVIEW: Short overview of the dataset
LINEAR REGRESSION: Further analysis of price prediction using the linear regression method
DATA OVERVIEW
The dataset has 5000 property listings in Kuala Lumpur.
DATA CLEANING
The dataset contains 5000 property listings; however, some
records do not have enough information to be analyzed. Before we can
conduct further analysis, we need to clean and prepare the dataset,
including:
Removal of missing and irrelevant values
Handling of missing values
Repair and conversion to proper data types
Removal of duplicate values
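The four cleaning steps above can be sketched in plain Python. This is a minimal illustration only: the column names, the sample rows, and the helpers `parse_number` and `clean` are assumptions made for the example, not the spreadsheet steps actually used in the deck.

```python
# Hypothetical raw rows; column names and values are invented for illustration.
raw_rows = [
    {"Location": "Cheras", "Price": "RM 550,000", "Rooms": "3", "Size": "1,100 sq. ft."},
    {"Location": "Cheras", "Price": "RM 550,000", "Rooms": "3", "Size": "1,100 sq. ft."},  # duplicate
    {"Location": "Mont Kiara", "Price": "", "Rooms": "4", "Size": "2,000 sq. ft."},  # missing price
    {"Location": "Mont Kiara", "Price": "RM 1,200,000", "Rooms": "4", "Size": ""},  # missing size
]


def parse_number(text):
    """Keep only the digits of a text cell; return None when the cell has none."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def clean(rows):
    """Drop rows without a price, convert text to numbers, and remove duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        price = parse_number(row["Price"])
        if price is None:  # removal of rows missing the key value
            continue
        record = (row["Location"], price, parse_number(row["Rooms"]), parse_number(row["Size"]))
        if record in seen:  # removal of duplicate values
            continue
        seen.add(record)
        cleaned.append(
            {"Location": record[0], "Price": record[1], "Rooms": record[2], "Area": record[3]}
        )
    return cleaned


cleaned_rows = clean(raw_rows)
```

Rows with a missing area are kept with `Area` set to `None`, so they can be handled (for example, estimated) in a later step instead of being dropped.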
DATA REMOVAL RELIABILITY
TOLERANCE = 10%
DATA SELECTION:
Since the missing values occur in 'Area' for Cheras listings of
various condominium types, the sample can be limited
to these conditions:
City: Cheras
Type: Condominium
Remove some outliers
AREA = 1.27 * PRICE / 1000 + 453
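As a worked example, the fitted line above can be used to estimate a missing area from a listing's price. The sketch assumes PRICE is in RM and AREA is in sq. ft., as elsewhere in the deck; the function name `estimate_area` and the RM 500,000 input are hypothetical.

```python
def estimate_area(price_rm):
    """Estimate area (assumed sq. ft.) from price (assumed RM) using the slide's formula."""
    return 1.27 * price_rm / 1000 + 453


# Hypothetical listing priced at RM 500,000: 1.27 * 500 + 453, roughly 1,088 sq. ft.
example_area = estimate_area(500_000)
```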
PRICE DISTRIBUTION
Price distribution is positively skewed
Sum RM 9,006,835,892
Count 4768
IQR RM 1,700,000
Upper Fence RM 4,950,000
Lower Fence -RM 1,850,000
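The fences follow the usual 1.5 x IQR rule. The sketch below reproduces them; the quartiles Q1 = RM 700,000 and Q3 = RM 2,400,000 are not stated on the slide and are back-derived here from the IQR and the two fences, so treat them as an assumption.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles back-derived from the slide's IQR and fences (assumption, not stated there).
lower_fence, upper_fence = tukey_fences(700_000, 2_400_000)
```

A negative lower fence simply means that no price can be flagged as a low outlier under this rule.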
AREA DISTRIBUTION
Median 1,600 sq. ft.
Mode 1,650 sq. ft.
Standard Deviation 1,824 sq. ft.
30th percentile 1,170 sq. ft.
40th percentile 1,389 sq. ft.
50th percentile 1,600 sq. ft.
Listings above 5,135 sq. ft. (Upper Fence) are considered outliers but may not be removed (to suppress the number of data removals)
PRICE PREDICTION
The next case study involves a correlation matrix and linear
regression to find a prediction model for price using the given variables.
However, we need to check some conditions to make sure the
regression model fits the actual conditions, as shown in the
following slides.
TOP 3 BASED ON AVG PRICE
BUKIT KIARA
Statistic           Price           Rooms           Maid Room      Bathrooms     Car Parks
Mean                RM 4,947,432    4.75            0.5            5.5           3.5
Standard Error      RM 180,753      0.25            0.2886751346   0.2886751346  0.2886751346
Standard Deviation  RM 361,506      0.5             0.5773502692   0.5773502692  0.5773502692
Sample Variance     130686423860    0.25            0.3333333333   0.3333333333  0.3333333333
DAMANSARA HEIGHTS
Statistic           Price           Rooms           Maid Room      Bathrooms     Car Parks
Mean                RM 4,509,822    4.210526316     0.7763157895   4.822368421   2.407894737
Standard Error      RM 248,091      0.1471915485    0.04398002535  0.1751616227  0.1849032189
Standard Deviation  RM 3,058,675    1.814699285     0.5422221683   2.15953752    2.279639983
Sample Variance     9355490755550   3.293133496     0.2940048797   4.6636023     5.196758452
Kurtosis            -0.3137257113   -0.1693858052   -0.1867490677  0.1882247896  0.327117642
Skewness            0.8040943885    -0.03534967296  -0.1055613426  0.5684633676  0.8686508788
Sum                 RM 685,493,007  640             118            733           366
Count               152             152             152            152           152
FEDERAL HILLS
Statistic           Price           Rooms           Maid Room      Bathrooms     Car Parks
Standard Error      RM 649,338      0.5             0              0             0
Standard Deviation  RM 1,298,676    1               0              0             0
Sample Variance     1686558333333   1               0              0             0
Location  Avg of Price  # Props  Type of Props  # Rooms  # Maid Rooms  # Bathrooms  # Car Parks
Among the top 3 locations by average price, Damansara Heights offers more varied choices in property type, number of rooms, maid rooms, bathrooms, and car parks, while Bukit Kiara and Federal Hills are limited to 4 buildings only.
Judging from the descriptive statistics, the figures for both Bukit Kiara and Federal Hills are easily distorted whenever a new property is listed there, because these 2 locations have so few listings.
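The sensitivity to sample size can be seen in the standard error, which is the standard deviation divided by the square root of the number of listings. The sketch below checks the reported price figures for Bukit Kiara and for the 152-listing location; n = 4 for Bukit Kiara is an assumption inferred from the ratio of its standard deviation to its standard error.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)


# Bukit Kiara prices: SD RM 361,506, n = 4 assumed (SD / SE = 2 implies n = 4).
se_bukit_kiara = standard_error(361_506, 4)

# The 152-listing location: SD RM 3,058,675, Count 152 as reported.
se_large_sample = standard_error(3_058_675, 152)
```

With only 4 listings, the standard error is half the standard deviation, so a single new listing can shift the mean noticeably; with 152 listings it is roughly one twelfth of the standard deviation.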
STUDY ABOUT PRICE IN MONT KIARA
CORRELATION MATRIX
AREA = 0.91
ROOMS = 0.71
BATHROOM = 0.70
MAID ROOM = 0.47
These 4 aspects have a moderate to strong correlation with price
and could therefore affect the price in the Mont Kiara
location, while the availability of car parks might not
affect the price much.
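Each entry of a correlation matrix is a Pearson correlation coefficient between two columns. A minimal sketch of that calculation follows; the function `pearson_r` and the five toy listings are hypothetical and are not taken from the Mont Kiara data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy listings (invented values): area in sq. ft., price in RM.
areas = [800, 1200, 1500, 2000, 2600]
prices = [600_000, 900_000, 1_000_000, 1_600_000, 2_000_000]
r_area_price = pearson_r(areas, prices)
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.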
Regression Statistics
Multiple R 0.7687
R Square 0.5909
Adjusted R Square 0.5881
Observations 592
Significance tests:
01 Simultaneous (F-test)
02 Partial (t-test)
Reject the null hypothesis, H0, if the p-value is below the alpha threshold of 5%
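The regression statistics are internally consistent, which can be checked with a short calculation. The sketch assumes the model has 4 predictors (rooms, maid rooms, bathrooms, car parks); that count is inferred from the coefficients discussed in the deck and is not stated with the regression statistics.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R Square for a linear model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square is the square of Multiple R.
r_square = 0.7687 ** 2

# 592 observations as reported; 4 predictors is an assumption (see lead-in).
adj_r_square = adjusted_r_squared(0.5909, 592, 4)
```

Rounded to four decimals, these reproduce the reported R Square and Adjusted R Square.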
SIGNIFICANCE TEST: SIMULTANEOUS
df SS MS F Significance F
Variable   Coefficients  Standard Error  t Stat       P-value       Lower 95%     Upper 95%
Maid Room  160801.5799   54300.74872     2.96131423   0.0031871308  54154.17438   267448.9854
Car Parks  14617.51825   19260.46205     0.758939127  0.4481936057  -23210.28993  52445.32644
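The partial (t-test) decision for the two coefficient rows above can be sketched as follows. The t statistic is the coefficient divided by its standard error, and a variable counts as significant when its p-value is below the 5% alpha. The dictionary layout and names are illustrative choices; the p-values are copied from the table, not recomputed.

```python
ALPHA = 0.05  # 5% significance threshold

# (coefficient, standard error, p-value) copied from the coefficient rows above.
coefficient_rows = {
    "Maid Room": (160801.5799, 54300.74872, 0.0031871308),
    "Car Parks": (14617.51825, 19260.46205, 0.4481936057),
}

results = {
    name: {"t_stat": coef / se, "significant": p_value < ALPHA}
    for name, (coef, se, p_value) in coefficient_rows.items()
}
```

Under this rule Maid Room is significant at the 5% level, while Car Parks is not, which is consistent with its 95% interval spanning zero.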
INTUITIVELY...
The price of 1 room is RM 429,562
The price of 1 maid room is RM 160,801
The price of 1 bathroom is RM 201,482
The price of 1 car park is RM 14,617
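Read as regression coefficients, these figures give the predicted price difference between two otherwise identical listings. The sketch below treats the slide's figures as thousands-separated RM amounts (RM 429,562 and so on), which matches the Maid Room coefficient of 160801.58 in the coefficient table. The intercept is not given in the deck, so only differences are computed, and the function name is hypothetical.

```python
# Per-unit coefficients in RM, as rounded on the slide.
COEFFICIENTS = {
    "rooms": 429_562,
    "maid_rooms": 160_801,
    "bathrooms": 201_482,
    "car_parks": 14_617,
}


def predicted_price_difference(changes):
    """Predicted price change (RM) for the given change in each feature count."""
    return sum(COEFFICIENTS[name] * delta for name, delta in changes.items())


# One extra room and one extra bathroom, everything else held equal.
extra_room_and_bath = predicted_price_difference({"rooms": 1, "bathrooms": 1})
```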
Accurate to the given data
ANY FURTHER DISCUSSION?
LET'S CONNECT!
LILIEK DARMAWAN TH
www.linkedin.com/in/liek-darmawan