
POQ-531

BUSINESS STATISTICS-I
NIIT UNIVERSITY, NEEMRANA, RAJASTHAN.
MBA ISDE 2021-2023

NEW HOUSE SALES(US) – TIME SERIES ANALYSIS


PROJECT REPORT
PARTICIPANTS
ARNAV BORTHAKUR (MB21ISDE285)
D S V CHARAN HARSHA(MB21ISDE281)
DIVYATEJA DADI (MB21ISDE308)
HIMANSHU BISHT(MB21ISDE313)
PREFACE

The motivation for this study arose from our desire to better understand time series data
processing. Forecasting is among the most important and most needed parts of data analysis. The
results of a time series analysis can help optimise a number of industrial parameters that
change over time. In the future, we intend to study forecasting principles in depth and gain a
thorough understanding of them. This report focuses on time series data from the US Census
reports. The data covers sales of new single-family homes in the United States
from 1963 through 2021.
ACKNOWLEDGEMENT

It gives us great pleasure to present the report of the Business Statistics-I project undertaken
during the 1st term. We owe a special debt of gratitude to our faculty, Mrs. Keerti Jain, for her constant
support and guidance throughout the course of our work. We would like to acknowledge and thank our
group members for their sincere advice and contribution throughout the preparation of the project.
Last but not least, we would like to thank our college management for providing a good
environment in which to prepare this report.

Without a strong support system, we would not have been able to reach our current level of
achievement. First and foremost, we want to thank our parents for their unwavering love and support.
Second, we thank our committee members, who have all offered helpful suggestions and direction throughout
the study process. We appreciate everyone's constant support.
CONTENTS

I. INTRODUCTION

II. LITERATURE REVIEW

III. RESEARCH METHODOLOGY

IV. DATA ANALYSIS

V. CONCLUSIONS

VI. FUTURE SCOPE

VII. REFERENCES
INTRODUCTION

The goal of the Survey of Construction (SOC) is to provide national and regional statistics on new
single-family and multifamily housing unit starts and completions, as well as statistics on new single-
family house sales in the United States. This survey is primarily funded by the Department of Housing
and Urban Development. The SOC also publishes data on the characteristics of new privately held
residential constructions in the US. New single-family houses completed, new multifamily housing
completed, new single-family houses sold, and new contractor-built houses started are all included in
the data.

We chose this data to better understand the trends in new home sales in the United States from 1963 to
the present. The analysis of this type of data yields numerous insights, such as whether sales data is
seasonal, trends in new houses under construction that are sold, annual sold rate of houses, and houses
that are retained for sale for a specified time period.

A first look at the data showed us that there are numerous characteristics to consider while
analysing it, such as the categorical fields and their subcategories. It is also worth checking
whether the dataset has any altered values.

Certain series were not recorded from the beginning of the time frame and were introduced partway
through; including them helps us better comprehend the flow of new house sales in the
United States. These data points were also taken into account when the data was analysed.

This dataset contains around 25,000 records, each with eight categorical variables and an
associated value, ordered by date.

Let us move on and delve a little further into the data to gain a better grasp of new home sales in the
United States.
LITERATURE REVIEW
Reports of rising real estate prices make front-page news, but the methods used to measure these
price movements are quite crude. The most widely reported price trends for residential properties,
as published by the National Association of Realtors, are restricted to the average price of existing
single-family housing, as reported by member realtor transactions in several metropolitan areas
(Case and Quigley, 1991). These housing sales figures are not standardised for any of the dwellings
purchased and sold. Standardization is minimal for commercial properties; sale or rental prices are
reported per square foot based on survey data gathered by financial service institutions and
brokerage firms.

It is widely accepted that when inferring price trends, it is necessary to
account statistically for the varying characteristics of properties (Case and Quigley, 1991; see
Greenlees (1982) for a discussion), and over the last few decades a variety of hedonic techniques
have been suggested to account for the crucial non-temporal determinants of price variation
(Griliches, 1971).

Extensive research has been devoted to measuring aggregate real estate prices (Bailey, Muth and
Nourse, 1963; Clapp and Giaccotto, 1992; Case and Quigley, 1991; Case and Shiller, 1989;
Meese and Wallace, 1991). Other authors (Can and Megbolugbe, 1997; Gatzlaff and Ling, 1994;
Goetzmann and Spiegel, 1997) argue that more attention should instead go to collecting and
analysing measurements of price movements among local markets.
Schwann (1998) suggested a time-series methodology for price measurement in thin markets.
Because this technique is more parsimonious than previous methods, it has
the potential to allow for finer market subdivisions. The paucity of degrees of freedom is the
main estimation challenge for small-area price indexes. All available methods for estimating
transaction-based indexes require many time-indexed variables and regressors:
there must be at least one for each period in the sample. These variables represent the
average movements in property or attribute prices over a specific period, and these averages become
imprecise when the number of transactions per period decreases. Estimating a temporally
consolidated price index, for example switching from a quarterly index to a half-yearly or yearly
one, is a common "solution" to the problem. This approach has the drawback of smoothing the index
over time. The drawbacks of smoothing have been examined in the asset allocation literature,
particularly by Geltner (1991). Another drawback of temporal consolidation is that the index becomes
less current, reducing its usefulness as a trade control tool. Schwann (1998) proposed a
time-series-based alternative for estimating the index.

A time-series price index connects current transactions to previous transactions, increasing the
number of similar assets on which the index's value is based. That is, in calculating the value of the
index for a given period, prior transactions are substituted for current ones. This understanding can
be found in Quan and Quigley's work (1991). They demonstrate that transactions introduce noise
into the underlying series data and that the best way to update the market price is to weight past
and present market data.

Case and Mayer (1994) note that recent research has revealed intriguing and disturbing patterns of
house price growth and decline in volatile markets. Smith and Tesarek (1991) show evidence that
high-quality home values in the Houston area gained quicker during the boom, sank further during
the collapse, and rebounded faster during the subsequent expansion than 'low-quality' properties.
According to Mayer (1993), a parallel pattern of heightened volatility for high-priced residences
emerged in four locations during the 1970s and the mid-1980s. Case and Shiller (1994) find
comparable patterns in Los Angeles but the opposite in Boston, where lower-tier properties gained
the most during the bubble and dropped the most during the bust.

Most studies of informational efficiency in housing markets rely on correlational analyses, which do
not allow for causal interpretation (Herath and Maier, 2015), a problem that can be addressed by
employing experimental studies (Salzman and Zwinkels, 2017). A small number of experimental studies
on behavioural real estate found consistent evidence for non-normative behaviour of real estate
market actors (Diaz, 1990), which can be attributed to cognitive biases such as herd behaviour
(Seiler et al., 2014), money illusion (Raftery and Runeson, 1998) and
anchoring (Diaz and Hansz, 1997, 2001).
RESEARCH METHODOLOGY

The data was extracted from the official US Census website:
https://www.census.gov/econ/currentdata/datasets/index

The following gives a glimpse of the process involved in preparing this study of
new house sales in the US from 1963 to 2021.

METHODOLOGY:

Data Extraction → Data Pre-processing → Descriptive Analysis → Time Series Analysis → Visualisation → Insights

Data Extraction: The data has been extracted from US CENSUS official website. i.e., Reference:
https://www.census.gov/econ/currentdata/datasets/index .

Data Pre-processing: The extracted data was loaded as a CSV file and examined to bring it into
a format that can be loaded into R for statistical analysis.
The data was then divided into categories and subcategories for the purpose of
descriptive analysis in R.
Descriptive Analysis: The descriptive analysis showed that the data is a time series
with a large number of records. The data was split into the available categories, and
summary statistics were computed for the main variables.

Time Series Analysis: Time series data needs somewhat different treatment for a sound
analysis. R's time series functions were used to analyse the series.

Visualisation: We have used multiple packages from R to plot the outcomes of the analysis.

Insights: Insights have been noted from the outcomes and visualized data.
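
To illustrate the time series analysis step listed above, the sketch below smooths a monthly series with a trailing moving average, one common way to make the underlying trend easier to see. The analysis in this report was carried out in R; this Python version, the function name moving_average and the sample values are assumptions made purely for illustration.

```python
def moving_average(values, window):
    """Trailing moving average: one output per full window of observations."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Invented monthly values, for illustration only.
monthly = [3, 5, 7, 9, 11, 13]
smoothed = moving_average(monthly, 3)
print(smoothed)
```

For monthly data, a 12-month window averages over one full year and so damps any seasonal pattern.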
DATA ANALYSIS
The data is provided in the format below, aligned with the data
dictionary.

Data Dictionary:

There are three main categories, namely SOLD, ASOLD and FORSALE, each of which has
subcategories as mentioned below.
Each of the above subcategories carries a different type of value. If dt_unit is K, the value is the
number of houses in thousands. If it is DOL, the value is a price in thousands of dollars, and MO
refers to a number of months.
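
As a small illustration of how the unit codes can be handled in code, the sketch below maps each dt_unit code to a readable label, following the meanings given in this report. The dictionary name UNIT_LABELS and the helper describe_value are hypothetical and are not part of the original analysis, which was done in R.

```python
# Unit codes and meanings as described in this report's data dictionary.
UNIT_LABELS = {
    "K": "thousand houses",
    "DOL": "thousand dollars",
    "MO": "months",
}


def describe_value(val, dt_unit):
    """Return a readable description of a value given its unit code."""
    label = UNIT_LABELS.get(dt_unit)
    if label is None:
        raise ValueError(f"unknown dt_unit: {dt_unit!r}")
    return f"{val:g} {label}"


print(describe_value(42, "K"))
print(describe_value(5.5, "MO"))
```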

Data has been bifurcated accordingly and will be discussed in further topics.

The above list describes one of the categories provided in the dataset, in PCT (percentage). We
have made little use of it, as forecasting techniques were not the main focus of this study.

In the dataset, dt_idx includes a TOTAL entry, for which val is broken down by
US region.
The above image illustrates the date dimension of the dataset, which runs from Jan 1963 to Sep 2021.
Certain categories were introduced in later years, such as 1973 and 1999, and they were analysed
accordingly.

For easier computation in R, the data was converted to the format below in the CSV file
itself and then loaded into R.
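
The splitting of a flattened CSV into per-category series can be sketched as follows. This is only an illustration: the report performed this step in R, and the column names date, category and subcategory, the sample rows and the function split_by_category are all assumptions, since the exact converted layout is not reproduced here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flattened extract; column names and values are invented.
SAMPLE_CSV = """date,category,subcategory,dt_unit,val
1963-01,SOLD,COMPLETED,K,10
1963-01,FORSALE,COMPLETED,K,25
1963-02,SOLD,COMPLETED,K,12
1963-02,SOLD,MEDIAN,DOL,17.2
"""


def split_by_category(csv_text):
    """Group rows by (category, subcategory), converting val to float."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["category"], row["subcategory"])
        groups[key].append((row["date"], float(row["val"])))
    return dict(groups)


groups = split_by_category(SAMPLE_CSV)
for key, rows in groups.items():
    print(key, rows)
```

Each resulting list holds (date, value) pairs in file order, so it can be treated as one time series per category and subcategory.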

From the above view of the dataset, the most important variable is val, which holds different
types of values as explained earlier.

This is how the data looks in R after loading it into a tibble.
The above illustrates the shape and structure of the entire dataset.

Across the entire dataset, the val variable holds several kinds of data. Below is the summary of
the val variable before splitting the data.
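
A six-number summary of the kind referred to here (minimum, quartiles, median, mean and maximum) can be sketched with Python's standard library as below. The report produced its summary in R; the function name summarize and the sample values are assumptions for illustration only.

```python
import statistics


def summarize(values):
    """Return min, quartiles, median, mean and max of a list of numbers."""
    # method="inclusive" treats the data as the full set of observations.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Invented values, for illustration only.
summary = summarize([10, 12, 15, 20, 43])
print(summary)
```

Because val mixes counts, prices and months, a summary over the unsplit column mainly serves as a sanity check; per-category summaries are more informative.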

Box plots were drawn to check for outliers.
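
The usual box plot convention flags points lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule is given below; the function name iqr_outliers and the sample values are assumptions for illustration, and the exact quartile method used by the plots in this report may differ.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the k * IQR fences around the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Invented values, for illustration only.
sample = [10, 12, 13, 15, 16, 18, 60]
outliers = iqr_outliers(sample)
print(outliers)
```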
SOLD – COMPLETED

Statistical Description of the SOLD-COMPLETED is as follows:

The data frames were converted to time series objects for better visualisation of the data.
Sales of completed houses were strong in 1970-1980 and 2000-2010, and picked up
again around 2020.
Above is the visualisation of the data for roughly the last ten years, from 2010 to 2021.

SOLD-NOT STARTED
Sales of houses not yet started were strong between 1990 and 2000, dropped after that, and
started rising again towards 2020.

The above shows the spike in the data over the last ten years.


SOLD-UNDER CONSTRUCTION
Sales of under-construction homes rose between 1990 and 2010, then dropped
drastically before rising again around 2020.

The above shows the last ten years of sales of under-construction homes.

SOLD-MEDIAN PRICE
The median sale price rose throughout the period from 1963 to 2020.
The above shows the last ten years of median price data.

SOLD-AVERAGE PRICE


Similarly, the average price rose throughout the period from 1963 to 2020.
The above shows the last ten years of average prices of sold houses.

FORSALE-COMPLETED
A larger number of completed houses were for sale between 2000 and 2010.
The above shows the previous ten years of data.

FORSALE-UNDER CONSTRUCTION
Under-construction homes were for sale in large numbers between 2000 and 2010.
The above shows the last ten years of data.

FORSALE-NOT STARTED
A high number of not-yet-started houses came up for sale between 2000 and 2010.
The above shows the previous ten years of data.

FORSALE-MEDIAN MONTHS
The median number of months for sale was high around 2010.

The above shows the previous ten years of data.

FORSALE-MONTHLY SUPPLY
The months' supply is fairly uniform most of the time, with occasional peaks.
The above shows the previous ten years of data.

FORSALE-COMPLETED-ADJ
The adjusted value of houses for sale was high during 1980 and 1985.

FORSALE-UNDER CONSTRUCTION-ADJ
The adjusted number of under-construction houses for sale was high between 2005 and 2010.
The above shows the previous ten years of data.

FORSALE-NOT STARTED-ADJ
The adjusted number of houses for sale whose construction had not started was high after 2020.

The above shows the previous ten years of data.


FORSALE- MONTHS SUPPLY-ADJ

The adjusted months' supply was again mostly uniform, with peaks around 1980 and 2010.

The above shows the previous ten years of data.


ANNUAL SOLD RATE-COMPLETED-ADJ
The annual sold rate for completed houses was high between 2005 and 2010; these are adjusted
values as given in the dataset.

The above shows the annual sold rate over the last ten years.

ANNUAL SOLD RATE-NOT STARTED-ADJ
The annual sold rate of not-yet-started houses was again high between 2005 and 2010 and rose after
2020.
The above shows the previous ten years of data.

ANNUAL SOLD RATE-UNDER CONSTRUCTION-ADJ


The adjusted annual sold rate of under-construction homes was again high between 2005 and
2010.
The above shows the previous ten years of data.

The above are the datasets split out from the original dataset.

The total number of houses sold, the total number of houses for sale and the total annual rate of
sale were plotted, as shown below.

[Plots: SOLD, FORSALE, ANNUAL SOLD RATE]

Comparison plots have been plotted between different combinations as follows:


SOLD

The numbers of completed and under-construction houses sold behave similarly at first, and
at a later stage all three series exhibit similar behaviour.
The average and median house prices behave alike. The average price is higher than the median
price, which is usual when the price distribution is right-skewed.
FORSALE
The number of houses under construction is higher than the others.

ANNUAL RATE SOLD

The annual rate of sale for completed houses was high in comparison with the others.

Correlations were computed between the variables.

The variables exhibit strong correlation for the pairs (Completed, Under construction),
(Under construction, Not started) and (Completed, Not started).
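
For reference, the Pearson correlation coefficient between two series can be computed as in the sketch below. The report's correlations were computed in R; the function name pearson and the two toy series are assumptions for illustration and are not values from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either series is constant.
    return sxy / math.sqrt(sxx * syy)


# Invented, perfectly proportional series, for illustration only.
completed = [10, 12, 14, 16]
under_construction = [20, 24, 28, 32]
r = pearson(completed, under_construction)
print(r)
```

Series that share a common upward or downward trend over time often show high correlation for that reason alone, so a strong coefficient is not by itself evidence of a direct relationship.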
A chi-square test was performed for the same set of variables.
The correlation between the sold values (completed, under-construction and not-started houses)
and the median sale price is as follows:

A chi-square test was also performed for this set of variables.
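
The report does not state how the chi-square tests were set up. One common approach for two numeric series is to bin each into categories (for example, above or below its median) and test the resulting contingency table for independence. The sketch below computes the chi-square statistic for such a table; the function name chi_square_statistic and the counts are invented for illustration and do not reproduce the report's results.

```python
def chi_square_statistic(table):
    """Chi-square statistic for independence in an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("expected count of zero; table has an empty row or column")
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Invented 2 x 2 table: rows = series A above/below its median,
# columns = series B above/below its median.
table = [[30, 10], [10, 30]]
stat = chi_square_statistic(table)
CRITICAL_5PCT_DF1 = 3.841  # 5% critical value for 1 degree of freedom
print(stat, stat > CRITICAL_5PCT_DF1)
```

A statistic above the critical value leads to rejecting independence at the 5% level for a 2 x 2 table; larger tables have more degrees of freedom and need a different critical value.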
CONCLUSIONS
The correlations between certain variables initially suggested that
each variable is in some way proportional to the others. However, once the chi-square
test was conducted, it became evident that we cannot rely on the proportionality suggested
by the correlations between variables.

Forecasting techniques such as ARIMA can be used to forecast the sales data. This is
a rich dataset with a long history, and it should be of real help in forecasting future sales.
FUTURE SCOPE
Building on the analysis above, different machine learning models can be used for
forecasting. The data has been studied from several angles and certain conclusions have been drawn
above.

This analysis should help whenever home sales need to be forecast or predicted for
comparable countries. It could also help the US keep a firmer grip on its economy
during difficult times.

Companies in the real estate industry can manage prices better based on the
seasonality observed in the data when it is decomposed.
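
As a deliberately simplified illustration of what a seasonal component looks like, the sketch below computes one seasonal index per position in the cycle as that position's average divided by the overall average. It ignores trend, which a full decomposition would remove first, and the function name seasonal_indices and the sample series are assumptions for illustration only.

```python
from collections import defaultdict


def seasonal_indices(values, period=12):
    """Average of each seasonal position divided by the overall average."""
    if len(values) < period:
        raise ValueError("need at least one full period of data")
    overall = sum(values) / len(values)
    buckets = defaultdict(list)
    for i, v in enumerate(values):
        buckets[i % period].append(v)  # position within the cycle
    return [sum(buckets[k]) / len(buckets[k]) / overall for k in range(period)]


# Invented quarterly series with a repeating pattern and no trend.
quarterly = [10, 20, 30, 20, 10, 20, 30, 20]
indices = seasonal_indices(quarterly, period=4)
print(indices)
```

An index above 1 marks a position in the cycle that tends to run above the overall average, and an index below 1 marks one that tends to run below it.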

People planning investments can also make wiser decisions by looking at the analysis
and at any forecasts built on it.
REFERENCES
I. https://www.census.gov/econ/currentdata/datasets/index
II. https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf
III. https://www.geeksforgeeks.org/time-series-analysis-in-r/
IV. https://scc.ms.unimelb.edu.au/resources-list/simple-r-scripts-for-analysis/r-scripts
V. https://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html
VI. https://cran.r-project.org/web/packages/TSstudio/vignettes/Plotting_Time_Series.html
VII. https://www.r-bloggers.com/2021/08/how-to-overlay-plots-in-r-quick-guide-with-example/
VIII. https://www.r-graph-gallery.com/316-possible-inputs-for-the-dygraphs-library.html
IX. https://www.statmethods.net/advgraphs/layout.html
X. Case, B., & Quigley, J. M. (1991). The Dynamics of Real Estate Prices. The Review of
Economics and Statistics, 73(1), 50–58. https://doi.org/10.2307/2109686
XI. Schwann, G.M. A Real Estate Price Index for Thin Markets. The Journal of Real Estate
Finance and Economics 16, 269–287 (1998). https://doi.org/10.1023/A:1007719513787
XII. Martin J. Bailey, Richard F. Muth & Hugh O. Nourse (1963) A Regression Method for Real
Estate Price Index Construction, Journal of the American Statistical Association, 58:304,
933-942, DOI: 10.1080/01621459.1963.10480679
XIII. Can, A., Megbolugbe, I. Spatial Dependence and House Price Index Construction. The
Journal of Real Estate Finance and Economics 14, 203–222 (1997).
https://doi.org/10.1023/A:1007744706720
XIV. Karl E. Case, Christopher J. Mayer, Housing price dynamics within a metropolitan area,
Regional Science and Urban Economics, Volume 26, Issues 3–4, 1996, ISSN 0166-0462,
https://doi.org/10.1016/0166-0462(95)02121-3.
XV. Ozan Isler, Terry Flew, Isil Erol, Uwe Dulleck, Market news and credibility cues improve
house price predictions: An experiment on bounded rationality in real estate, Journal of
Behavioral and Experimental Finance, 2021, https://doi.org/10.1016/j.jbef.2021.100550.
(https://www.sciencedirect.com/science/article/pii/S2214635021000940)
XVI. Smith, B.A. and W.P. Tesarek, 1991, House prices and regional real estate cycles: Market
adjustments in Houston, AREUEA Journal 19, no. 3, 396-416.
XVII. Mayer, C.J., 1993, Taxes, income distribution, and the real estate cycle: Why all houses
do not appreciate at the same rate, New England Economic Review, May-June, 39-50.
XVIII. Case, K.E. and R.J. Shiller, 1994, A decade of boom and bust in the prices of single-family
homes: Boston and Los Angeles, 1983 to 1993, New England Economic Review,
March-April, 40-51.
XIX. Herath, S., Maier, G., 2015. Informational efficiency of the real estate market: A meta-
analysis. J. Econ. Res. 20 (2), 117–168.
XX. Salzman, D., Zwinkels, R.C., 2017. Behavioral real estate. J. Real Estate Lit. 25 (1), 77–
106.
XXI. Diaz, J., Hansz, J.A., 1997. How valuers use the value opinions of others. J. Prop. Valuat.
Invest. 15 (3), 256–260.
XXII. Muth, J.F., 1961. Rational expectations and the theory of price movements.
Econometrica 29 (3), 315. http://dx.doi.org/10.2307/1909635.
XXIII. Raftery, J., Runeson, G., 1998. Money illusion in consumer perception of housing
transactions. J. Prop. Valuat. Invest. 16 (2), 175–184.
XXIV. Diaz, J., Hansz, J.A., 2001. The use of reference points in valuation judgment. J. Prop. Res
18 (2), 141–148. http://dx.doi.org/10.1080/09599910110039897.
THANK YOU
