Group 55 Final Report
I. Introduction
A. Problem Statement
The primary challenge is to determine how factors such as house size, the number of bedrooms
and bathrooms, and geographical location (city, state, zip code) affect house prices in the
United States.
Economic Impact: The housing market is a cornerstone of the U.S. economy, and fluctuations in
house prices reflect and influence its health.
Decision-Making: Both buyers and sellers require insights into these factors for informed
decision-making.
Policy Formulation: This understanding is crucial for policymakers to craft effective housing
policies.
Real Estate Valuation: Professionals in the real estate industry need to accurately assess
property values for various purposes.
II. Methodology
A. Data Collection
Data was collected from Realtor.com and consists of sale listings that record house size,
number of bedrooms and bathrooms, and location (city, state, zip code).
Python Utilization: Python was the primary tool for initial data preparation and organization,
converting the raw data into an analyzable format.
SQL Application: SQL was used to organize the raw data into structured datasets.
Pandas for In-Depth Analysis: The Pandas library was used to examine the data in detail and
to identify significant patterns.
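The preparation steps above can be sketched as follows. This is a minimal illustration, not the group's actual pipeline: the sample rows, the column names (price, bed, bath, house_size, city, state, zip_code), the helper clean_listings, and the use of an in-memory SQLite table are all assumptions made for the example.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the report's schema.
RAW_CSV = """price,bed,bath,house_size,city,state,zip_code
250000,3,2,1500,Austin,Texas,78701
,2,1,900,Austin,Texas,78702
410000,4,3,2400,Denver,Colorado,80202
410000,4,3,2400,Denver,Colorado,80202
180000,2,1,,Tulsa,Oklahoma,74103
"""


def clean_listings(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and rows missing any field the analysis relies on."""
    required = ["price", "bed", "bath", "house_size"]
    cleaned = raw.drop_duplicates().dropna(subset=required)
    return cleaned.reset_index(drop=True)


# Read zip codes as strings so leading zeros survive.
raw = pd.read_csv(io.StringIO(RAW_CSV), dtype={"zip_code": str})
listings = clean_listings(raw)

# Store the cleaned rows in an in-memory SQLite table and query them back with SQL.
conn = sqlite3.connect(":memory:")
listings.to_sql("listings", conn, index=False)
by_state = pd.read_sql_query(
    "SELECT state, COUNT(*) AS n_listings FROM listings GROUP BY state ORDER BY state",
    conn,
)
conn.close()

print(listings)
print(by_state)
```

Cleaning in Pandas before loading into SQL keeps the SQL side limited to structuring and aggregation, which matches the division of work described above.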
B. Correlation Insights
The analysis revealed a robust relationship between house prices and attributes such as size,
the number of bedrooms, and location.
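One common way to quantify such relationships is the Pearson correlation between price and each numeric attribute. The sketch below uses made-up values and assumed column names; it does not reproduce the report's figures. It also covers numeric attributes only, since location is categorical and needs separate treatment.

```python
import pandas as pd

# Hypothetical listings; values and column names are illustrative assumptions.
listings = pd.DataFrame(
    {
        "price": [200000, 260000, 310000, 380000, 450000],
        "house_size": [1000, 1300, 1600, 1900, 2300],
        "bed": [2, 3, 3, 4, 4],
        "bath": [1, 2, 2, 2, 3],
    }
)

# Pearson correlation of each attribute with price, strongest first.
price_corr = listings.corr()["price"].drop("price").sort_values(ascending=False)

print(price_corr)
```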
C. Geographical Trends
A clear variation in housing prices was observed across different regions, with significant
differences noted not just between states but also within cities.
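Regional variation of this kind can be summarized by grouping listings by state and by city. The following is a sketch on made-up rows with assumed column names, meant only to show the shape of such a comparison.

```python
import pandas as pd

# Hypothetical listings; values and column names are illustrative assumptions.
listings = pd.DataFrame(
    {
        "state": ["Texas", "Texas", "Texas", "Colorado", "Colorado"],
        "city": ["Austin", "Austin", "Houston", "Denver", "Denver"],
        "price": [300000, 500000, 250000, 450000, 550000],
    }
)

# Between-state comparison: median listing price per state.
median_by_state = listings.groupby("state")["price"].median()

# Within-city variation: spread between the cheapest and most expensive listing.
city_spread = listings.groupby(["state", "city"])["price"].agg(["min", "max"])
city_spread["range"] = city_spread["max"] - city_spread["min"]

print(median_by_state)
print(city_spread)
```

The median is used here instead of the mean because a few very expensive listings can distort an average.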
D. Market Insights
The analysis led to the discovery of unique market trends and patterns that provide deeper
insights into the factors influencing house prices.
B. Performance Metrics
Mean Squared Error (MSE): At 1,412,315.19, the MSE is the average squared difference
between the model's predictions and actual prices.
R-squared: Standing at 0.2458 (24.58%), this metric signifies that approximately one quarter of
the variability in housing prices is accounted for by the model.
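For reference, both metrics can be computed from first principles as sketched below. The actual and predicted lists are made-up values, and the helper names are hypothetical. The last line takes the square root of the reported MSE to give a root mean squared error of roughly 1,188, expressed in the same units as the model's price variable.

```python
from math import sqrt


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only.
actual = [200.0, 300.0, 400.0, 500.0]
predicted = [250.0, 300.0, 350.0, 500.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)

# Square root of the MSE reported above, in the units of the price variable.
reported_rmse = sqrt(1_412_315.19)

print(mse, r2, reported_rmse)
```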
C. Analysis
The plot illustrates that the model yields a reasonable prediction for numerous listings (where
blue and red dots overlap). Nonetheless, there are notable instances where the predictions
diverge significantly from the actual prices, as evidenced by the isolated blue dots. This
highlights areas for potential refinement in the model.
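The divergent listings can also be identified numerically by ranking the absolute prediction errors. The sketch below uses made-up values and a hypothetical helper, largest_errors; it illustrates one way to locate such outliers and is not the method used in the report.

```python
def largest_errors(actual, predicted, top_n=3):
    """Return (index, absolute error) pairs for the worst predictions, largest first."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    ranked = sorted(range(len(abs_errors)), key=lambda i: abs_errors[i], reverse=True)
    return [(i, abs_errors[i]) for i in ranked[:top_n]]


# Hypothetical actual and predicted prices for five listings.
actual = [210.0, 340.0, 900.0, 280.0, 1500.0]
predicted = [220.0, 330.0, 400.0, 300.0, 600.0]

worst = largest_errors(actual, predicted, top_n=2)
print(worst)
```

Inspecting the listings behind the largest errors can show which attributes the model is missing.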
V. Conclusion
The study provides a foundational understanding of the factors influencing house prices in the
U.S. The developed model, while a significant first step, indicates that additional variables and
refinement may be required to enhance its predictive accuracy and reliability.
VI. Recommendations
Further research is recommended to integrate additional variables and employ more advanced
modeling techniques. Continuous model refinement is essential to improve predictive
performance and the utility of the insights provided.