AAI Report Mini Project HarshSingh
Submitted by
USN Name
1BI20AI014 Harsh Singh
Certificate
USN Name
1BI20AI014 Harsh Singh
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the completion of a task would be incomplete
without crediting the people who made it possible, whose constant guidance and
encouragement crowned the efforts with success.
We would like to express our thanks to the Principal, Dr. Aswath M.U., for his
encouragement, which motivated us toward the successful completion of the mini project work.
It gives us immense pleasure to thank Dr. Jyothi D.G. Professor & Head, Department of
Artificial Intelligence & Machine Learning for her constant support and encouragement.
We would like to express our deepest gratitude to our mini project guide Mr. Manjunatha
P.B for his constant support and guidance throughout the Mini Project work.
We are also pleased to express our sincere gratitude for the friendly co-operation shown
by all the staff members of the Artificial Intelligence and Machine Learning
Department, BIT.
Last but not least, we hereby acknowledge and thank our friends and family, who have
always been our source of inspiration and were instrumental in the successful completion of
the project work.
Date: 01-01-2023
Place: Bengaluru
Harsh Singh
ABSTRACT
The novelty of this project lies not only in its predictive analytics but also in the adaptive
nature of the model, which is designed to respond dynamically to the evolving complexities of
urban environments. Unlike static prediction models, this approach provides a forward-looking
strategy, enabling urban governance to stay ahead of emerging challenges in property
maintenance compliance.
INDEX
Acknowledgement
Abstract
List of Figures
1. Introduction
References
LIST OF FIGURES
6.2 Getting to know about the dataset
6.3 Plotting out features
Chapter 1
INTRODUCTION
Urban governance is undergoing a paradigm shift with the advent of advanced technologies,
presenting unprecedented opportunities to enhance decision-making processes and address
complex challenges. This project delves into the domain of machine learning to explore its
potential in forecasting property maintenance fine compliance in urban environments, with a
specific emphasis on the city of Detroit. By leveraging predictive analytics, the research aims
to revolutionize traditional enforcement mechanisms, offering a forward-looking approach to
urban governance.
In recent years, the rapid advancements in machine learning algorithms and data analytics have
paved the way for transformative applications in various sectors, including urban governance.
The integration of technology has become pivotal in fostering efficiency and innovation within
municipal administrations. The city of Detroit, emblematic of urban revitalization efforts,
serves as a compelling case study for the intersection of machine learning and urban
governance. With a rich history and a diverse set of challenges, Detroit offers a fitting
testbed for data-driven approaches to municipal enforcement.
This project not only seeks to explore the technical aspects of machine learning but also aims
to contribute to a broader discourse on the evolving nature of governance in urban settings. By
shifting the focus from reactive enforcement measures to proactive, data-driven forecasting,
the study endeavors to lay the groundwork for a more adaptive and responsive urban
governance model. The implications of successful implementation in Detroit could have far-
reaching effects, setting a precedent for other cities grappling with similar challenges. As
technology continues to evolve, the findings from this research endeavor may offer valuable
insights into how cities can leverage machine learning to optimize decision-making processes,
foster community engagement, and ultimately create more resilient and sustainable urban
environments.
LITERATURE REVIEW
Paper 3:
[3] Title: “Factors affecting residential land prices” by Doa'a Majed Al Totanji, Md.
Sayuti Bin Ishak (2023)
The literature survey by Doa'a Majed Al Totanji and Md. Sayuti Bin Ishak investigates
factors influencing residential land prices globally. The study, spanning 2000 to 2022,
reviews 185 publications, categorizing factors into social, economic, environmental, and
governmental aspects. Real estate's socioeconomic role, the supply-demand dynamic, and
the lack of consensus on factors are highlighted. The analysis reveals a surge in publications,
with the United States, China, Australia, Malaysia, and Canada leading.
The proposed system, "Urban Governance Forecasting with Machine Learning," leverages
advanced algorithms to revolutionize property fine compliance. By analyzing
comprehensive datasets encompassing building permits and citizen complaints, the model
optimizes enforcement strategies, guiding effective resource allocation. Its adaptive nature
responds to changing urban conditions, moving beyond static predictions. This forward-looking
approach not only ensures efficient fine enforcement but also establishes a precedent for
data-driven urban governance.
4.1 ALGORITHM
• Data Source: Derived from a comprehensive property fine database, the dataset
encompasses crucial attributes influencing fine compliance. Parameters include
historical fine records, property characteristics, neighborhood details, and citizen
complaints.
• Data Quality Assurance: Stringent validation processes ensure data integrity and
accuracy, addressing anomalies, inconsistencies, or missing values. Data is sourced
from reliable municipal records, enforcement agencies, and community feedback.
• Data Cleaning: Impeccable data hygiene is maintained through techniques such as
imputation, duplicate elimination, and error rectification. Specialized methods
handle outliers and skewed distributions, ensuring the dataset's robustness.
• Data Transformation: Numeric and categorical variables undergo normalization
and encoding, respectively. Feature engineering introduces relevant indicators,
enhancing the dataset's predictive richness.
• Model Suitability: Thorough research identifies models suitable for
property fine prediction. Classification algorithms, including Decision Trees,
Random Forests, Support Vector Machines, k-Nearest Neighbors, Logistic
Regression, and Stochastic Gradient Descent classifiers, are shortlisted as candidates.
• Alignment with Data Characteristics: Model assumptions are rigorously
evaluated for compatibility with property fine data, ensuring accurate predictions.
• Model Training: Selected models are meticulously trained on the dataset, leveraging
informative features to predict fine compliance as a categorical outcome.
• Model Evaluation: Rigorous evaluation on the test dataset employs key metrics such
as accuracy, precision, recall, F1-score, and confusion matrix analysis. Cross-validation
techniques fortify models against overfitting or underfitting.
• Performance Analysis: Evaluation metrics provide insights into model strengths and
areas for improvement. Feature interpretability aids in deciphering the factors
influencing fine compliance.
• Iterative Refinement: Informed by evaluation insights, iterative refinement begins,
involving feature adjustments, model experimentation, and hyperparameter fine-tuning
for optimal predictive performance.
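Under stated assumptions, the training-and-evaluation steps above can be sketched end to end. The example below is a minimal sketch using synthetic stand-in data, not the actual Detroit fine records; the feature and label arrays are placeholders for the engineered dataset.

```python
# Minimal sketch of the train / evaluate / cross-validate pipeline.
# The data here is synthetic; in the project it would be the prepared fine dataset.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))              # stand-in features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in compliance label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print('accuracy:', accuracy_score(y_test, y_pred))
print('confusion matrix:\n', confusion_matrix(y_test, y_pred))
# Cross-validation guards against over/underfitting, as described above
print('cv accuracy:', cross_val_score(clf, X, y, cv=5).mean())
```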
4.2 DATAFLOW
Data Preprocessing: Preprocessing for property fine prediction includes handling missing
values, addressing outliers, and transforming variables for model optimization.
Techniques like imputation, outlier removal, and feature transformation are applied.
Additionally, the dataset is encoded to prepare it for machine learning model training.
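A minimal sketch of these preprocessing steps is shown below, assuming illustrative column names (judgment_amount, agency_name) rather than the actual dataset schema.

```python
# Sketch of imputation, outlier handling, and encoding with pandas.
# Column names and values are illustrative stand-ins.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'judgment_amount': [250.0, 305.0, np.nan, 9000.0, 280.0],
    'agency_name': ['Buildings', 'Buildings', 'Police', None, 'Buildings'],
})

# Imputation: fill numeric gaps with the median, categorical with the mode
df['judgment_amount'] = df['judgment_amount'].fillna(df['judgment_amount'].median())
df['agency_name'] = df['agency_name'].fillna(df['agency_name'].mode()[0])

# Outlier handling: cap extreme amounts at the 95th percentile
cap = df['judgment_amount'].quantile(0.95)
df['judgment_amount'] = df['judgment_amount'].clip(upper=cap)

# Encoding: one-hot encode the categorical column for model training
df = pd.get_dummies(df, columns=['agency_name'])
print(df.head())
```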
Splitting the Dataset: The dataset is split into training and testing sets to facilitate model
training and evaluation. Standardization ensures consistent feature scaling, and the dataset
is divided into input features (X_train, X_test) and the target variable (y_train, y_test).
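The split-and-standardize step can be sketched as follows; here X and y are synthetic stand-ins for the prepared features and the compliance target.

```python
# Sketch of splitting the dataset and standardizing features.
# The scaler is fitted on the training split only, to avoid leaking
# test-set statistics into training.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=100, scale=25, size=(200, 3))   # stand-in unscaled features
y = rng.integers(0, 2, size=200)                   # stand-in binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)   # (150, 3) (50, 3)
```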
Predict Result: Each classifier is trained on the training set and evaluated on the testing
set. Predictions are made, and accuracy scores are computed for each classifier.
Hyperparameter tuning may be performed for optimal performance. The chosen model is
trained on the full dataset, and predictions are made on the test set. Performance metrics,
including accuracy, classification reports, and confusion matrices, provide insights into the
model's effectiveness in predicting property fine compliance.
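The classifier comparison described above can be sketched as follows, using scikit-learn's synthetic make_classification data in place of the property fine dataset; the classifier set is a subset of the candidates listed in Section 4.1.

```python
# Sketch of training several classifiers and comparing their test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

classifiers = {
    'decision_tree': DecisionTreeClassifier(random_state=1),
    'logistic_regression': LogisticRegression(max_iter=1000),
    'knn': KNeighborsClassifier(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Report accuracies, strongest model first
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {acc:.3f}')
```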
Chapter 5
IMPLEMENTATION
pip install feature-engine
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 999)
trainset = pd.read_csv('train.csv', encoding = 'ISO-8859-1')
testset = pd.read_csv('test.csv', encoding = 'ISO-8859-1')
address = pd.read_csv('addresses.csv')
latlot = pd.read_csv('latlons.csv')
for i in range(10):
    print('sample ', str(i))
    print(address.sample(2), '\n')
print('number of ticket_ids present in both the train and test sets: {}'.format(
    len(set(trainset.ticket_id).intersection(testset.ticket_id))))
print('number of test set ticket_ids present in addresses: {}'.format(
    len(set(testset.ticket_id).intersection(address.ticket_id))))
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

state = 35
strategies = ['most_frequent', 'stratified', 'uniform']  # baseline strategies to compare
dumclfs = {}
for strategy in strategies:
    clf = DummyClassifier(strategy=strategy, random_state=state)
    clf.fit(X_train, y_train)
    y_preds = clf.predict(X_test)
    auc = roc_auc_score(y_test, y_preds)
    dumclfs[strategy] = auc

# Compare the dummy baselines against a Random Forest classifier
rf = RandomForestClassifier().fit(X_train, y_train)
rf_predicted = rf.predict(X_test)
auc = roc_auc_score(y_test, rf_predicted)
dumclfs['rf'] = auc
import re

def violation_code_extract(code):
    # Extract the leading 'digits-digits' prefix of a violation code
    try:
        result = re.match(r'^(\d+-\d+)', code).group()
    except (TypeError, AttributeError):
        result = None
    return result

violation = train_featI_prep.violation_code.apply(violation_code_extract)
violation.value_counts()
for dataset in [train_featI_prep, test_featI]:
    dataset['simplified_code'] = dataset['violation_code'].apply(
        lambda code: code.split('-')[0] if '-' in code else code)
    print(dataset.groupby('simplified_code')['judgment_amount'].agg(
        ['min', 'mean', 'median', 'max', 'std', 'count']))

for dataset in [train_featI_prep, test_featI]:
    dataset['simplified_code'] = dataset['simplified_code'].apply(
        lambda x: x if x in ['9', '22', '61'] else '0')
    print(dataset.groupby('simplified_code')['judgment_amount'].agg(
        ['min', 'mean', 'median', 'max', 'std', 'count']))
print('number of missing values in training set: \n{}'.format(
    train_featII.isnull().sum(axis=0)))
from sklearn.preprocessing import LabelEncoder

# Encode 'agency_name'
lbe_agency = LabelEncoder().fit(
    pd.concat([X_train['agency_name'], X_test['agency_name']]))
X_train['agency_name'] = lbe_agency.transform(X_train['agency_name'])
X_test['agency_name'] = lbe_agency.transform(X_test['agency_name'])

# Encode 'simplified_code'
lbe_violation = LabelEncoder().fit(
    pd.concat([X_train['simplified_code'], X_test['simplified_code']]))
X_train['simplified_code'] = lbe_violation.transform(X_train['simplified_code'])
X_test['simplified_code'] = lbe_violation.transform(X_test['simplified_code'])
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the categorical columns and pass the remaining features through
# (column list assumed from the preceding encoding steps)
preprocessor = ColumnTransformer(
    [('ohe', OneHotEncoder(handle_unknown='ignore'),
      ['agency_name', 'simplified_code'])],
    remainder='passthrough')
X_train_ohe = preprocessor.fit_transform(X_train)
# Transform the test data with the encoder fitted on the training data
X_test_ohe = preprocessor.transform(X_test)
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import roc_auc_score

regr_rf = RandomForestRegressor()
grid_values = {'n_estimators': [10, 100], 'max_depth': [None, 30]}
grid_clf_auc = GridSearchCV(regr_rf, param_grid=grid_values, scoring='roc_auc')
grid_clf_auc.fit(X_ohe, y)
print('Grid best parameter (max. AUC): ', grid_clf_auc.best_params_)
print('Grid best score (AUC): ', grid_clf_auc.best_score_)
Identifying the most informative features by plotting the variables and examining their
correlations.
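This correlation check could be sketched as follows; the column names (fine_amount, late_fee, discount) and the data are hypothetical stand-ins, not the actual dataset.

```python
# Sketch of examining feature correlations with pandas.
# Synthetic data: late_fee is constructed to correlate with fine_amount.
import pandas as pd
import numpy as np

rng = np.random.default_rng(7)
n = 300
fine_amount = rng.normal(300, 50, n)
late_fee = fine_amount * 0.1 + rng.normal(0, 5, n)   # correlated feature
discount = rng.normal(20, 5, n)                      # independent noise

df = pd.DataFrame({'fine_amount': fine_amount,
                   'late_fee': late_fee,
                   'discount': discount})

corr = df.corr()
print(corr.round(2))
# Rank features by absolute correlation with the variable of interest
print(corr['fine_amount'].drop('fine_amount').abs().sort_values(ascending=False))
```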
REFERENCES
Title: "Real Estate Valuation and Market Analysis" by Joshua Kahr, John Wiley & Sons
Title: "Urban Property Management" by Robert C. Kyle, Floyd M. Baird, Urban Land Institute
Title: "Property Investment Appraisal" by Peter Wyatt, Neil Crosby, John Wiley & Sons