
© 2023 JETIR April 2023, Volume 10, Issue 4 www.jetir.org (ISSN-2349-5162)

Exploratory Analysis on Geo-Location Data for Accommodation
M Hari Dheeraj, Department of IT, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, mukalaharidheeraj12@gmail.com
P R S S V Raju, Assistant Professor, Department of IT, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, prssvraju2@gmail.com
N Dinesh, Department of IT, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, dineshnimmala28@gmail.com
A J N S Prasad, Department of IT, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, prasadanjuri89192@gmail.com
P Surya Revanth, Department of IT, Sagi Ramakrishnam Raju Engineering College, Bhimavaram, Andhra Pradesh, suryarevanth.2018@gmail.com
Abstract— Analyzing geo-locational data is a significant source of knowledge about locations and local human behaviour. In today's fast-paced and busy environment, an average person often migrates from one place to another, and it is difficult for that person to search for and identify the accommodation that best matches his preferences. A person tries to settle in a place that serves his preferences and interests well and resembles his previous locality. So, using data visualization and clustering, it is possible to identify amenity-rich locations within a particular radius by taking into account a variety of factors, including the overall availability of restaurants, department shops, gyms, and other facilities. This project applies K-Means Clustering to geo-locational data obtained from the Here Geocoding and Search API v7 (Application Programming Interface) in order to find the best places to stay for a person in a specific city, categorizing accommodation based on people's preferences for amenities, location, and budget. This project will produce intelligent user suggestions by analyzing geo-locational data and user preferences through our Machine Learning model. Our goal is to identify locations with rich, average, and low amenities and to map those locations.

Keywords—K-Means Clustering, Here Geocoding and Search API, Geo-locational Data

I. INTRODUCTION

Analyzing geo-locational data enables research into places and regional human behaviour. Those who travel frequently may find it difficult to locate a suitable area to reside. In 2021, India accounted for 1.57% of total international tourist visits. India welcomed 17.9 million more international visitors in 2020 compared to 2019, an increase of 3.5%. India is the eighth most visited country in the Asia-Pacific region and now holds the 22nd place worldwide. Because India is attracting a lot of attention from tourists, it can be challenging for them to find a location to stay and enjoy their vacation. People migrating to a different place may likewise find it difficult to locate an ideal place that matches their priorities and preferences.

We thus recommend the locations that would be ideal for them, based on the place they choose as well as their preferences for the area. Individuals who move to a new place will likely have particular preferences and considerations; therefore, analysis of geo-location is used to pinpoint the optimal locations. The situation would be hassle-free and time-saving if the consumers lived close to their preferred locations. The methodology can be applied to any location of one's choosing, and can vary according to user preference. The data points are separated into a number of groups so that the points in each group are more similar to one another than to those in the other groups. In other words, the goal is to group data items based on the characteristics they have in common. K-Means clustering is a well-suited technique for grouping items according to how similar they are. It is an unsupervised learning algorithm that partitions an unlabeled dataset into distinct groups. The variable K represents the number of clusters that the algorithm will create. For instance, if K is set to 2, the algorithm will create two distinct groups. This process groups unlabeled data into different categories without any prior training. The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the assignment until the clusters stop changing. The outcome is clustered data, depicted by the scatter plot of the objects in Figure 1.

Fig. 1 K-Means clustering example on data represented with a scatter plot

II. LITERATURE SURVEY

[1] This project involves recommending hotels, gyms and other needs to a user who has newly moved to an area. It is difficult for a user to find all the places in a new area, so recommending nearby places makes this easier. One is often too tired to prepare a home-cooked meal. Even if a person gets a home-cooked meal every day, it is not unusual to want to go out for a good meal every once in a while for social purposes. A person who moves to a new place already has some preferences and tastes. Both the user and the food providers benefit if the user lives close to his preferred outlets: it is convenient for the owners, who get better sales, and it saves time for the user.

[2] Throughout the last few decades, researchers have examined the use of geo-location data to detect travel-related events and reasons. These studies look at recurring GPS trajectories and use rules, models, and machine learning to identify personal movement patterns like home, work, shopping, and leisure. To gain knowledge of the ongoing business processes, the authors seek to assess travel events as activities of various businesses in the current work.

JETIR2304595 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org f730
[3] In recent years, there has been a noticeable surge in immigration. When these individuals arrive in the target nation, the majority of them are students who require long-term lodging. However, because a newcomer is new to the area and does not know many relevant locations, this creates a difficulty. Therefore, this research article defines an effective methodology for Accommodation Recommendation through the use of K Nearest Neighbor Clustering along with Artificial Neural Networks and Decision Making. An experimental evaluation was performed, which showed the superiority of the presented technique.

[4] In this paper, an enhanced clustering method called agglomerative hierarchical clustering (HC-PE) is developed for dynamical systems. This method is based on performance assessment. The Padé approximation and a genetic algorithm (GA) are both employed. In this work, two groups of findings are shown. The first set contains simple models. In the second round of experiments, a model with several inputs and outputs is used. The authors show that HC-PE achieves the lowest mean squared error (MSE) when compared to other techniques. The work examines the properties of dynamic system models in an attempt to preserve as many of them as possible while significantly reducing their complexity.

[5] Exploratory analysis of geo-location data is an important aspect of gaining insights into the patterns and trends within the data. Several publications provide guidance on how to conduct exploratory analysis of geo-location data. "Exploratory Data Analysis with R" by Roger D. Peng is a widely cited book that provides an overview of exploratory data analysis techniques and their applications using the R programming language. The book includes examples of exploratory analysis on geo-location data. It introduces the key concepts and techniques of spatial data analysis, including exploratory analysis of geo-location data. It provides a thorough explanation of how to prepare and analyze spatial data, and how to identify patterns and trends within the data.

[6] In order to accurately identify different groups of consumers based on behaviour, demographic, and other factors, customer segmentation studies make use of a significant quantity of customer data. Several strategies are used in customer segmentation to find the ideal number of clusters; however, each method has limitations, such as the DBSCAN algorithm failing in the case of clusters of varying density. The K-means approach, on the other hand, guarantees convergence, can warm-start the centroid positions, and quickly adjusts to changes and forms ideal cluster sizes. With the help of this project, marketers will be able to more effectively tailor their promotional, marketing, and product development strategies to various audience segments and encourage people to buy the product.

[7] The "City Tour Traveler" system, an app for location URLs that is based on GPS and the Internet, was recommended as a straightforward way to provide travel data and details to mobile users. The Travel App's effective design, which includes precise instructions, locations, and models, makes it possible to see the city in the best way. The application may be used to schedule a journey for a specific period of time. It would also be helpful for people who are new to that particular location and wish to see the city swiftly.

According to the results, when the number of clusters is known, K-Means clustering does provide a useful solution. According to our readings, the Foursquare API and the Here Geocoding & Search API are popular location-based services that are frequently used to retrieve data. As a result, we used the Here Geocoding & Search API, which helps determine the preferred locations around a given latitude and longitude and typically produces accurate results.

III. METHODOLOGY

The paper presents a system that utilizes geo-locational data to target customers. The system uses the Here Geocoding & Search API to extract geo-locational data, which is then analyzed to determine the proximity of customers to various locations. This information is useful for organizations to identify potential customers and to gain insights into their preferences and tastes. The proposed system applies k-means clustering to geo-locational data to analyze customer proximity to different amenities. The process flow of the proposed model is shown in Figure 2.

• Fetch the data
• Data Cleaning & Pre-processing
• Data Exploration and Visualization
• Get Geo-locational Data from Here Geocoding & Search API
• Perform Clustering methods on Geo-locational data
• Depicting the clusters on a Geo-location map

Fig. 2 System Architecture Diagram

The system contains 6 phases: dataset collection; data cleaning and preprocessing; data exploration and visualization; retrieval of geo-locational information through an API; application of clustering strategies to the geo-locational information; and plotting of the clusters on a map.

A. Dataset Collection
Extract the information from the Here Geocoding & Search API. The gathered information must be stored as CSV files, since this data will be used to build groups with the help of clustering methods. The information from the Here Geocoding & Search API, which provides apartment features like title, id, name, address, latitude, longitude, and other information, must be retrieved for the customer's selected location in order to find the best accommodation.
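As a rough illustration of the dataset-collection step, the sketch below builds a search request and flattens a JSON response into CSV rows. It is not the authors' code. The endpoint URL, the query parameters (`at`, `q`, `limit`, `apiKey`) and the response fields (`items`, `title`, `id`, `address.label`, `position`) are assumptions taken from the public Here Geocoding & Search API v7 documentation; the function names, the sample record and the coordinates are hypothetical.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed "discover" endpoint of the Here Geocoding & Search API v7.
DISCOVER_URL = "https://discover.search.hereapi.com/v1/discover"
FIELDS = ["id", "title", "address", "latitude", "longitude"]


def build_discover_url(lat, lng, query, api_key, limit=20):
    """Return the GET URL for a free-text search centred on (lat, lng)."""
    params = {"at": f"{lat},{lng}", "q": query, "limit": limit, "apiKey": api_key}
    return f"{DISCOVER_URL}?{urlencode(params)}"


def fetch_json(url):
    """Send the GET request and decode the JSON body (needs a valid API key)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def items_to_rows(response):
    """Flatten the 'items' list of a response into flat dictionaries."""
    rows = []
    for item in response.get("items", []):
        position = item.get("position", {})
        rows.append({
            "id": item.get("id", ""),
            "title": item.get("title", ""),
            "address": item.get("address", {}).get("label", ""),
            "latitude": position.get("lat"),
            "longitude": position.get("lng"),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical response used instead of a live request.
sample_response = {"items": [{
    "id": "here:demo:1",
    "title": "Demo Apartment",
    "address": {"label": "Demo Road, Vijayawada"},
    "position": {"lat": 16.5, "lng": 80.65},
}]}
print(build_discover_url(16.5, 80.65, "apartment", "YOUR_API_KEY"))
print(rows_to_csv(items_to_rows(sample_response)))
```

With real credentials, `fetch_json(build_discover_url(...))` would replace `sample_response`, and the CSV text could be written to a file for the later clustering steps.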

B. Clean and Visualize Data
Now that the data has been gathered, it is crucial to comprehend it, and one way to do this is to display the data using graphs. Compared with reading hundreds of rows of data, graphs are user-friendly and speed up understanding of the data. As seen in Figure 3, a boxplot is one type of graph that helps in evaluating how groups are dispersed.

Fig. 3 Boxplot of Dataset

C. Retrieving important features from migrant data through k-means Clustering
We use k-means clustering to arrange the data into groups based on distance metrics. In this case, the key parameters are extracted by finding the best number of clusters, i.e., clusters that are clearly distinguished on specific attributes. When cycling through various values of the number of clusters, note how the clusters change, and plot boxplots once more to check for any noticeable separation on the various factors. As a result, only the crucial factors are retained and used later when grouping the geospatial data, as shown in Figure 4. The choice of distance measure is a critical step in clustering. It defines how the similarity of two elements (p, q) is calculated, and it influences the shape of the clusters.

Fig. 4 Graph of Clustered data

The classical distance measures are the Euclidean and Manhattan distances, which are defined as follows:

Euclidean distance:

$$d_{\mathrm{euc}}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

Manhattan distance:

$$d_{\mathrm{man}}(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

where p and q are two locations of length n.

D. Get Geo-locational Data from Here Geocoding & Search API
By creating a Here Geocoding & Search API account and obtaining the API credentials, we can access the API and query for residential locations within a fixed radius around a chosen point. This is done by making an HTTP GET request to the REST API server, specifying the search query parameters, and including the Accept: application/json header to receive the response data in JSON format. Once we have the response data, we can parse it into a usable data frame for further analysis and visualization. "Geocoding using open APIs: a study of Google Maps and OpenStreetMap" by Nguyen and Nguyen (2018) compared the performance of two popular geocoding APIs: Google Maps and OpenStreetMap. The study found that Google Maps had a higher accuracy rate but also a more restrictive usage policy, and this consideration subsequently led to the use of the Here map service. Data cleaning allows us to remove inconsistent data and provide the findings. Geo-locational information is shown as a graph in Figure 5.

Fig. 5 Represents Geo-locational Data

E. Perform Clustering method on Geo-locational data
Locations are categorized by their nearby amenities using k-means clustering. If the latitude and longitude of a location are provided as inputs and there are many amenities close by, the location is classified as amenity rich, whereas a location with fewer amenities is classified as amenity poor. The classification relies on the count for each preference, such as departmental stores, restaurants, gyms, hospitals and so on.
In order to find the best housing for a person in a specific place, given as a set of latitude and longitude coordinates or just a city name, this step groups lodging for migrants based on their preferences for amenities, budget, and proximity to the location. Locations that share a trait are grouped, so related locations are clustered together. The clustered geo-locations are plotted in the graph shown in Figure 6.

F. Depicting the clusters locations on a map
The clustered geo-locational data must be plotted on a map as the last step. The Folium library is a convenient way to map geo-locational data.
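The clustering and labelling described in Sections C and E can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the amenity counts are invented, the farthest-first initialisation is a choice made here to keep the run deterministic, and the helper names (`k_means`, `farthest_first`, `name_clusters`) are hypothetical. The two distance functions follow the Euclidean and Manhattan definitions from Section C; the clustering itself uses the Euclidean distance.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic start: each new centroid is the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def name_clusters(centroids, names=("low", "average", "rich")):
    """Rank clusters by the total amenity count of their centroid."""
    order = sorted(range(len(centroids)), key=lambda c: sum(centroids[c]))
    return {cluster: names[rank] for rank, cluster in enumerate(order)}


# Invented counts per location: (restaurants, departmental stores, gyms).
counts = [
    (12, 8, 5), (11, 9, 4), (13, 7, 6),
    (6, 4, 2), (5, 3, 2), (7, 4, 3),
    (1, 0, 0), (0, 1, 0), (2, 1, 1),
]
labels, centroids = k_means(counts, k=3)
tiers = name_clusters(centroids)
print([tiers[label] for label in labels])
```

On this toy input the first three locations come out as "rich", the next three as "average" and the last three as "low". The resulting tier of each location could then be used, for example, to choose the marker colour when the points are drawn on a Folium map.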


Fig. 6 Graph of number of clusters

IV. RESULTS
Clusters are constructed on the basis of the characteristics. We applied K-means clustering to the Benz Circle area in Vijayawada, and the results are displayed in the figures below.

Fig. 7 Geo-location Map with clustered locations

Based on the characteristics, the regions are grouped into clusters; we can see that three clusters have emerged in Fig. 7.

Fig. 8 Depicting Clusters on Geo-location map

The outcomes show that restaurants, departmental stores, and gyms are most abundant in cluster 0 (Green), whereas they are least prevalent in cluster 1 (Yellow). Cluster 2 (Red) contains more restaurants but fewer departmental stores and gyms. These clusters are depicted in Fig. 8.

V. CONCLUSION
The K-Means clustering technique has been used to cluster geo-locational data retrieved from the Here Geocoding and Search API based on the user's location. We have built a simple website that accepts the user's location and outputs a map populated with the geographic data clusters. It is user friendly, making it simple to use. This website helps resolve a problem that migrants frequently face. With this service, housing options that fit one's preferences can be found quickly, which is useful for migrants. We also included options for nearby restaurants, banks and schools on our website.

VI. FUTURE WORK
Our future work will focus on adding directions from one place to another and showing the distances between them. We also plan to implement user login through Google on our website.

REFERENCES
[1] M. Sumithra, Sai Pavithra, L. Sowmiya, S. Swetha, T. Srinithi. Exploratory Analysis of Geo-Locational Data - Accommodation Recommendation. International Research Journal of Engineering and Technology (IRJET), Volume 09, Issue 07, July 2022.
[2] Gourang Ajmera, Alok Singh, 2018. Hierarchical Data Analysis on Geo-locational Data using Machine Learning.
[3] S. G. K. Patro et al., "A Hybrid Action-Related K-Nearest Neighbour (HAR-KNN) Approach for Recommendation Systems," in IEEE Access, vol. 8, pp. 90978-90991, 2020, DOI: 10.1109/ACCESS.2020.2994056.
[4] Al-Dabooni, S. and Wunsch, D., 2018. Model order reduction based on agglomerative hierarchical clustering. IEEE Transactions on Neural Networks and Learning Systems, 30(6), pp.1881-1895.
[5] Roger D. Peng. Exploratory Data Analysis with R. Lulu.com, 20 April 2016.
[6] Srinivas Chellaboina, Maneesh Gembali, Sathya Priya. Product Recommendation based on Customer Segmentation Engine. In 2022 2nd International Conference on Intelligent Technologies (CONIT), 24-26 June 2022.
[7] Nemani, Y.M., Yadav, R., Patki, M., Padave, O. and Bhelande, M.M., 2018. City Tour Traveller: Based on FourSquare API. City, 5(04).
[8] Psyllidis, A., Yang, J. and Bozzon, A., 2018. Regionalization of social interactions and points-of-interest location prediction with geosocial data. IEEE Access, 6, pp.34334-34353.
[9] Patel, P., Sivaiah, B. and Patel, R., 2022, July. Approaches for finding Optimal Number of Clusters using K-Means and Agglomerative Hierarchical Clustering Techniques. In 2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP) (pp. 1-6). IEEE.
[10] Wang, P., Ding, C., Tan, W., Gong, M., Jia, K. and Tao, D., 2022. Uncertainty-aware clustering for unsupervised domain adaptive object re-identification. IEEE Transactions on Multimedia.
[11] Daraio, E., Cagliero, L., Chiusano, S. and Garza, P., 2022. Complementing Location-Based Social Network Data With Mobility Data: A Pattern-Based Approach. IEEE Transactions on Intelligent Transportation Systems.
[12] Sharma, S. and Batra, N., 2019, February. Comparative study of single linkage, complete linkage, and ward method of agglomerative clustering. In 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) (pp. 568-573). IEEE.
[13] Gong, W., Zhang, W., Bilal, M., Chen, Y., Xu, X. and Wang, W., 2021. Efficient web APIs recommendation with privacy-preservation for mobile app development in industry 4.0. IEEE Transactions on Industrial Informatics, 18(9), pp.6379-6387.
[14] Lee, J.H., Moon, I.C. and Oh, R., 2021. Similarity Search on Wafer Bin Map Through Nonparametric and Hierarchical Clustering. IEEE Transactions on Semiconductor Manufacturing, 34(4), pp.464-474.
