Image Classification Using Python Api, A Case Study of Dhulikhel Municipality


KATHMANDU UNIVERSITY

DHULIKHEL, KAVRE
DEPARTMENT OF GEOMATICS ENGINEERING

Mini Project Report On


IMAGE CLASSIFICATION USING PYTHON API, A CASE STUDY OF DHULIKHEL
MUNICIPALITY

Submitted By:
Er. Bikram Rawat
ME-Geoinformatics
Kathmandu University

Submitted To:
Mr. Bhogendra Mishra
Kathmandu University
ACKNOWLEDGEMENT

During this semester's mini project, many people helped me, directly and indirectly, to complete
the project successfully through their coordination, supervision, and cooperation. I would like to
express my heartfelt gratitude to every one of them.
Firstly, I would like to thank the Department of Geomatics Engineering, School of Engineering,
Kathmandu University, which helped me apply the theoretical knowledge gained during my studies
in practice and gain professional experience for the future.
Secondly, I would like to express my sincere gratitude to Mr. Bhogendra Mishra, a mentor who
supported me in every aspect of completing the project and guided me through the knowledge and
understanding required for the subject. He also gave me the opportunity to explore a new area,
expanding my knowledge of Digital Image Processing, and allowed me to research image
classification using the predefined algorithms of open-source software as well as APIs such as
Python for different areas.
I would also like to express my special gratitude to Er. Bishnu Acharya (ME-Geoinformatics),
Er. Sandesh Sharma (ME-Geoinformatics), Er. Nishon Tandukar (ME-Geoinformatics), and Mr.
Satya Ram Basnet (CTEVT), who supported me throughout the completion of this project.
Finally, I would like to take this opportunity to express my deep gratitude to all the
helping hands that have supported me in every possible way.

ABSTRACT

Image classification is a crucial and challenging task in various application domains, including
remote sensing, biomedical imaging, video surveillance, biometry, industrial visual inspection,
and robot and vehicle navigation (Shaharum et al., 2018). Image classification techniques
encompass a range of algorithms and approaches for categorizing images based on their visual
features. In this project, a satellite image of Dhulikhel Municipality was classified using one
unsupervised algorithm, K-means, and two supervised algorithms, SVM (Support Vector Machine) and
RF (Random Forest). All three were run in Google Colab using the Python API, and the output was
visualized within the Colab interface. The unsupervised classification, run with 6 clusters and
300 iterations, yielded valuable insights into the data structure, although no accuracy assessment
was carried out for it. The SVM model achieved an overall accuracy of 0.875, meaning that 87.5% of
the test samples were correctly identified, while its Kappa index of 0.828 indicates substantial
agreement beyond random chance. Similarly, the Random Forest model achieved an overall accuracy of
0.806, correctly identifying approximately 80.6% of the samples, with a Kappa index of 0.740,
indicating significant agreement with the actual classes. It is recommended to conduct a
comparative analysis of the classification results with ground truth data to validate the accuracy
and reliability of the classification models in real-world applications.

Table of Contents
ACKNOWLEDGEMENT
ABSTRACT
1. INTRODUCTION
1.1. BACKGROUND
1.2. PYTHON API IN IMAGE CLASSIFICATION
1.3. OBJECTIVES
2. LITERATURE REVIEW
3. MATERIALS AND METHODS
3.1. STUDY AREA
3.2. METHODOLOGICAL FRAMEWORK
3.3. SOFTWARE REQUIRED
3.4. DATA ACQUISITION AND PRE-PROCESSING
3.5. GOOGLE COLAB SETUP
3.6. IMAGE CLASSIFICATION ALGORITHM
3.7. DATASET DESCRIPTION AND MODEL TRAINING
3.8. EVALUATION METRIC
4. RESULTS AND DISCUSSION
4.1. VISUALISATION
4.2. PERFORMANCE METRIC AND RESULT INTERPRETATION
4.3. CHALLENGES
5. CONCLUSION AND RECOMMENDATION
6. REFERENCES
7. APPENDICES
7.1. Appendix 1: Process of downloading satellite image via Copernicus access hub
7.2. Appendix 2: Clipping AOI (Dhulikhel municipality) from downloaded satellite image (only b2, b3 and b4) in ArcMap
7.3. Appendix 3: Layer Stacking in ENVI
7.4. Creating dbf file using ArcMap tool for training data sets

LIST OF TABLES
Table 1: Summary of Remote Sensing Classification Techniques
Table 2: Software Used for The Project with The Purposes
Table 3: Data Set Description (Metadata) used for classification
Table 4: Last Five Training Points (dbf) used to train model
Table 5: Model Training with Description
Table 6: Evaluation Metric for SVM

LIST OF FIGURES
Figure 1: Location Map of Dhulikhel Municipality
Figure 2: Methodological Framework for Image Classification and Analysis
Figure 3: Process Adopted in Pre-processing of Satellite Image
Figure 4: Kappa Coefficient range
Figure 5: Classified Map of Dhulikhel using Kmeans Algorithm
Figure 6: Classified Map of Dhulikhel using SVM Algorithm
Figure 7: Classified Map of Dhulikhel using Random Forest Algorithm

1. INTRODUCTION

1.1. BACKGROUND
With advances in technology, the growth of human civilization, and an increasing population, land
is being used for many different purposes. Likewise, scientific and social research requires
knowledge of the properties, characteristics, and usability of natural features. This is how the
term classification emerged in the study of the environment (both natural and artificial). Images
and photographs taken from different platforms, terrestrial, aerial, and spaceborne (satellite),
are used to classify the features present on the Earth's surface. Image classification involves
image sensors, image pre-processing, object detection, object segmentation, feature extraction,
and object classification. An image classification system consists of a database of predefined
patterns against which an object is compared in order to assign it to the appropriate category.
Image classification is a crucial and challenging task in various application domains, including
remote sensing, biomedical imaging, video surveillance, biometry, industrial visual inspection,
and robot and vehicle navigation (Shaharum et al., 2018).
The image classification procedure involves several key steps to ensure accurate categorization
of imagery data. Firstly, a classification scheme is designed, which defines the information
classes or categories to be identified in the images. These classes typically represent features
such as urban areas, agriculture, forests, etc. Field surveys are often conducted to collect ground
truth information and other ancillary data about the study area, which helps in refining the
classification scheme and training the classification algorithm.
Next, the images undergo preprocessing steps to correct for radiometric, atmospheric, geometric,
and topographic distortions. This includes techniques such as image enhancement and initial
clustering to group pixels with similar characteristics. Representative areas of the image are then
selected for analysis, and the initial clustering results are examined or training signatures are
generated to train the classification algorithm. The image classification algorithm is then applied
to the preprocessed images to assign pixels to their respective information classes based on their
spectral characteristics. Post-processing steps may be applied to the classified images, such as
complete geometric correction, filtering to remove noise or improve spatial coherence, and
decorating the classification results to enhance visualization. Finally, the accuracy of the
classification is assessed by comparing the classified results with ground truth data collected
during field studies. Discrepancies between the classification results and ground truth
information are analyzed to identify areas for improvement and to ensure the reliability and
validity of the classification outcomes.

Image classification techniques encompass a range of algorithms and approaches for
categorizing images based on their visual features. Common techniques include supervised
learning methods like Support Vector Machines (SVM), Random Forests, and Convolutional
Neural Networks (CNNs), which learn to classify images based on labeled training data.
Unsupervised learning methods such as K-means clustering and hierarchical clustering are also
used for image classification, particularly when labeled data is scarce or unavailable.
Additionally, deep learning approaches, particularly CNNs, have shown remarkable performance
in image classification tasks due to their ability to automatically learn hierarchical features from
raw image data.
Table 1: Summary of Remote Sensing Classification Techniques

METHODS | EXAMPLES | CHARACTERISTICS
Parametric | Maximum likelihood classification and unsupervised classification, etc. | Assumptions: data are normally distributed; prior knowledge of class density functions.
Non-parametric | Nearest-neighbor classification, fuzzy classification, neural networks and support vector machines, etc. | No prior assumptions are made.
Supervised | Maximum likelihood, minimum distance and parallelepiped classification, etc. | Analyst identifies the training sites to represent the classes, and each pixel is classified based on statistical analysis.
Unsupervised | ISODATA and K-means, etc. | Prior ground information not known. Pixels with similar spectral characteristics are grouped according to specific statistical criteria.
Hard (parametric) | Supervised and unsupervised | Classification using discrete categories.
Soft (non-parametric) | Fuzzy set classification logic | Considers the heterogeneous nature of the real world. Each pixel is assigned a proportion of the land cover types found within the pixel.
Per-pixel | - | Classification of the image pixel by pixel.
Object Oriented | - | Image regenerated into homogeneous objects; classification performed on each object and pixel.
Hybrid | - | Includes expert systems and artificial intelligence.

(Source: Jensen, 2005, pp. 337-338)

Image classification has made great progress over the past decades in the following four areas:
(1) producing land cover maps at regional and global scales; (2) development and use of advanced
classification algorithms, such as subpixel, per-field, and knowledge-based classification
algorithms; (3) use of multiple remote-sensing features, including spectral, spatial,
multitemporal, and multisensor information; and (4) incorporation of ancillary data into
classification procedures, including such data as topography, soil, road, and census data
(Aldoski et al., 2013). Image processing is crucial for extracting insights from images, enhancing
clarity, and enabling analysis across domains like medicine, remote sensing, security, and AI. It
facilitates tasks such as object detection and pattern recognition, driving innovation and
efficiency in diverse sectors.
1.2. PYTHON API IN IMAGE CLASSIFICATION
Previously, images were classified manually or with the built-in algorithms of open-source and
enterprise software. With advances in technology and knowledge, people now use a variety of
techniques, tools, and forms of intelligence to classify images, working across different
software, web platforms, and systems. Several programming languages are used for remote sensing
and GIS applications, including JavaScript, Python, C, and C++. Both JavaScript and Python are
useful, easy to learn, and good for writing short scripts, but Python aims much higher through its
ability to work with objects (it is an object-oriented programming language) and, in addition, it
lends itself well to the IT areas of Data Analytics, Artificial Intelligence, and Machine
Learning. Given that classification is a machine learning task, it is more natural to use the
Python language (Paul & Voicu, 2021).
Python, a widely-utilized object-oriented interpreted language, offers strong portability, a
shallow learning curve, and a syntax reminiscent of natural language with pseudo-code
attributes. Its scientific computing ecosystem is robust, featuring prominent libraries like NumPy,
Scikit-learn, OpenCV, PyTorch, TensorFlow, and GDAL. NumPy provides powerful array
processing capabilities, including high-performance operations for multidimensional arrays,
Fourier transforms, and linear algebra. Scikit-learn supplies efficient machine learning
algorithms and tools for data mining and analysis. OpenCV efficiently implements computer
vision algorithms for tasks such as image stitching and face recognition. PyTorch and
TensorFlow are leading deep learning frameworks offering easy access to common operations
and GPU support. GDAL is instrumental in processing geospatial data, with a Python interface
facilitating remote sensing image processing. Python's rich scientific computing ecosystem
ensures robust infrastructure and dependable services for remote sensing applications. (Meng et
al., n.d.)

1.3. OBJECTIVES

Primary Objective: The primary objective of this study is "to perform image classification of
Dhulikhel municipality using the Python API".
Secondary Objectives: The secondary objectives supporting the primary objective are:
• To perform three methods of image classification, i.e. the K-means algorithm, Random Forest,
and SVM.
• To implement the use of the Python API in Google Colab.
• To understand the concept of model training and the data set used.
• To analyze different evaluation and performance metrics.
• To evaluate the kappa coefficient of the performed analysis.
• To understand the use of different Python libraries for image classification and remote
sensing applications.

2. LITERATURE REVIEW

Prasai et al. (2021) evaluated the utility of the GEE Python API for classifying freely
available 2017 NAIP aerial imagery to derive land use land cover (LULC) information for a
Panhandle area of Florida, USA. They identified eight major LULC classes with an overall
accuracy of 86% and a Kappa value of 79%. They completed all remote sensing data analysis
procedures, including data retrieval, classification, and report preparation, in the Jupyter
Notebook, an open-source web application, and concluded that the open-source nature of the GEE
Python API and its library of remote sensing data could benefit remote sensing projects throughout
the world, especially where access to commercial image processing software packages and remote
sensing data is limited.
Paul & Voicu (2021) classified an image using a machine learning approach inside the Google Earth
Engine platform and Jupyter Notebook. The computer was fed training data (in that case,
points/pixels labeled with their land-cover type) so that a model built with supervised learning
could learn to recognize the type of each pixel.
Meng et al. (n.d.) proposed PyRS, a Python remote sensing image processing library, introduced
its main functions and characteristics, implemented two selected functions with both PyRS and
ENVI, and compared the results. They found that PyRS fills a gap in image processing algorithm
libraries in the field of remote sensing and provides users with a highly transparent and
repeatable workflow.
Brandolini et al. (2021) reported on the first applications of the Google Earth Engine (GEE)
Python application programming interface (API) in studies of historic landscapes. The complete
free and open-source software (FOSS) cloud protocol they proposed consists of a Python script
developed in Google Colab, which could be adapted and replicated in different areas of the world.
Li et al. (2014) reviewed major remote sensing image classification techniques, including
pixel-wise, sub-pixel-wise, and object-based image classification methods, and highlighted the
importance of incorporating spatio-contextual information in remote sensing image
classification.
Garima (2023) presented a method for representing remote sensing data on glaciers graphically and
pictorially, using the Python programming language and its libraries NumPy, pandas, matplotlib,
seaborn, and plotly for data visualization.

3. MATERIALS AND METHODS

3.1. STUDY AREA


Dhulikhel, situated in the Kavrepalanchok District of Nepal, is a municipality traversed by two
significant highways, namely the B.P. Highway and the Araniko Highway. The latter links
Nepal's capital, Kathmandu, with the town of Kodari on the border with Tibet. Positioned on the
eastern periphery of the Kathmandu Valley, Dhulikhel rests south of the Himalayas at an altitude
of 1550 meters above sea level. It lies approximately 30 kilometers southeast of Kathmandu and 74
kilometers southwest of Kodari ("Dhulikhel," n.d.).
Since Dhulikhel is one of the old cities with socio-cultural value and is close to the national
capital, it has experienced rapid urbanization. Within decades, agricultural land and forest have
been converted to urban use, so accurate and precise classification is required for study,
development programs, and coordination.

Figure 1: Location Map of Dhulikhel Municipality

3.2. METHODOLOGICAL FRAMEWORK
The objective of this project is "to perform image classification of Dhulikhel municipality using
the Python API", along with the supporting secondary objectives. The process and methods adopted
are shown in the framework diagram below.

Figure 2: Methodological Framework for Image Classification and Analysis

3.3. SOFTWARE REQUIRED
The image classification itself was performed within Google Colab using the Python API and its
libraries, but other software and tools were also used to complete the project. They are listed
below with the purposes they served:
Table 2: Software Used for The Project with The Purposes
S.N | Software Used | Purpose
1 | ArcMap | Image Sub-setting, Digitization, Creating Training samples (dbf files), Verification
2 | QGIS | Merging the bands for Image Classification
3 | ENVI | Layer Stacking for demo analysis

3.4. DATA ACQUISITION AND PRE-PROCESSING


Data acquisition and pre-processing of satellite imagery are crucial steps that ensure the quality,
accuracy, and usability of the data for subsequent analysis. These processes involve correcting
for atmospheric effects, geometric distortions, and other artifacts, thereby enhancing the
reliability and interpretability of the imagery for various applications such as land cover
classification, environmental monitoring, and urban planning.
Since the project classifies an image obtained from a satellite, the sources and process adopted
are described here. The satellite image was downloaded via the Copernicus access hub and subset
to the study area. The bands were then stacked and uploaded to Google Drive, where a directory was
defined for further processing. The diagram below shows the process along with the sources and
tools used for acquisition and pre-processing.

Figure 3: Process Adopted in Pre-processing of Satellite Image

3.5. GOOGLE COLAB SETUP
Google Colab provides a cloud-based platform for running Python code in a Jupyter notebook
environment, offering free access to computational resources like CPUs and GPUs. Python APIs
work seamlessly within Colab, enabling users to utilize popular libraries and frameworks for
tasks such as data analysis and machine learning. Its collaborative features and integration with
Google Drive make it a versatile tool for researchers, developers, and data scientists.
To set up Google Colab for the image classification project, Google Colab was accessed through a
web browser using Google account credentials. Once logged in, a new Colab notebook was created,
and the necessary Python libraries for image processing and classification, such as TensorFlow,
scikit-learn, and rasterio, were installed. The satellite image data was imported into Colab by
uploading the files to Google Drive and mounting the drive in the notebook. The Colab environment
was then configured for the project requirements, including enabling GPU acceleration if
necessary. The line 'from google.colab import drive' was used to connect the Colab notebook to the
Google Drive account, which makes it possible to access and manipulate files stored in Google
Drive directly from within the notebook environment. The line
'drive.mount('/content/drive', force_remount=True)' was used to mount Google Drive at the
'/content/drive' directory, from which the required files were accessed.
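
The mounting step described above can be sketched as follows. This is a minimal illustration and not the project's actual notebook code: the `google.colab` import only works inside Colab, so it is guarded here, and the folder and file names (`MyDrive`, `dhulikhel`, `stack_432.tif`) are hypothetical placeholders.

```python
import posixpath


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return None elsewhere."""
    try:
        from google.colab import drive  # only importable inside Google Colab
    except ImportError:
        return None
    drive.mount(mount_point, force_remount=True)
    return mount_point


def image_path(root, *parts):
    """Build the path of a file stored under the mounted drive."""
    return posixpath.join(root, *parts)


# Outside Colab the mount is skipped and only the path is constructed.
root = mount_drive() or "/content/drive"
# Hypothetical location of the stacked image; adjust to the real folder layout.
stack_file = image_path(root, "MyDrive", "dhulikhel", "stack_432.tif")
print(stack_file)
```

Keeping the mount inside a small function makes it easy to reuse the same notebook cells locally, where Drive is not available.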

3.6. IMAGE CLASSIFICATION ALGORITHM


Since the objective of this project is to classify a satellite image using three algorithms, the
image was classified using one unsupervised algorithm, K-means, and two supervised algorithms,
SVM (Support Vector Machine) and RF (Random Forest). All three were run in Google Colab using the
Python API, and the output was visualized within the Colab interface.
Unsupervised classification using the K-means algorithm is a widely used technique in image
processing for clustering similar pixels without the need for labeled training data. In Google
Colab, Python was employed to implement K-means clustering on satellite imagery. The
algorithm partitions the image into a predetermined number of clusters based on pixel similarity,
with centroids representing cluster centers. Each pixel is then assigned to the nearest centroid,
resulting in a classified image where pixels belonging to the same cluster share similar spectral
characteristics. This approach allows for the identification of distinct land cover or land use
classes within the image, providing valuable insights for various applications such as
environmental monitoring, urban planning, and agriculture management. The code segment
(appendix 7.4) performs unsupervised classification using the K-means algorithm on satellite
imagery of Dhulikhel. It uses the rkmeans function from the scikeo library to cluster pixels
into six classes based on spectral similarity. The resulting classification map is visualized
alongside the satellite image using Matplotlib, providing insights into land cover patterns within
the area.
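
The clustering idea can be illustrated with a short sketch. It uses scikit-learn's `KMeans` as a stand-in for the scikeo function used in the project, and a small synthetic three-band array instead of the real Sentinel-2 stack, so both the data and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a 3-band image shaped (rows, cols, bands).
# A real run would read the stacked bands from disk instead.
rng = np.random.default_rng(0)
image = rng.integers(0, 4000, size=(60, 80, 3)).astype("float32")

rows, cols, bands = image.shape
pixels = image.reshape(-1, bands)  # one row per pixel, one column per band

# Six clusters and at most 300 iterations, matching the settings in this report.
kmeans = KMeans(n_clusters=6, max_iter=300, n_init=10, random_state=0)
labels = kmeans.fit_predict(pixels)

# Reshape the per-pixel cluster labels back to the image geometry.
classified = labels.reshape(rows, cols)
print(classified.shape, kmeans.cluster_centers_.shape)
```

Each pixel ends up with the index of its nearest centroid, so `classified` can be displayed directly as a thematic map, for example with Matplotlib's `imshow`.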

In supervised classification using the Support Vector Machine (SVM) algorithm, the satellite image
data of the target area in Dhulikhel was processed in Google Colab using Python. Using the SVM
model from the scikeo library, the algorithm learned from labeled training samples to classify
pixels into distinct land cover classes. Through iterative optimization, SVM maximizes the margin
between classes, enhancing classification accuracy. The resulting SVM classification map reveals
detailed land cover information, aiding in land use analysis and resource management decisions
for the region. The code segment (appendix 7.4) performs supervised classification using the SVM
algorithm on satellite imagery of Dhulikhel.
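
A minimal sketch of this supervised workflow is given below. It uses scikit-learn's `SVC` in place of the scikeo wrapper used in the project, and synthetic training points with three band values and four classes, so the class centers, sample counts, and kernel settings are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical band-value centers for four classes (forest, cultivable land,
# road, building); real values would come from the training dbf file.
centers = np.array(
    [[500, 700, 400], [1200, 1500, 1100], [2000, 2100, 2300], [2300, 2400, 2800]],
    dtype=float,
)
X = np.vstack([c + rng.normal(0, 150, size=(60, 3)) for c in centers])
y = np.repeat([1, 2, 3, 4], 60)

# 70% of the points train the model, 30% are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

svm = SVC(kernel="rbf", gamma="scale")
svm.fit(X_train, y_train)
predicted = svm.predict(X_test)

overall_accuracy = accuracy_score(y_test, predicted)
kappa = cohen_kappa_score(y_test, predicted)
print(overall_accuracy, kappa)
```

The held-out 30% plays the role of the test samples that appear in the confusion matrix of the evaluation section.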
In supervised classification using the Random Forest algorithm, the satellite image data of
Dhulikhel was likewise processed in Google Colab using Python. Using the Random Forest classifier
from the scikeo library, an ensemble of decision trees classified pixels based on labeled training
samples. By aggregating predictions from multiple trees, Random Forest mitigates overfitting and
enhances classification accuracy, yielding a detailed map of land cover classes in Dhulikhel.
This classification map provides valuable insights for land use planning, environmental
monitoring, and resource management in the region. The code segment (appendix 7.4) performs
supervised classification using the Random Forest algorithm on satellite imagery of Dhulikhel.
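
The step from trained model to classified map can be sketched as follows. Again this is an illustration under stated assumptions: scikit-learn's `RandomForestClassifier` replaces the scikeo classifier, and both the training points and the image are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical training points: three band values per point, four classes.
centers = np.array(
    [[500, 700, 400], [1200, 1500, 1100], [2000, 2100, 2300], [2300, 2400, 2800]],
    dtype=float,
)
X_train = np.vstack([c + rng.normal(0, 150, size=(50, 3)) for c in centers])
y_train = np.repeat([1, 2, 3, 4], 50)

# An ensemble of 100 decision trees; each tree votes on the class of a pixel.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Synthetic 3-band image; every pixel is classified from its band values.
image = rng.integers(300, 3000, size=(40, 50, 3)).astype(float)
class_map = rf.predict(image.reshape(-1, 3)).reshape(40, 50)

# Relative contribution of each band to the ensemble's decisions.
importances = rf.feature_importances_
print(class_map.shape, importances)
```

Flattening the image to a pixel table and reshaping the predictions afterwards is the usual way to apply a per-pixel classifier to raster data.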

3.7. DATASET DESCRIPTION AND MODEL TRAINING


Most of the process was carried out within the Colab interface using the Python API, and all the
required libraries were installed and imported within the interface.
Table 3: Data Set Description (Metadata) used for classification

S.N | Data | Description
1 | Layer Stacked Image of Dhulikhel | Uploaded and Mounted to Drive
2 | No. of Bands Used | 3 (Band 4, Band 3 and Band 2 for natural color composite)
3 | Imagery | Sentinel-2, 10 m × 10 m
4 | No. of Training Points | 238 (71-Class 1, 49-Class 2, 62-Class 3 and 56-Class 4)
5 | Class Description | Class 1 = Forest; Class 2 = Cultivable Land; Class 3 = Road; Class 4 = Building
6 | Training/Test Split | 70% for Training and 30% for Test
7 | Correction | Not Applied

Table 4: Last Five Training Points (dbf) used to train model
index | Id (Class) | b1_432band | b2_432band | b3_432band
234 | 4 | 2239.0 | 2173.0 | 2339.0
235 | 4 | 2119.0 | 1968.0 | 2837.0
236 | 4 | 2362.0 | 2363.0 | 2873.0
237 | 4 | 2433.0 | 2477.0 | 2682.0
238 | 4 | 2432.0 | 2593.0 | 2919.0

Table 5: Model Training with Description

S.N | Data | Description
1 | Unsupervised Classification | Algorithm: K Means; Clusters: 6; Iteration: 300
2 | Supervised - SVM | Training Split: 70%, Class: 4, Color Palette: "#229954", "#7CED5E", "#964B00", "red", Size: 15*15, Grid: NA
3 | Supervised - Random Forest | Training Split: 70%, Class: 4, Color Palette: "#229954", "#7CED5E", "#964B00", "red", Size: 12*7, Grid: NA

3.8. EVALUATION METRIC

Evaluation metrics in image classification refer to the measures used to assess the performance
of a classification model in categorizing images into different classes. These metrics provide
quantitative insights into the model's effectiveness and help determine its accuracy, precision,
recall, F1-score, and other performance indicators. The importance of evaluation metrics lies in their
ability to objectively evaluate the classification model's performance, identify areas for
improvement, and compare different models based on their predictive capabilities. They depict
the model's ability to correctly classify images into their respective classes, minimize
misclassifications, and provide reliable predictions for unseen data. Ultimately, evaluation
metrics help stakeholders make informed decisions about the effectiveness and suitability of the
image classification model for their specific application or task.
Of the three types of classification performed, one is unsupervised, and no accuracy assessment
was done for it; this will be considered in future work. The evaluation metrics for the two
supervised classifications performed for the project are shown in the tables below.

Table 6: Evaluation Metric for SVM
SVM - Support Vector Machine
Overall Accuracy: 0.875
Kappa Index: 0.8282988871224166

index | 1 | 2 | 3 | 4 | Total | Users_Accuracy | Commission
1 | 27.0 | 0.0 | 0.0 | 0.0 | 27.0 | 100.0 | 0.0
2 | 0.0 | 16.0 | 0.0 | 3.0 | 19.0 | 84.21052631578947 | 15.789473684210535
3 | 0.0 | 0.0 | 11.0 | 2.0 | 13.0 | 84.61538461538461 | 15.384615384615387
4 | 0.0 | 0.0 | 4.0 | 9.0 | 13.0 | 69.23076923076923 | 30.769230769230774
Total | 27.0 | 16.0 | 15.0 | 14.0 | NaN | NaN | NaN
ProducerAccuracy | 100.0 | 100.0 | 73.33333333333333 | 64.28571428571429 | NaN | NaN | NaN
Omission | 0.0 | 0.0 | 26.66666666666667 | 35.71428571428571 | NaN | NaN | NaN

Figure 4: Kappa Coefficient range


Source:(B R & S V, 2018)

The reported overall accuracy of 0.875 signifies that the SVM classification model correctly
identified 87.5% of the test samples (63 of 72), reflecting its proficiency in distinguishing
between different classes. Additionally, the Kappa index, which stands at 0.828, suggests a
substantial level of agreement beyond random chance between the predicted classifications and
the actual classes. These metrics collectively illustrate the robustness and reliability of the SVM
classifier in accurately categorizing the dataset, highlighting its effectiveness in image
classification tasks.
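
As a worked check, the overall accuracy and Kappa index can be recomputed directly from the SVM confusion matrix in Table 6. The sketch below assumes only NumPy and the standard definition of Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement); the variable names are illustrative.

```python
import numpy as np

# SVM confusion matrix from Table 6 (rows: predicted class, columns: true class).
cm = np.array(
    [
        [27, 0, 0, 0],
        [0, 16, 0, 3],
        [0, 0, 11, 2],
        [0, 0, 4, 9],
    ],
    dtype=float,
)

n = cm.sum()                            # total number of test samples
overall_accuracy = np.trace(cm) / n     # correctly classified / total

# Agreement expected by chance, from the row and column totals.
expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
kappa = (overall_accuracy - expected) / (1 - expected)

print(round(overall_accuracy, 3), round(kappa, 3))  # 0.875 0.828
```

The result reproduces the values reported in Table 6, which confirms that the reported Kappa index follows from the matrix itself.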
Table 7: Evaluation Metric for Random Forest
Random Forest
Overall Accuracy: 0.8055555555555556
Kappa Index: 0.7400722021660651

index | 1 | 2 | 3 | 4 | Total | Users_Accuracy | Commission
1 | 17.0 | 2.0 | 0.0 | 0.0 | 19.0 | 89.47368421052632 | 10.526315789473685
2 | 0.0 | 13.0 | 0.0 | 3.0 | 16.0 | 81.25 | 18.75
3 | 0.0 | 0.0 | 18.0 | 2.0 | 20.0 | 90.0 | 10.0
4 | 0.0 | 3.0 | 4.0 | 10.0 | 17.0 | 58.82352941176471 | 41.17647058823529
Total | 17.0 | 18.0 | 22.0 | 15.0 | NaN | NaN | NaN
Producer_Accuracy | 100.0 | 72.22222222222221 | 81.81818181818183 | 66.66666666666666 | NaN | NaN | NaN
Omission | 0.0 | 27.777777777777786 | 18.181818181818173 | 33.33333333333334 | NaN | NaN | NaN


The reported overall accuracy of 0.806 means that the Random Forest classification model
correctly identified approximately 80.6% of the samples (58 of the 72 samples in the confusion
matrix lie on its diagonal), indicating a high level of accuracy in distinguishing between the
classes. Moreover, the Kappa index of 0.740 indicates substantial agreement beyond random
chance between the predicted classifications and the actual classes.
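The overall accuracy and Kappa index reported for both classifiers can be reproduced directly from the confusion matrices. The following is a minimal sketch in plain Python, not the project's own code: the function name overall_accuracy_and_kappa is hypothetical, and the matrices are copied from the SVM and Random Forest tables (rows are predicted classes, columns are true classes).

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)                 # total number of samples
    observed = sum(matrix[i][i] for i in range(k)) / n  # share of samples on the diagonal
    row_totals = [sum(row) for row in matrix]           # samples per predicted class
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # per true class
    # Agreement expected by chance, from the row and column marginals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Confusion matrices copied from the SVM and Random Forest tables
svm_matrix = [[27, 0, 0, 0], [0, 16, 0, 3], [0, 0, 11, 2], [0, 0, 4, 9]]
rf_matrix = [[17, 2, 0, 0], [0, 13, 0, 3], [0, 0, 18, 2], [0, 3, 4, 10]]

svm_oa, svm_kappa = overall_accuracy_and_kappa(svm_matrix)
rf_oa, rf_kappa = overall_accuracy_and_kappa(rf_matrix)

print(f"SVM: OA {svm_oa:.3f}, kappa {svm_kappa:.3f}")  # SVM: OA 0.875, kappa 0.828
print(f"RF:  OA {rf_oa:.3f}, kappa {rf_kappa:.3f}")    # RF:  OA 0.806, kappa 0.740
```

Both results agree with the values reported above, which confirms that the published metrics follow from the tabulated counts.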

4. RESULTS AND DISCUSSION

4.1. VISUALISATION
The classified images produced by the three algorithms are shown below. In the unsupervised
classification, 6 classes were formed with 300 iterations, and the resulting image appears more
relevant. In the SVM and RF outputs, however, the background (face color) appears to share the
color of one of the classes. This is because the color assigned to that class was similar to
the background color during classification, which makes the result appear somewhat irrelevant.
Based on the overall accuracy and the Kappa index ranges (Figure 4), SVM falls under the
Excellent category and RF falls under the Good category, so the SVM result is more accurate
than the RF result. The color palette for the supervised classifications is matched to the
standard land use map, following cartographic symbols and principles.
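As an illustration of the unsupervised step, the sketch below runs K-means with 6 clusters and up to 300 iterations on a synthetic three-band image using scikit-learn. It is a sketch under stated assumptions rather than the project's code: the random image, the nodata value of 0, and the label -1 for background pixels are all hypothetical. Masking the background before clustering is one possible way to keep it from sharing a class value with real land cover.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a clipped 3-band image (assumption: real data would be read from file)
rng = np.random.default_rng(0)
rows, cols, bands = 40, 50, 3
image = rng.integers(1, 3000, size=(rows, cols, bands)).astype(np.float32)
image[:5, :, :] = 0  # hypothetical background strip outside the AOI (nodata = 0)

# Flatten to (n_pixels, n_bands) and keep only pixels that are not background
pixels = image.reshape(-1, bands)
valid = np.any(pixels != 0, axis=1)

# 6 clusters and 300 iterations, as described in the text
kmeans = KMeans(n_clusters=6, max_iter=300, n_init=10, random_state=0)

# Background pixels get the label -1 so they can be drawn in a separate color
labels = np.full(pixels.shape[0], -1, dtype=np.int32)
labels[valid] = kmeans.fit_predict(pixels[valid])
classified = labels.reshape(rows, cols)

print(classified.shape)
```

When plotting, the -1 label can then be mapped to a neutral color that is not used by any class.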

Figure 5: Classified Map of Dhulikhel using Kmeans Algorithm

Figure 6: Classified Map of Dhulikhel using SVM Algorithm

Figure 7: Classified Map of Dhulikhel using Random Forest Algorithm

4.2. PERFORMANCE METRIC AND RESULT INTERPRETATION

SVM (Support Vector Machine)

The confusion matrix for SVM represents the results of a classification task with four classes
(indexed 1 to 4). Each row represents the predicted class, while each column corresponds to
the true class.
For class 1, all 27 samples were correctly classified, resulting in a perfect user's accuracy
of 100%. The producer's accuracy for class 1 is also 100%, indicating that all true class 1
samples were correctly identified by the classifier.
For class 4, only 9 of the 13 samples predicted as class 4 were correct, leading to a user's
accuracy of 69.23%. The producer's accuracy for class 4 is 64.29% (9 of the 14 true class 4
samples), meaning that a significant portion of true class 4 samples were misclassified by the
classifier.
Overall Accuracy: 0.875
Kappa Index: 0.8282988871224166
Result: Almost Perfect

Random Forest

The confusion matrix for Random Forest (Table 7) represents the results of a classification
task with four classes (indexed 1 to 4). Each row corresponds to the predicted class, while
each column represents the true class.
For class 1, 17 of the 19 samples predicted as class 1 were correct, while the remaining 2
actually belonged to class 2, resulting in a user's accuracy of 89.47%. The producer's accuracy
for class 1 is 100%, indicating that all true class 1 samples were correctly identified.
For class 4, only 10 of the 17 samples predicted as class 4 were correct, resulting in a
relatively low user's accuracy of 58.82%. The producer's accuracy for class 4 is 66.67%,
meaning that 10 of the 15 true class 4 samples were correctly identified by the classifier.
Overall Accuracy: 0.8055555555555556
Kappa Index: 0.7400722021660651
Result: Good
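The per-class figures discussed in this section follow from the row and column totals of each confusion matrix: user's accuracy divides the diagonal cell by its row total (samples predicted as that class), and producer's accuracy divides it by its column total (true samples of that class). The sketch below is illustrative only; the function name per_class_accuracy is hypothetical, and the matrices are copied from the tables in this report.

```python
def per_class_accuracy(matrix):
    """Return (user's accuracies, producer's accuracies) in percent, one value per class."""
    k = len(matrix)
    row_totals = [sum(row) for row in matrix]  # samples predicted as each class
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # true samples
    users = [100.0 * matrix[i][i] / row_totals[i] for i in range(k)]
    producers = [100.0 * matrix[i][i] / col_totals[i] for i in range(k)]
    return users, producers


# Rows are predicted classes, columns are true classes
svm_matrix = [[27, 0, 0, 0], [0, 16, 0, 3], [0, 0, 11, 2], [0, 0, 4, 9]]
rf_matrix = [[17, 2, 0, 0], [0, 13, 0, 3], [0, 0, 18, 2], [0, 3, 4, 10]]

svm_users, svm_producers = per_class_accuracy(svm_matrix)
rf_users, rf_producers = per_class_accuracy(rf_matrix)

# Commission and omission errors are the complements of the two accuracies
svm_commission = [100.0 - u for u in svm_users]
svm_omission = [100.0 - p for p in svm_producers]

# Class 4 is the last entry of each list
print(f"SVM class 4: user's {svm_users[3]:.2f}%, producer's {svm_producers[3]:.2f}%")
print(f"RF  class 4: user's {rf_users[3]:.2f}%, producer's {rf_producers[3]:.2f}%")
```

For class 4, this gives 9/13 and 9/14 for SVM and 10/17 and 10/15 for Random Forest, matching the user's and producer's accuracies in the tables.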

4.3. CHALLENGES
Several challenges arose during the course of the project. Factors ranging from individual
skill to the availability of resources affected the accuracy and reliability of the
classification and the results obtained. The main challenges are listed below; they were
overcome, and the lessons learned will be applied in future work.
• Lack of advanced programming skills
• Lack of a highly accurate data set (especially primary data sources)
• Glitches and inconsistencies between software packages (layer stacking in ENVI and band
composition in ArcMap gave different results and modified the pixel values)

5. CONCLUSION AND RECOMMENDATION

The objective of this report was met with relevant results. In this study, we applied three
classification algorithms, namely Support Vector Machine (SVM), Random Forest, and unsupervised
K-means, to classify satellite imagery in Google Colab using Python. Each algorithm showed a
different level of effectiveness. In the unsupervised classification, 6 classes were formed
with 300 iterations. K-means clustering provided valuable insight into the data structure,
although its performance metrics were slightly lower than those of the supervised methods. The
SVM model achieved an overall accuracy of 0.875, correctly identifying 87.5% of the samples,
with a Kappa index of 0.828, which indicates a high level of agreement beyond random chance
between the predicted and actual classes. The Random Forest model achieved an overall accuracy
of 0.806, correctly identifying approximately 80.6% of the samples, with a Kappa index of
0.740, which indicates substantial agreement beyond random chance.
Based on these findings, several recommendations can be made to improve the classification
process and its accuracy. First, further experimentation with hyperparameters and model
configurations for each algorithm may yield better results. Second, preprocessing techniques
such as feature scaling, dimensionality reduction, and image augmentation could enhance the
discriminative power of the classification models. Third, besides the algorithms used here,
other methods such as Naïve Bayes and CNN (Convolutional Neural Network) can be tried. Fourth,
the output should maintain not only data quality but also aesthetic quality. Finally, the
classification results should be compared against ground truth data to validate the accuracy
and reliability of the models in real-world applications.
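The first two recommendations, hyperparameter experimentation and feature scaling, could be combined as in the sketch below, which uses scikit-learn's Pipeline and GridSearchCV. Everything specific here is an assumption made for illustration: the synthetic four-class, three-band training data, the parameter grid, and the 70/30 split are not taken from the project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic training samples: 4 classes, 3 band values per sample (hypothetical data)
rng = np.random.default_rng(42)
n_per_class, n_bands = 60, 3
centers = rng.uniform(500, 2500, size=(4, n_bands))
X = np.vstack([c + rng.normal(0, 150, size=(n_per_class, n_bands)) for c in centers])
y = np.repeat(np.arange(1, 5), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is fitted on each training fold only
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; the values are assumptions, not tuned recommendations
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1, 1],
    "svc__kernel": ["rbf", "linear"],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)

print(search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The same pattern applies to Random Forest by swapping the estimator and the grid, for example over the number of trees and the maximum depth.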

6. REFERENCES

• Aldoski, J., Mansor, S., Shafri, H., & Shafri, M. (2013). Image Classification in
Remote Sensing. 3.
• B R, S., & S V, R. (2018). An Investigation on Land Cover Mapping Capability of
Classical and Fuzzy based Maximum Likelihood Classifiers. International Journal
of Engineering & Technology, 7(2), 939. https://doi.org/10.14419/IJET.V7I2.10743
• Brandolini, F., Domingo-Ribas, G., Zerboni, A., & Turner, S.
(2021). A Google Earth Engine-enabled Python approach for the identification of
anthropogenic palaeo-landscape features. Open Research Europe, 1.
• Dhulikhel. (n.d.). Mapcarta. Retrieved March 14, 2024, from
https://mapcarta.com/Dhulikhel
• Li, M., Zang, S., Zhang, B., Li, S., & Wu, C. (2014). A Review of Remote Sensing
Image Classification Techniques: the Role of Spatio-contextual Information.
European Journal of Remote Sensing, 47(1), 389–411.
https://doi.org/10.5721/EUJRS20144723
• Meng, X., Han, Y., Zhang, H., Huang, C., Yang, W., Liu, C., Song, J., & Liu, Z.
(n.d.). PyRS: A Python package to process remotely sensed data for geomatics
education. https://doi.org/10.5194/isprs-archives-XLVIII-5-W1-2023-21-2023
• Paul, T., & Voicu, S. (2021). Image Classification Using Machine Learning
Algorithms in Google Earth Engine Environment. Informatica Economica, 25, 5–16.
https://doi.org/10.24818/issn14531305/25.3.2021.01
• Prasai, R., Schwertner, T. W., Mainali, K., Mathewson, H., Kafley, H., Thapa, S.,
Adhikari, D., Medley, P., & Drake, J. (2021). Application of Google earth engine
python API and NAIP imagery for land use and land cover classification: A case
study in Florida, USA. Ecological Informatics, 66, 101474.
https://doi.org/10.1016/j.ecoinf.2021.101474
• Shaharum, N. S. N., Shafri, H. Z. M., Ghani, W. A. W. A., Samsatli, S., Yusuf, B.,
Al-Habshi, M. M. A., & Prince, H. M. (2018). Image classification for mapping oil
palm distribution via support vector machine using scikit-learn module.
International Archives of the Photogrammetry, Remote Sensing and Spatial
Information Sciences - ISPRS Archives, 42(4/W9), 139–145.
https://doi.org/10.5194/ISPRS-ARCHIVES-XLII-4-W9-133-2018

7. APPENDICES
7.1. Appendix 1: Process of downloading satellite image via Copernicus access hub
Links for accessing the Copernicus ecosystem to download Sentinel imagery:
https://www.copernicus.eu/en/access-data/conventional-data-access-hubs
https://browser.dataspace.copernicus.eu/?zoom=5&lat=50.16282&lng=20.78613&themeId=DEFAULT-THEME&visualizationUrl=https%3A%2F%2Fsh.dataspace.copernicus.eu%2Fogc%2Fwms%2Fa91f72b5-f393-4320-bc0f-990129bd9e63&datasetId=S2_L2A_CDAS&demSource3D=%22MAPZEN%22&cloudCoverage=30&dateMode=SINGLE

7.2. Appendix 2: Clipping AOI (Dhulikhel municipality) from downloaded
satellite image (only B2, B3 and B4) in ArcMap

7.3. Appendix 3: Layer Stacking in ENVI
(Software like ArcMap performs band compositing, whereas ENVI performs layer stacking, in which
the spectral properties of each band are preserved. The output resolution of a layer-stacked
image in ENVI is the resolution of the first image used.)
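For readers working purely in Python, the idea of stacking bands without altering pixel values can be illustrated with NumPy. This is only an analogue under stated assumptions, not a description of what ENVI or ArcMap do internally: the small constant-valued arrays stand in for the B2, B3 and B4 rasters, and the sample location is hypothetical.

```python
import numpy as np

# Stand-ins for three single-band rasters of identical size (hypothetical values)
rows, cols = 4, 5
b2 = np.full((rows, cols), 100, dtype=np.uint16)
b3 = np.full((rows, cols), 200, dtype=np.uint16)
b4 = np.full((rows, cols), 300, dtype=np.uint16)

# Stack along a new last axis: shape (rows, cols, 3); values and dtype are unchanged
stacked = np.stack([b2, b3, b4], axis=-1)

# Flatten to (n_pixels, n_bands), the layout most classifiers expect
pixels = stacked.reshape(-1, 3)

# Read the band values at one training point (hypothetical row and column)
row, col = 2, 3
sample = stacked[row, col]

print(stacked.shape, stacked.dtype)  # (4, 5, 3) uint16
print(sample.tolist())               # [100, 200, 300]
```

Because no rescaling or resampling happens in this step, the values sampled at a training point are exactly the original band values.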

7.4. Appendix 4: Creating dbf file using ArcMap tool for training data sets

(Creating shape file to draw points in the desired location)

(Adding basemap in ArcMap to create training sample points)

(Value obtained from each point, i.e. pixel)

https://colab.research.google.com/drive/1LLhG_jZkdsOkuxd94gBiGtQGXYa5G3ji?usp=sharing
(Link to the code used for the completion of project.)

https://drive.google.com/drive/folders/1VPZD8nC_sBB9o9VoF5pfZ6AuoFRvUwuk?usp=sharing
(Link to the files used for classification of images.)
