Project - Report - Forest Fire Prediction - Group 119
Forest Fire Prediction: Harnessing Data for Early Detection and Prevention
Project Report Submitted in Partial Fulfilment of the Requirements for the Degree of
Bachelor of Technology
In
Computer Science
Submitted by:
Tanul Khare: (Roll No. 200102420)
Shubhangi Maurya: (Roll No. 200102409)
Sudhanshu Kumar: (Roll No. 210102524)
We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for disciplinary
action by the University and can also evoke penal action from the sources which have thus
not been properly cited or from whom proper permission has not been taken when needed.
The plagiarism check report is attached at the end of this document.
TABLE OF CONTENTS
Forest Fire Prediction: Harnessing Data for Early Detection and Prevention
DECLARATION
LIST OF ABBREVIATIONS
CHAPTER – 1
Forest Fire Prediction
1.1 INTRODUCTION
1.2 APPLICATION
1.3 TECHNIQUES
1.4 MACHINE LEARNING
1.5 REGRESSION
1.6 ALGORITHMS
CHAPTER – 2
PROJECT ANALYSIS
2.1 LITERATURE REVIEW
2.2 PROBLEM STATEMENT
CHAPTER – 3
METHODOLOGY
CHAPTER – 4
RESULTS AND DISCUSSION
CHAPTER – 5
CONCLUSION AND FUTURE WORK
REFERENCES
PLAGIARISM CHECK REPORT
LIST OF ABBREVIATIONS
RH Relative Humidity
WS Wind speed
FFMC Fine Fuel Moisture Code
DMC Duff Moisture Code
DC Drought Code
ISI Initial Spread Index
BUI Buildup Index
FWI Fire Weather Index
TEMP Temperature
CHAPTER – 1
FOREST FIRE PREDICTION
1.1 INTRODUCTION
A forest fire, marked by its rapid spread across wooded landscapes and consumption of
vegetation and combustible materials, may stem from natural activities like lightning strikes
or human activities such as campfires, discarded cigarettes, or deliberate acts of arson.
These fires pose grave risks to human settlements, wildlife habitats, ecosystems, and air
quality, potentially resulting in community displacement, property devastation, biodiversity
loss, and even loss of human lives.
Forecasting wildfires involves utilizing a range of data sources and analytical methods. This
includes considering landscape factors like vegetation cover and slope steepness,
alongside weather variables such as temperature, humidity, and wind speed. Machine learning
algorithms trained on this data can improve prediction accuracy as more observations become
available.
The primary objective is to integrate decision support systems and operational workflows
with predictive insights to enable proactive mitigation of fire risks. Early warning systems
and real-time monitoring play vital roles in alerting stakeholders to new threats.
1.2 APPLICATION
Forest fire prediction finds its application in various sectors such as:
1.3 TECHNIQUES
The practice of applying methods like statistical analysis, machine learning, and data mining
to big and complex databases in order to extract meaningful knowledge and insights is known
as data science. It involves gathering, organizing, processing, and analyzing data in order to
identify relationships, trends, and patterns that might influence choices and spur creativity in
a variety of sectors. Businesses can use data science to turn unorganized information into
insightful knowledge that helps them solve problems, predict future occurrences, and
optimize operations.
(Fig 1.1) Hierarchy showing the different fields of Data Science [1]
MODELS
Linear Regression
Logistic Regression
Elastic Net Regression
Decision Tree
SVM (Support Vector Machine) Algorithm
KNN (K-Nearest Neighbours) Algorithm
Random Forest Algorithm
1.4.1 CATEGORIES OF MACHINE LEARNING: [2]
SUPERVISED LEARNING:
Supervised learning is the subcategory of machine learning that focuses on learning a
classification, or regression model, that is, learning from labelled training data (i.e., inputs
that also contain the desired outputs or targets; basically, “examples” of what we want to
predict).
1. Classification: A supervised learning task in which the output is a defined label
(a discrete value). In binary classification, the model predicts one of two classes (0 or 1;
yes or no), whereas in multi-class classification, the model chooses among more than two
classes. Example: Gmail sorts mail into several classes such as social, promotions, updates,
and forums.
2. Regression: A supervised learning task in which the output is a continuous value. The
goal is to predict a value as close as possible to the actual output, and the model is
evaluated by calculating an error value. The smaller the error, the more accurate the
regression model.
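As a minimal sketch of how such an error value can be computed, the following Python snippet evaluates the mean squared error (MSE) and its square root (RMSE). The function names and the sample numbers are illustrative assumptions and are not taken from the project.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Made-up values for illustration only.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

Because the errors are squared before averaging, a single large miss raises these metrics more than several small ones.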
UNSUPERVISED LEARNING:
The process of training a machine with unlabelled data and allowing the algorithm to operate
on it unassisted is known as unsupervised learning. In this case, the machine's goal is to
group data using similarities, patterns, and differences without any prior training on
labelled data. In contrast with supervised learning, there is no teacher, so the machine does
not receive any kind of instruction. Therefore, the machine must determine for itself the
hidden structure in unlabelled data.
1. Clustering: Finding the natural groupings in the data—like classifying clients based
on their purchasing patterns—is known as a clustering problem.
2. Association: Identifying rules that describe significant portions of your data, like
"people who buy X also tend to buy Y," is known as an association rule learning
problem.
REINFORCEMENT LEARNING:
Reinforcement learning addresses the question of how a system that senses and acts in its
environment can learn to choose optimal actions to achieve its goals. This very generic
problem covers tasks such as learning to control a mobile robot, learning to optimize
operations in factories, and learning to play board games. Each time the system performs an
action in its environment, a trainer may provide a reward or penalty to indicate the
desirability of the resulting state. The task of the agent is to learn to choose sequences of
actions that produce the greatest cumulative reward.
1.5 REGRESSION
Regression is a valuable and widely used tool in the world of data science and machine
learning. It empowers us to explore and predict the connections between multiple factors. In
simpler terms, regression allows us to uncover a mathematical equation that links one factor
with one or more other factors.
1. Prediction: Let's say you have information about the price of houses in a
neighbourhood and want to know the price of a new house. Regression helps you
make a prediction based on the features of the new house, such as its size, number of
rooms, and location.
2. Understanding Relationships: Understanding how various factors affect each other
becomes simpler with the use of regression. You might be keen on learning, for
instance, how exam results are affected by study time. A significant relationship
between study time and scores can be identified via regression analysis.
3. Identifying Important Factors: In complex situations with many variables,
regression helps us identify which factors have a meaningful impact on the outcome. It
helps us separate the essential factors from the ones that don't matter much.
4. Decision Making: Organizations and businesses use regression to make informed
decisions. For instance, a company might use regression to predict customer demand
for a product, helping them plan their production and inventory efficiently.
To make predictions, a regression model must first estimate the relationship between the
independent variables, often referred to as predictors, and a dependent variable, that is, the
variable we wish to forecast. The magnitude and direction of each independent variable's
influence on the dependent variable are represented by the coefficients that the model
estimates for each variable. [3]
The model can be trained on historical data and then, by applying the learnt coefficients to
the values of the independent variables, be used to forecast the values of the dependent
variable for new or unseen data. How effectively the model explains the underlying
relationship between the variables and how representative the training data is of the
population to which the model is intended to generalize determine how accurate the
predictions are.
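The train-then-forecast workflow described above can be sketched for the simplest case of a single predictor. This is an illustrative example under stated assumptions: the helper names `fit_simple_linear` and `predict` and the data values are made up, and the project's own models use several predictors.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(slope, intercept, x):
    """Apply the learnt coefficients to a new value of the predictor."""
    return intercept + slope * x

# Made-up historical data: a predictor (e.g. temperature) and a target value.
temps = [20.0, 25.0, 30.0, 35.0]
target = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_simple_linear(temps, target)
forecast = predict(slope, intercept, 28.0)  # value for an unseen input
print(slope, intercept, forecast)
```

The sign of `slope` gives the direction of the predictor's influence and its size gives the magnitude, which is how the coefficients are read in the text above.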
1.6 ALGORITHMS
Linear regression is a statistical technique that fits a linear equation to the observed data
to represent the relationship between independent variables (predictors) and a
dependent variable (outcome). In statistics and machine learning, it is among the most
straightforward and popular regression approaches.
Inference: Linear regression can also be employed for inference, where the goal is to
understand the relationship between variables and interpret the coefficients.
Feature Selection: Lasso performs feature selection efficiently, identifying the most
significant predictors while discarding superfluous or irrelevant variables by
shrinking some coefficients to exactly zero.
Model Interpretation: Lasso regression can assist in enhancing the model's
interpretability by highlighting the most useful variables and the related coefficients.
Lasso regression runs into trouble when the number of predictors exceeds the number of
observations, because it can then select at most as many predictors as there are observations.
If two or more variables are highly collinear, lasso regression tends to select one
of them arbitrarily, which makes the selected model harder to interpret.
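How lasso sets some coefficients exactly to zero can be illustrated in a simplified setting. The sketch below assumes orthonormal predictors and the objective (1/2)·||y − Xβ||² + λ·||β||₁, in which case each lasso coefficient is the soft-thresholded least-squares coefficient. The function name, the coefficient values, and the choice λ = 1.0 are assumptions made for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

# Made-up least-squares coefficients for four predictors.
ols_coefs = [3.0, -0.4, 0.2, -2.5]
lam = 1.0  # penalty strength (assumed value)

lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]
selected = [i for i, b in enumerate(lasso_coefs) if b != 0.0]
print(lasso_coefs, selected)
```

Here the two weak predictors are removed entirely while the two strong ones are kept with reduced magnitude. General-purpose solvers handle correlated predictors iteratively, so this closed form applies only under the orthonormality assumption.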
Collinearity Reduction:
By decreasing the coefficients of the predictor variables, ridge regression
considerably minimizes the effect of multicollinearity (high correlation).
Parameter Estimation:
By reducing the variance of the coefficient estimates, ridge regression provides estimates of
the coefficients that are more reliable, especially if there are more predictors than
observations.
Difficulty in Interpretation:
Because the coefficients in ridge regression are decreased towards zero and might not
accurately represent the significance of the predictors, interpreting the results can
become more difficult.
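For reference, the shrinkage discussed above follows from the ridge objective. The notation is introduced here as an assumption, since the report does not fix one: y is the vector of observed targets, X the matrix of (centred) predictors, β the coefficient vector, and λ the penalty strength.

```latex
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^{\top}X + \lambda I)^{-1} X^{\top} y, \qquad \lambda > 0 .
```

Adding λI makes the matrix invertible even when predictors are highly correlated or outnumber the observations, which is why the estimates are more stable. Larger values of λ pull every coefficient closer to zero without setting any of them exactly to zero.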
1.6.15 Applications of Elastic Net Regression
Multicollinearity Handling:
By selecting and shrinking sets of correlated variables together, Elastic Net regression
effectively handles multicollinearity (high correlation) among predictor variables.
Computational Overhead:
Elastic Net regression requires addressing a more complex optimization task than
Lasso or Ridge regression, which can lead to greater computing cost, particularly for
large datasets.
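One common way to write the Elastic Net objective is shown below. The notation is an assumption introduced for illustration (y: targets, X: predictors, β: coefficients, λ₁ and λ₂: penalty strengths), and software libraries often parameterize the two penalties differently.

```latex
\hat{\beta}^{\text{enet}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \lambda_2 \lVert \beta \rVert_2^2,
  \qquad \lambda_1, \lambda_2 \ge 0 .
```

Setting λ₂ = 0 recovers lasso and setting λ₁ = 0 recovers ridge. Because of the absolute-value term there is in general no closed-form solution, so the coefficients are found iteratively, and two penalty strengths have to be tuned instead of one, which accounts for the additional computing cost noted above.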
CHAPTER – 2
PROJECT ANALYSIS
Authors: Virupaksha Gouda R, Anoop R, Joshi Sameerna, Arif Basha, Sahana Gali
Existing systems use technologies such as machine learning, artificial intelligence, and
wireless networks to collect weather data continuously around the clock, which gives a more
accurate picture of the state of the forest environment. Based on such systems, it is
possible to identify the days with the highest risk of forest fire so that forest guards can
pay special attention to prevention on those days. [5]
2.1.2 Forest Fires Detection Using Machine Learning Techniques, 2020
Authors: C. Amira. A. Elsonbaty, Ahmed M. Elshewey
This research presents the use of machine learning regression approaches for the prediction of
forest fire-prone zones. The data set utilized in this paper, which includes the climate and
physical characteristics of the Montesinho park in Portugal, is available in the UCI machine
learning repository. The research applies three regression algorithms, namely linear
regression, ridge regression, and lasso regression, to a data set of 517 entries with 13
features each. Compared with the ridge regression and lasso regression techniques, the linear
regression algorithm yields higher accuracy. [6]
2.1.4 Evaluation of Random Forest model for forest fire prediction based on climatology
over Borneo, 2019
Authors: E. Ayu Shabrina, Intan N. Wahyuni, Rifika Sadikin, Arninda L. Latifah
Forest fires are driven by human activities, ecosystem and climate processes, but in
Borneo only the climate variables can be quantified. The goal of the research is to determine
how well the random forest model predicts forest fires by utilizing climate variables and
satellite data of burned regions as input. It is anticipated that forest fire prediction would
lessen the impact of forest fires going forward. By means of a yearly and geographical
variability analysis, it was found that the random forest model, incorporating all selected
climate variables, effectively represents forest fire events across the Borneo region of
Indonesia. [8]
2.1.5 A Brief Review of Machine Learning Algorithms in Forest Fires Science, 2023
Authors: Ramez Alkhatib, Wahib Sahwan, Anas Alkhatieb and Brigitta Schütt
As forest fires become more frequent globally, early prediction is crucial. Artificial
intelligence, particularly machine learning, is vital for forecasting and assessing fire risk. This
article reviews machine learning methods used for forest fire prediction, aiming to identify
research gaps and recent advancements. Selecting the best model is challenging due to
algorithm variations, but tailoring methods to specific forest characteristics enhances
predictive accuracy. [9]
2.1.6 A Survey of Machine Learning Algorithms Based Forest Fires Prediction and
Detection Systems, 2020
Author: Faroudja Abid
Forest fires pose significant environmental threats, annually consuming millions of hectares
worldwide, leading to economic, ecological, and human losses. Predicting and detecting these
fires is crucial for mitigation. This paper offers an extensive examination of machine
learning-based algorithms used in forest fire prediction and detection systems, emphasizing
the rising incorporation of emerging technologies such as artificial intelligence for process
automation. It introduces the forest fire issue, reviews various prediction and detection
methods, and discusses studies evaluating factors influencing fire occurrence and risk. The
paper presents and discusses key findings and challenges from each study. [10]
2.1.7 Role of Machine Learning Algorithms in Forest Fire Management, 2021
Authors: Muhammad Arif, Khloud K Alghamdi, Salma A Sahel, Samar O Alosaimi,
Mashael E Alsahaft, Maram A Alharthi and Maryam Arif
Given the rising global concern over forest fires amid climate change, accurate prediction is
imperative. This paper aims to summarize recent advancements in forest fire prediction,
detection, spread rate estimation, and burnt area mapping. Additionally, it highlights the risks
posed by smoke emissions to public health and ecosystems. By leveraging ML algorithms,
this review explores opportunities to enhance forest fire management decision-making,
ultimately contributing to cost savings and environmental health improvement. [11]
2.1.8 Forest Fire Prediction Using Machine Learning Techniques, 2021
Authors: T Preeti, Suvarna Kanakaraddi, Aishwarya Beelagi, Sumalata Malagi, Aishwarya
Sudi
Forest fire prediction is necessary to control fires and limit their environmental impact. Detection
algorithms, often leveraging satellite imagery, are pivotal. This study proposes a system
utilizing meteorological parameters for prediction, employing Random Forest Regression
with Hyperparameter tuning for accuracy enhancement. Comparative analysis encompasses
Decision Trees, Random Forests, Support Vector Machines, and Artificial Neural Networks.
Hyperparameter tuning produces promising outcomes, with MAE at 0.03, MSE at 0.004, and
RMSE at 0.07. [12]
2.1.9 Forest Fires Detection Using Machine Learning Techniques, 2020
Authors: Ahmed M. Elshewey, Amira. A. Elsonbaty
Currently, forest fires represent a significant global issue, prompting the exploration of
machine learning regression methods to forecast regions prone to fire outbreaks. This study
utilizes a dataset obtained from the UCI machine learning repository, containing data on
climate and physical factors from Montesinho park in Portugal. Three regression algorithms
(linear regression, ridge regression, and lasso regression) are applied to a dataset
comprising 517 entries with 13 features per entry. The dataset is evaluated in two versions:
one including all features and another with 70% of the features. Training involves 70% of the
dataset, with the remaining 30% reserved for testing. Findings indicate that linear regression
outperforms ridge regression and lasso regression algorithms in terms of accuracy. [13]
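A 70/30 division of 517 entries, as described in this study, can be sketched as follows. This is only an illustration of the idea: the helper name `split_rows`, the fixed seed, and the integer stand-ins for dataset rows are assumptions, and the cited paper does not state how its split was implemented.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(517))  # stand-ins for the 517 dataset entries
train, test = split_rows(rows)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that merely follows the order in which the records were stored, and holding the test rows out of training gives an estimate of performance on unseen data.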
2.1.10 Predicting wildfires in Algerian forests using machine learning models, 2023
Authors: Abdelhamid Zaidi
Algeria faces significant wildfire challenges with lasting impacts. Early detection is crucial,
but limited datasets hinder prediction methods. Using recent data from Bejaia and Sidi Bel-
Abbes in 2012, PCA reduced variables to six while retaining 96.65% variance. ANN
outperformed other classifiers in accuracy, precision, and recall, achieving 0.967 ± 0.026
accuracy and 0.971 ± 0.023 F1-score. Feature importance analysis highlighted RH, DC, and
ISI as significant predictors in the ANN model. [14]
We focus on predicting the Fire Weather Index (FWI) for Algeria's Béjaïa and Sidi Bel-Abbes
regions. FWI, crucial for assessing fire risk, relies on meteorological factors. Our goal
is to build a regression model that captures how weather conditions (temperature,
humidity, wind speed, rainfall) and FWI components (FFMC, DMC, DC, ISI, BUI) influence
FWI values. This model will aid in proactive fire hazard assessment and prevention strategies
for these areas.
We'll train regression models with historical data from June to September 2012, comprising
meteorological information and FWI values. These models will then forecast FWI for
upcoming days, considering anticipated weather conditions.
Our ultimate goal is to develop precise models that will enable us to predict the Fire Weather
Index, which will support fire management and prevention planning in these regions of
Algeria.
CHAPTER – 3
METHODOLOGY
Data collection and Preprocessing:
Gather a diverse and representative dataset that covers a wide range of relative humidity,
temperature, fire index, and wind speed conditions.
Preprocess the data to ensure consistent data quality, perform feature selection, and remove
null values.
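Two of these preprocessing steps, removing records with null values and putting a feature on a standard scale, can be sketched as follows. The records, the helper names, and the use of the abbreviations TEMP, RH and WS as field names are assumptions made for illustration and do not reproduce the project's dataset or code.

```python
from statistics import mean, pstdev

def drop_null_rows(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up records for illustration only.
records = [
    {"TEMP": 29.0, "RH": 57.0, "WS": 18.0},
    {"TEMP": 31.0, "RH": None, "WS": 13.0},  # dropped: RH is missing
    {"TEMP": 33.0, "RH": 45.0, "WS": 14.0},
    {"TEMP": 37.0, "RH": 39.0, "WS": 16.0},
]

clean = drop_null_rows(records)
temp_scaled = standardize([row["TEMP"] for row in clean])
print(len(clean), temp_scaled)
```

Scaling matters for Lasso, Ridge and Elastic Net because their penalties act on coefficient size, so features measured in large units would otherwise be penalized differently from features measured in small units.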
Exploratory Data Analysis:
Employ state-of-the-art techniques in machine learning to extract discriminative features
from dataset.
Explore machine learning architectures and study regression and its models, such as
Linear, Lasso, Ridge, and Elastic Net, for learning hierarchical representations for accurate
prediction of forest fires.
CHAPTER – 4
Case 2: Box Plots to Understand the Effect of the Standard Scaler
Case 4: Lasso Regression
Case 6: Elastic Net Regression
Case 8: Fire Analysis of Region 2
CHAPTER – 5
CONCLUSION AND FUTURE WORK
5.1 Conclusion
In both instances, the R2 Score reflects a commendable level of accuracy, indicating that
both models adeptly explain the variance in the data and yield precise predictions.
Upon examining the Mean Absolute Error (MAE), Linear Regression exhibits a marginally
lower value (0.482) compared to Ridge Regression (0.498).
Despite the slight advantage of Linear Regression in terms of prediction accuracy, as
shown by the R2 Score and MAE, we opt for the Ridge Regression model due to
its efficacy in addressing overfitting concerns.
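For reference, the two metrics compared above can be computed as in the following sketch. The sample values are made up and are not the project's results, and the function names are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(f"MAE = {mae:.3f}, R2 = {r2:.3f}")
```

For these sample values the MAE is 0.25 and the R2 score is 0.9. A lower MAE and an R2 closer to 1 both indicate a better fit, which is the sense in which the two models are compared in this chapter.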
5.2 Future Work
While efforts have been made in forest fire prediction and management, there remains
considerable room for advancement. Future endeavours may centre on enhancing predictive
accuracy, embracing emerging technologies, and nurturing interdisciplinary collaborations
to tackle the multifaceted challenges presented by forest fires.
Integration of Real-Time Sensor Data:
By utilizing real-time sensor data from IoT devices, remote sensors, and weather
stations, environmental conditions can be continuously tracked. Early detection of fire
outbreaks may also be made possible, enabling timely mitigation and control
measures.
Designing a website: Creating a comprehensive website for the project can give users easy
access to reliable data and timely information about forest fire
outbreaks and their prevention.
REFERENCES
[1] “Data Science vs ML,AI,DL. Differences and Why It Matters | LinkedIn.” Accessed: May 05,
2024. [Online]. Available:
https://www.linkedin.com/pulse/data-science-vs-mlaidl-differences-why-matters-sai-nithya-akuthota/
[2] S. Raschka, “STAT 451: Introduction to Machine Learning Lecture Notes,” 2020, Accessed: May 05,
2024. [Online]. Available: http://stat.wisc.edu/~sraschka/teaching/stat451-fs2020/
[3] “A Refresher on Regression Analysis.” Accessed: May 05, 2024. [Online]. Available:
https://hbr.org/2015/11/a-refresher-on-regression-analysis
[4] “Introduction To lasso Regression, Effects And Its Limitations - Pianalytix - Build Real-World Tech
Projects.” Accessed: May 05, 2024. [Online]. Available: https://pianalytix.com/introduction-to-lasso-
regression-effects-and-limitations-of-lasso-regression/
[5] V. G. R, A. R, J. Sameerna, A. Basha, and S. Gali, “Forest Fire Prediction Using Machine Learning,”
Int J Res Appl Sci Eng Technol, vol. 11, no. 5, pp. 792–797, May 2023, doi:
10.22214/IJRASET.2023.51496.
[6] “(PDF) Forest Fires Detection Using Machine Learning Techniques.” Accessed: May 05, 2024.
[Online]. Available:
https://www.researchgate.net/publication/344462171_Forest_Fires_Detection_Using_Machine_Learn
ing_Techniques
[7] “(PDF) ‘Forest Fire Prediction’ Submitted by Saurab Bhattarai.” Accessed: May 05, 2024. [Online].
Available:
https://www.researchgate.net/publication/371640298_Forest_Fire_Prediction_Submitted_by_Saurab_
Bhattarai
[8] A. L. Latifah, A. Shabrina, I. N. Wahyuni, and R. Sadikin, “Evaluation of Random Forest model for
forest fire prediction based on climatology over Borneo,” 2019 International Conference on
Computer, Control, Informatics and its Applications: Emerging Trends in Big Data and Artificial
Intelligence, IC3INA 2019, pp. 4–8, Oct. 2019, doi: 10.1109/IC3INA48034.2019.8949588.
[9] R. Alkhatib, W. Sahwan, A. Alkhatieb, and B. Schütt, “A Brief Review of Machine Learning
Algorithms in Forest Fires Science,” Applied Sciences, vol. 13, no. 14, p. 8275, Jul. 2023,
doi: 10.3390/APP13148275.
[10] F. Abid, “A Survey of Machine Learning Algorithms Based Forest Fires Prediction and Detection
Systems,” Fire Technol, vol. 57, no. 2, pp. 559–590, Mar. 2021, doi:
10.1007/S10694-020-01056-Z.
[11] A. Muhammad et al., “Role of Machine Learning Algorithms in Forest Fire Management: A Literature
Review,” Journal of Robotics and Automation, vol. 5, no. 1, Feb. 2021, doi: 10.36959/673/372.
[12] “Forest Fire Prediction Using Machine Learning Techniques | Request PDF.” Accessed: May 05,
2024. [Online]. Available:
https://www.researchgate.net/publication/371651902_Forest_Fire_Prediction_Using_Machine_Learni
ng_Techniques
[13] “(PDF) Forest Fires Detection Using Machine Learning Techniques.” Accessed: May 05, 2024.
[Online]. Available:
https://www.researchgate.net/publication/344462171_Forest_Fires_Detection_Using_Machine_Learn
ing_Techniques
[14] A. Zaidi, “Predicting wildfires in Algerian forests using machine learning models,” Heliyon, vol. 9,
no. 7, p. e18064, Jul. 2023, doi: 10.1016/J.HELIYON.2023.E18064.