Forest Fire Prediction: Harnessing Data for Early Detection and Prevention


Project Report Submitted in Partial Fulfilment of the Requirements for the Degree of

Bachelor of Technology

Computer Science

Submitted by:
Tanul Khare: (Roll No. 200102420)
Shubhangi Maurya: (Roll No. 200102409)
Sudhanshu Kumar: (Roll No. 210102524)

Under the Supervision of:

Mrs. Bhavana Srivastava
Assistant Professor

DIT University, Dehradun

January – May 2024

We declare that this written submission represents our ideas in our own words and that, where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented, fabricated, or falsified any idea, data, fact, or source
in our submission. We understand that any violation of the above will be cause for disciplinary
action by the University and can also evoke penal action from the sources which have thus
not been properly cited or from whom proper permission has not been taken when needed.
The plagiarism check report is attached at the end of this document.

Name of the Student: Tanul Khare Signature and Date: 07/05/24

Name of the Student: Shubhangi Maurya Signature and Date: 07/05/24

Name of the Student: Sudhanshu Kumar Signature and Date: 07/05/24


Forest Fire Prediction: Harnessing Data for Early Detection and Prevention
LIST OF ABBREVIATIONS
CHAPTER – 1
Forest Fire Prediction
1.1 INTRODUCTION
1.2 APPLICATION
1.3 TECHNIQUES
1.4 MACHINE LEARNING
1.5 REGRESSION
1.6 ALGORITHMS
CHAPTER – 2
PROJECT ANALYSIS
2.1 LITERATURE REVIEW
2.2 PROBLEM STATEMENT
CHAPTER – 3
CHAPTER – 4
RESULTS AND DISCUSSION
CHAPTER – 5
CONCLUSION AND FUTURE WORK
PLAGIARISM CHECK REPORT


RH Relative Humidity
WS Wind speed
FFMC Fine Fuel Moisture Code
DMC Duff Moisture Code
DC Drought Code
ISI Initial Spread Index
BUI Buildup Index
FWI Fire Weather Index
TEMP Temperature


A forest fire, marked by its rapid spread across wooded landscapes and its consumption of
vegetation and combustible materials, may stem from natural causes such as lightning strikes
or from human activities such as campfires, discarded cigarettes, or deliberate acts of arson.
These fires pose grave risks to human settlements, wildlife habitats, ecosystems, and air
quality, potentially resulting in community displacement, property devastation, biodiversity
loss, and even loss of human lives.
Forecasting wildfires involves a range of data sources and analytical methods. These
include topographic and land-cover factors such as slope steepness and vegetation cover,
alongside weather variables such as temperature, humidity, and wind speed. As machine
learning algorithms are trained on more of this data, prediction accuracy tends to improve.
The primary objective is to integrate decision support systems and operational workflows
with predictive insights to enable proactive mitigation of fire risks. Early warning systems
and real-time monitoring play vital roles in alerting stakeholders to new threats.

Forest fire prediction finds application in various sectors, such as:

1. Wildfire Management: Predictive models help firefighting agencies anticipate fire
behaviour, allocate resources effectively, and implement pre-emptive measures to
mitigate fire risks.
2. Environmental Conservation: Early warning systems enable conservationists to
protect vulnerable ecosystems, prevent habitat destruction, and facilitate post-fire
rehabilitation efforts.
3. Public Safety: Predictive analytics provide communities with timely alerts and
evacuation recommendations, reducing the risk of casualties and property damage
during wildfire events.
4. Land Use Planning: Predictive insights inform land management decisions, zoning
regulations, and urban development strategies to minimize exposure to wildfire
hazards and enhance community resilience.
5. Research and Education: Forest fire prediction fosters scientific inquiry, innovation,
and interdisciplinary collaboration, advancing our understanding of fire ecology,
climate change impacts, and human-wildlife interactions.



Data science is the practice of applying methods such as statistical analysis, machine
learning, and data mining to large and complex datasets in order to extract meaningful
knowledge and insights. It involves gathering, organizing, processing, and analyzing data to
identify relationships, trends, and patterns that can inform decisions and drive innovation in
a variety of sectors. Businesses can use data science to turn unorganized information into
actionable knowledge that helps them solve problems, predict future events, and
optimize operations.

(Fig 1.1) Hierarchy showing the different fields of Data Science [1]


• Linear Regression
• Logistic Regression
• Elastic Net Regression
• Decision Tree
• SVM (Support Vector Machine) Algorithm
• KNN (K-Nearest Neighbours) Algorithm
• Random Forest Algorithm


Machine learning is a field of artificial intelligence in which systems learn from data,
identify patterns, and make decisions without being explicitly programmed. It involves
algorithms whose performance improves with experience. Supervised learning, unsupervised
learning, and reinforcement learning are the main types, each with its own
techniques and applications. [2]


Supervised learning is the subcategory of machine learning that focuses on learning a
classification or regression model from labelled training data (i.e., inputs
that also contain the desired outputs or targets; basically, “examples” of what we want to
predict).

Types of Supervised Learning:

1. Classification: A supervised learning task in which the output is one of a set of defined
labels (a discrete value). In binary classification the model predicts one of two classes (0 or 1,
yes or no), whereas in multi-class classification it chooses among more than two classes.
Example: Gmail classifies mail into several classes such as social, promotions, updates, and forums.

2. Regression: A supervised learning task in which the output is a continuous value. The
goal is to predict a value as close as possible to the actual output, and the model is then
evaluated by calculating the error. The smaller the error, the more accurate the
regression model.


Unsupervised learning is the process of training a machine with unlabelled data and allowing
the algorithm to operate on it without guidance. Here the machine's goal is to
group data by similarities, patterns, and differences without any prior training on labelled
examples. In contrast with supervised learning, there is no teacher, so the machine receives
no instruction of any kind. The machine must therefore discover the hidden structure in
unlabelled data on its own.

Types of Unsupervised Learning:

1. Clustering: Finding the natural groupings in the data—like classifying clients based
on their purchasing patterns—is known as a clustering problem.

2. Association: Identifying rules that describe significant portions of your data, like
"people who buy X also tend to buy Y," is known as an association rule learning
problem.


Reinforcement learning addresses the question of how a system that senses and acts in its
environment can learn to choose optimal actions to achieve its goals. This very generic
problem covers tasks such as learning to control a mobile robot, learning to optimize
operations in factories, and learning to play board games. Each time the system performs an
action in its environment, a trainer may provide a reward or penalty to indicate the
desirability of the resulting state. The task of the agent is to learn to choose sequences of
actions that produce the greatest cumulative reward.

Regression is a valuable and widely used tool in data science and machine
learning. It lets us explore and predict the relationships between multiple factors. In
simpler terms, regression allows us to uncover a mathematical equation that links one factor
(the dependent variable) with one or more other factors (the independent variables).


1. Prediction: Let's say you have information about the price of houses in a
neighbourhood and want to know the price of a new house. Regression helps you
make a prediction based on the features of the new house, such as its size, number of
rooms, and location.
2. Understanding Relationships: Understanding how various factors affect each other
becomes simpler with the use of regression. You might be keen on learning, for
instance, how exam results are affected by study time. A significant relationship
between study time and scores can be identified via regression analysis.
3. Identifying Important Factors: In complex situations with many variables,
regression helps us identify which factors have a significant impact on the outcome. It
helps us separate the essential factors from the ones that don't matter much.
4. Decision Making: Organizations and businesses use regression to make informed
decisions. For instance, a company might use regression to predict customer demand
for a product, helping them plan their production and inventory efficiently.


Regression models work by estimating the relationship between one or more independent
variables, also referred to as predictors, and a dependent variable, the
variable we wish to forecast. The magnitude and direction of each independent variable's
influence on the dependent variable are represented by the coefficients that the model
estimates for each variable. [3]

The model can be trained on historical data and then, by applying the learnt coefficients to
the values of the independent variables, be used to forecast the values of the dependent
variable for new or unseen data. How effectively the model explains the underlying
relationship between the variables and how representative the training data is of the
population to which the model is intended to generalize determine how accurate the
predictions are.
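
The train-then-predict workflow described above can be sketched with a single predictor. This is a minimal illustration and not the report's actual implementation: the function names and the temperature and risk values are hypothetical, chosen only to show coefficients being learnt from historical data and then applied to an unseen input.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Estimate intercept b0 and slope b1 by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Apply the learnt coefficients to a new value of the predictor."""
    return b0 + b1 * x


# Hypothetical historical data: temperature (deg C) against a fire-risk score.
train_temp = [22.0, 25.0, 28.0, 31.0, 34.0]
train_risk = [1.0, 2.5, 4.0, 5.5, 7.0]

b0, b1 = fit_simple_linear(train_temp, train_risk)

# Forecast the dependent variable for an unseen temperature.
new_prediction = predict(b0, b1, 30.0)
print(f"intercept={b0:.2f}, slope={b1:.2f}, prediction={new_prediction:.2f}")
```

With several predictors the same idea applies, but the coefficients are usually estimated with a linear algebra routine or a library rather than by hand.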



Linear regression is a statistical technique that fits a linear equation to the observed data
to represent the relationship between independent variables (predictors) and a
dependent variable (outcome). It is among the most straightforward and popular
regression approaches in statistics and machine learning.

1.6.2 What Makes a Linear Regression?

• Dependent Variable (Y):
Also known as the target variable, the dependent variable is the variable being
predicted or explained by the model. It is denoted as $Y$ and is typically continuous or
quantitative in nature.
• Independent Variables (X):
The variables that are used to forecast or explain the variation in the dependent
variable are referred to as independent variables, predictors, or features. They can be
continuous, categorical, or a combination of the two, and are denoted $X_1, X_2, \dots, X_n$.
• Linear Equation:
The relationship between the independent variables and the dependent variable is
modelled using a linear equation of the form:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$
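
As a brief worked formula, the coefficients of the linear equation are commonly estimated by ordinary least squares, that is, by minimizing the sum of squared residuals over the observed data. The observation index $i$, the number of observations $m$, and the symbols $y_i$ and $x_{ij}$ are notation introduced here for illustration and do not appear in the report.

```latex
\hat{\beta} = \arg\min_{\beta_0, \dots, \beta_n}
  \sum_{i=1}^{m} \Big( y_i - \beta_0 - \sum_{j=1}^{n} \beta_j x_{ij} \Big)^2
```

Here $y_i$ is the observed value of $Y$ for observation $i$ and $x_{ij}$ is the value of predictor $X_j$ for that observation.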

1.6.3 Applications of Linear Regression

• Prediction: Linear regression is commonly used for making predictions based on
the relationship between independent and dependent variables. For example,
predicting house prices based on features such as square footage, number of
bedrooms, and location.

• Inference: Linear regression can also be employed for inference, where the goal is to
understand the relationship between variables and interpret the coefficients.

• Control and Optimization: Linear regression can be used to control or optimize
processes by identifying the major factors that predict an outcome.

1.6.4 Limitations of Linear Regression

• Assumption of Linearity: The linear relationship between variables that a regression
model assumes may not hold for the complex, nonlinear relationships found in
real-life scenarios.
• Sensitivity to Outliers: Outliers can skew results by disproportionately influencing
the model parameters and predictions in linear regression.
• Limited Flexibility: Linear regression is less flexible than more complex models and
has a harder time accurately capturing nonlinear relationships and interactions
between variables.


Lasso (Least Absolute Shrinkage and Selection Operator) regression is a regularization
technique used in linear regression to prevent overfitting and select the most important
features by imposing a penalty on the absolute size of the coefficients. It adds a
regularization term to the ordinary least squares (OLS) objective function, encouraging
sparse solutions where some coefficients are shrunk to zero, effectively performing feature
selection. [4]

1.6.6 What Makes a Lasso Regression?

• Dependent Variable (Y):
Similar to linear regression, Lasso involves predicting a dependent variable $Y$
based on various independent variables $X_1, X_2, \dots, X_n$.
• Independent Variables (X):
These are the predictor variables used to explain the changes in the dependent
variable.
• Regularization Parameter (λ):
The regularization parameter $\lambda$ controls the strength of the penalty applied to the
coefficients.
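
To illustrate how the Lasso penalty can shrink a coefficient to exactly zero, the sketch below solves the one-predictor case in closed form using the soft-thresholding operator. It assumes centred data with no intercept and the objective (1/2m) * sum((y - b*x)^2) + lam * |b|. The function names and the toy data are hypothetical and are not taken from the report.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam, returning exactly 0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_1d(xs, ys, lam):
    """Closed-form Lasso coefficient for one centred predictor, no intercept.

    Minimises (1 / (2 * m)) * sum((y - b * x) ** 2) + lam * abs(b).
    """
    m = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / m  # correlation-like term
    z = sum(x * x for x in xs) / m                # scale of the predictor
    return soft_threshold(rho, lam) / z


# Hypothetical centred data following y = 0.5 * x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -0.5, 0.0, 0.5, 1.0]

for lam in (0.0, 0.4, 1.5):
    print(f"lambda={lam}: coefficient={lasso_1d(xs, ys, lam):.3f}")
```

With no penalty the least-squares slope is recovered; as the penalty grows the coefficient shrinks and eventually becomes exactly zero, which is the feature-selection behaviour described in this section.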

1.6.7 Applications of Lasso Regression

• Feature Selection: Lasso performs feature selection efficiently, identifying the most
significant predictors while discarding superfluous or irrelevant variables by
shrinking some coefficients to zero.

• Model Interpretation: Lasso regression can enhance the model's
interpretability by highlighting the most useful variables and their coefficients.

• Prediction with Regularization: Lasso regression reduces overfitting, which enhances
the model's generalization ability, especially when working with noisy or
multicollinear data.

1.6.8 Limitations of Lasso Regression

• Lasso regression runs into trouble when the number of predictors exceeds the number of
observations.

• If two or more variables are highly collinear, lasso regression tends to select one
of them essentially arbitrarily, which makes the resulting model harder to interpret.


Ridge regression is a regularization technique used in linear regression to prevent the model
from fitting too closely to the training data and becoming too sensitive to correlations among
the variables. It does this by adding a penalty to the model that encourages smaller
coefficients, effectively making them shrink towards zero. This helps to prevent the model
from overemphasizing certain variables and gives more stable and reliable predictions. Unlike
some other methods such as Lasso, Ridge regression doesn't force coefficients to become
exactly zero, but rather nudges them closer to zero.
1.6.10 What Makes a Ridge Regression?

• Dependent Variable (Y):
Ridge regression predicts a dependent variable ($Y$) based on one or more independent
variables ($X_1, X_2, \dots, X_n$). This process is similar to that of
linear regression.

• Independent Variables (X):
The variables that are used to account for changes in the dependent variable are
known as independent variables (X).

• Regularization Parameter (λ):
The regularization parameter ($\lambda$) regulates the extent to which the coefficients are
shrunk.
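
The shrinking effect of the Ridge penalty can be seen in the one-predictor case, where the coefficient has a simple closed form. The sketch below assumes centred data with no intercept and the objective sum((y - b*x)^2) + lam * b^2. The function name and the toy data are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form Ridge coefficient for one centred predictor, no intercept.

    Minimises sum((y - b * x) ** 2) + lam * b ** 2.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical centred data following y = 0.5 * x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -0.5, 0.0, 0.5, 1.0]

# A larger penalty gives a smaller coefficient, but never exactly zero.
coefficients = {lam: ridge_1d(xs, ys, lam) for lam in (0.0, 10.0, 90.0)}
for lam, coef in coefficients.items():
    print(f"lambda={lam}: coefficient={coef:.3f}")
```

Because the penalty only enlarges the denominator, the coefficient approaches zero as the penalty grows without reaching it, in contrast to Lasso.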

1.6.11 Applications of Ridge Regression

• Collinearity Reduction:
By shrinking the coefficients of the predictor variables, ridge regression
considerably reduces the effect of multicollinearity (high correlation among predictors).

• Prediction with Regularization:
Ridge regression reduces overfitting, thus improving the model's generalization
performance, particularly on high-dimensional datasets or with highly
correlated predictors.

• Parameter Estimation:
By reducing the variance of the coefficients, ridge regression provides more reliable
coefficient estimates, especially when there are more variables than observations.

1.6.12 Limitations of Ridge Regression

• Biased Coefficient Estimates:
Ridge regression shrinks all coefficients towards zero, including those of the
significant predictors. This can introduce bias into the coefficient
estimates, especially if the true coefficients are not small.

• Difficulty in Interpretation:
Because the coefficients in ridge regression are shrunk towards zero and might not
accurately represent the significance of the predictors, interpreting the results can
become more difficult.


Elastic Net regression combines Lasso and Ridge penalties in linear regression to
manage overfitting and multicollinearity. It balances feature selection and coefficient
stabilization by adjusting regularization parameters.

1.6.14 What Makes an Elastic net Regression?

Dependent Variable (Y): The outcome we wish to predict is the dependent variable, just
like in other regression methods.
Independent Variables (X): As in linear regression, these are the predictor variables that are
used to explain the variation in the dependent variable.
Regularization Parameters (α and λ): Elastic Net regression introduces two
hyperparameters, α and λ, to control the regularization.
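
One common way to write the Elastic Net objective is shown below. The report does not define the two hyperparameters, so the roles given here are an assumption based on a widely used convention: $\lambda$ sets the overall penalty strength and $\alpha \in [0, 1]$ sets the mix between the Lasso and Ridge penalties. The symbols $m$, $y_i$, and $x_{ij}$ are notation introduced for illustration.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2m} \sum_{i=1}^{m} \Big( y_i - \beta_0 - \sum_{j=1}^{n} \beta_j x_{ij} \Big)^2
  + \lambda \Big[ \alpha \sum_{j=1}^{n} |\beta_j|
  + \frac{1 - \alpha}{2} \sum_{j=1}^{n} \beta_j^2 \Big]
```

Under this convention $\alpha = 1$ reduces to the Lasso penalty and $\alpha = 0$ to the Ridge penalty. Software libraries name and scale these parameters differently, so the library documentation should be checked before mapping α and λ onto its arguments.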

1.6.15 Applications of Elastic Net Regression

• Feature Selection and Model Interpretability:
Elastic Net regression enables simultaneous feature selection and regularization by
combining the advantages of Lasso and Ridge regression. It is especially helpful with
high-dimensional datasets, where feature selection is crucial to the
interpretability of the model.

• Multicollinearity Handling:
By selecting and shrinking sets of correlated variables together, Elastic Net regression
effectively handles multicollinearity (high correlation) among predictor variables.

1.6.16 Limitations of Elastic Net Regression

• Complexity in Hyperparameter Tuning:
Elastic Net regression adds two hyperparameters (α and λ), which need to be
carefully tuned for best results.
Choosing suitable values for α and λ can be difficult and may require extensive
cross-validation.

• Computational Overhead:
Elastic Net regression requires solving a more complex optimization task than
Lasso or Ridge regression, which can lead to greater computing cost, particularly for
large datasets.




2.1.1 Forest Fire Prediction Using Machine Learning

Authors: Virupaksha Gouda R, Anoop R, Joshi Sameerna, Arif Basha, Sahana Gali

Existing systems use technologies such as machine learning, artificial intelligence, and
wireless networks to collect weather data continuously over 24 hours, which gives a better
chance of accurately reflecting the state of the forest environment.
Based on such systems, it is possible to decide which days carry the highest risk of
forest fires, so that forest guards can pay special attention to fire prevention on those
days. [5]
2.1.2 Forest Fire Detection Using Machine Learning Technique, 2020
Authors: C. Amira. A. Elsonbaty, Ahmed M. Elshewey
This research presents the use of machine learning regression approaches for the prediction of
forest fire-prone zones. The data set used in the paper, which includes the climate and
physical characteristics of the Montesinho park in Portugal, is available in the UCI machine
learning repository. The research applies linear regression, ridge regression, and lasso
regression to this data set of 517 entries with 13 features each. Linear regression yields
higher accuracy than the ridge regression and lasso regression techniques. [6]

2.1.3 Forest Fire Prediction

Author: Saurab Bhattarai
A variety of machine learning (ML) algorithms, including decision trees, random forests,
support vector machines (SVM), artificial neural networks (ANN), and ensemble approaches,
have been used to predict forest fires. A comparative study conducted by Di Tommaso et al.
(2018) compared the performance of various algorithms and found that random forests and
SVM achieved high accuracy in predicting forest fire occurrence. [7]

2.1.4 Evaluation of Random Forest model for forest fire prediction based on climatology
over Borneo, 2019
Authors: E. Ayu Shabrina, Intan N. Wahyuni, Rifika Sadikin, Arninda L. Latifah
Forest fires are driven by human activities and by ecosystem and climate processes, but in
Borneo only the climate variables can be quantified. The goal of the research is to determine
how well the random forest model predicts forest fires by utilizing climate variables and
satellite data of burned regions as input. It is anticipated that forest fire prediction would
lessen the impact of forest fires going forward. By means of a yearly and geographical
variability analysis, it was found that the random forest model, incorporating all selected
climate variables, effectively represents forest fire events across the Borneo region of
Indonesia. [8]
2.1.5 A Brief Review of Machine Learning Algorithms in Forest Fires Science, 2023
Authors: Ramez Alkhatib, Wahib Sahwan, Anas Alkhatieb and Brigitta Schütt
As forest fires become more frequent globally, early prediction is crucial. Artificial
intelligence, particularly machine learning, is vital for forecasting and assessing fire risk. This
article reviews machine learning methods used for forest fire prediction, aiming to identify
research gaps and recent advancements. Selecting the best model is challenging due to
algorithm variations, but tailoring methods to specific forest characteristics enhances
predictive accuracy. [9]
2.1.6 A Survey of Machine Learning Algorithms Based Forest Fires Prediction and
Detection Systems, 2020
Author: Faroudja Abid
Forest fires pose significant environmental threats, annually consuming millions of hectares
worldwide, leading to economic, ecological, and human losses. Predicting and detecting these
fires is crucial for mitigation. This paper offers an extensive examination of machine
learning-based algorithms used in forest fire prediction and detection systems, emphasizing
the rising incorporation of emerging technologies such as artificial intelligence for process
automation. It introduces the forest fire issue, reviews various prediction and detection
methods, and discusses studies evaluating factors influencing fire occurrence and risk. The
paper presents and discusses key findings and challenges from each study. [10]
2.1.7 Role of Machine Learning Algorithms in Forest Fire Management, 2021
Authors: Muhammad Arif, Khloud K Alghamdi, Salma A Sahel, Samar O Alosaimi,
Mashael E Alsahaft, Maram A Alharthi and Maryam Arif
Given the rising global concern over forest fires amid climate change, accurate prediction is
imperative. This paper aims to summarize recent advancements in forest fire prediction,
detection, spread rate estimation, and burnt area mapping. Additionally, it highlights the risks
posed by smoke emissions to public health and ecosystems. By leveraging ML algorithms,
this review explores opportunities to enhance forest fire management decision-making,
ultimately contributing to cost savings and environmental health improvement. [11]
2.1.8 Forest Fire Prediction Using Machine Learning Techniques, 2021
Authors: T Preeti, Suvarna Kanakaraddi, Aishwarya Beelagi, Sumalata Malagi, Aishwarya
Forest fire prediction is necessary to control fires and limit their environmental impact. Detection
algorithms, often leveraging satellite imagery, are pivotal. This study proposes a system
utilizing meteorological parameters for prediction, employing Random Forest Regression
with Hyperparameter tuning for accuracy enhancement. Comparative analysis encompasses
Decision Trees, Random Forests, Support Vector Machines, and Artificial Neural Networks.
Hyperparameter tuning produces promising outcomes, with MAE at 0.03, MSE at 0.004, and
RMSE at 0.07. [12]
2.1.9 Forest Fires Detection Using Machine Learning Techniques,2020
Authors: Ahmed M. Elshewey, Amira A. Elsonbaty
Currently, forest fires represent a significant global issue, prompting the exploration of
machine learning regression methods to forecast regions prone to fire outbreaks. This study
utilizes a dataset obtained from the UCI machine learning repository, containing data on
climate and physical factors from Montesinho park in Portugal. Three regression algorithms
(linear regression, ridge regression, and lasso regression) are applied to a dataset
comprising 517 entries with 13 features per entry. The dataset is evaluated in two versions:
one including all features and another with 70% of the features. Training involves 70% of the
dataset, with the remaining 30% reserved for testing. Findings indicate that linear regression
outperforms ridge regression and lasso regression algorithms in terms of accuracy. [13]
2.1.10 Predicting wildfires in Algerian forests using machine learning models, 2023
Author: Abdelhamid Zaidi
Algeria faces significant wildfire challenges with lasting impacts. Early detection is crucial,
but limited datasets hinder prediction methods. Using data collected in 2012 from the Bejaia
and Sidi Bel-Abbes regions, PCA reduced the variables to six while retaining 96.65% of the
variance. ANN outperformed other classifiers in accuracy, precision, and recall, achieving
0.967 ± 0.026 accuracy and 0.971 ± 0.023 F1-score. Feature importance analysis highlighted
RH, DC, and ISI as significant predictors in the ANN model. [14]


We focus on predicting the Fire Weather Index (FWI) for Algeria's Béjaïa and Sidi
Bel-Abbes regions. FWI, which is crucial for assessing fire risk, depends on meteorological
factors. Our goal is to build a regression model that captures how weather conditions
(temperature, humidity, wind speed, rainfall) and FWI components (FFMC, DMC, DC, ISI, BUI)
influence FWI values. This model will aid in proactive fire hazard assessment and prevention
strategies for these areas.
We will train regression models on historical data from June to September 2012, comprising
meteorological information and FWI values. These models will then forecast FWI for
upcoming days, given the anticipated weather conditions.
Our ultimate goal is to develop precise models for predicting the Fire Weather
Index, which will support fire management and prevention in these regions of Algeria.

Data collection and Preprocessing:
Gather a diverse and representative dataset that covers a wide range of relative humidity,
temperature, fire index, and wind speed conditions.
Preprocess the data to ensure consistent data quality, perform feature selection, and remove
null values.
Exploratory Data Analysis:
Apply machine learning techniques to extract discriminative features
from the dataset.
Study regression and its models, namely Linear, Lasso, Ridge, and Elastic Net, for
accurate prediction of forest fires.

Model development and training:

Design and train each of the regression models, and plot pie charts, box plots, and density
plots for all the features given in the dataset.
Explore various model structures through experimentation and implement regression
techniques to maximize accuracy and robustness.

Evaluation and Validation:

Evaluate the performance of the trained models using appropriate metrics such as Mean
Absolute Error and R2 score.
Divide the dataset into training and testing sets and evaluate the models on the testing set
to assess their predictive potential.
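
The two metrics and the train/test division named above can be written out directly from their standard definitions. The sketch below is illustrative only: the function names, the split fraction, the seed, and the sample values are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical actual and predicted values for four test samples.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

rows = list(range(8))
train_rows, test_rows = train_test_split(rows)
print(f"MAE={mae:.3f}, R2={r2:.3f}, train={len(train_rows)}, test={len(test_rows)}")
```

A lower MAE and an R2 score closer to 1 both indicate a better fit, and computing them on held-out rows gives a fairer estimate of how the model behaves on unseen data.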



We implemented multiple regression models to determine which one would yield the most
accurate predictions. We considered features such as temperature, humidity, and wind speed
to improve the accuracy of our forecasts regarding forest fire propagation.

Case 1: To Check for Multicollinearity
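
A multicollinearity check of this kind is often done by computing pairwise Pearson correlations between features and flagging pairs above a chosen cutoff. The sketch below is an assumption about how such a check could look and is not the report's own code: the 0.85 threshold and the feature values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(features, threshold=0.85):
    """Return (name_a, name_b, r) for every feature pair with |r| above the threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical feature values (not the project data).
features = {
    "DMC": [3.0, 5.0, 8.0, 12.0, 15.0],
    "BUI": [3.5, 5.5, 8.5, 12.5, 16.0],
    "RH": [60.0, 45.0, 70.0, 50.0, 65.0],
}

flagged = highly_correlated_pairs(features)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

One feature from each flagged pair can then be considered for removal, or a penalized model such as Ridge can be used to limit the effect of the correlation.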

Case 2: Box Plots to Understand the Effect of Standard Scaler

Case 3: Linear Regression Model

Case 4: Lasso Regression Model

Case 5: Ridge Regression Model

Case 6: Elastic Net Regression Model

Case 7: Fire Analysis of Region 1

Case 8: Fire Analysis of Region 2

Chapter - 5

Conclusion And Future work

5.1 Conclusion
For both Linear Regression and Ridge Regression, the R2 score reflects a commendable level
of accuracy, indicating that both models explain the variance in the data well and yield
precise predictions.
On Mean Absolute Error (MAE), Linear Regression has a marginally
lower value (0.482) than Ridge Regression (0.498).
Despite the slight advantage of Linear Regression in prediction accuracy, as
shown by the R2 score and MAE, we opt for the Ridge Regression model because of
its effectiveness in addressing overfitting concerns.

5.2 Future Work
While progress has been made in forest fire prediction and management, there remains
considerable room for advancement. Future work may centre on enhancing predictive
accuracy, embracing emerging technologies, and nurturing interdisciplinary collaborations
to tackle the multifaceted challenges presented by forest fires.
• Integration of Real-Time Sensor Data:
By utilizing real-time sensor data from IoT devices, remote sensors, and weather
stations, environmental conditions can be continuously tracked. Early detection of fire
outbreaks may also be made possible, enabling timely mitigation and control.

• Designing a Website: Creating a comprehensive website can give users easy access
to reliable data and timely information about forest fire
outbreaks and their prevention.

