Hazardous Asteroid Prediction


A PROJECT REPORT

On

HAZARDOUS ASTEROID PREDICTION

submitted in partial fulfillment of the requirement

for the award of the Degree


of
Bachelor of Technology
in
Information Technology

by

Eknath Janardan Dake (1930331246070)
Ritik Arun Choudhari (1930331246071)
Amey Nandkishor Shende (1930331246072)
Divyanshu Naresh Bansod (1930331246075)

Under the Guidance of

Prof S. V. Bharad

Department of Information Technology

Dr. Babasaheb Ambedkar Technological University,

Lonere – 402103, Tal– Mangaon, Dist- Raigad (MH) INDIA

2022 – 2023
DECLARATION

We, the undersigned, hereby declare that the project report "Hazardous Asteroid Prediction",
submitted in partial fulfillment of the requirements for the award of the degree of B. Tech. of
Dr. Babasaheb Ambedkar Technological University, Lonere, is a bonafide work done by
us under the supervision of Prof. S. V. Bharad. This submission represents our ideas in our own
words, and where ideas or words of others have been included, we have adequately and
accurately cited and referenced the original sources. We also declare that we have adhered
to the ethics of academic honesty and integrity and have not misrepresented or fabricated
any data, idea, fact or source in our submission. We understand that any violation of
the above will be cause for disciplinary action by the University and can also evoke penal
action from sources which have not been properly cited or from whom proper
permission has not been obtained. This report has not previously formed the basis
for the award of any degree, diploma or similar title of any other University.

Eknath Janardan Dake(1930331246070)


Ritik Arun Chaudhari (1930331246071)
Amey Nandkishor Shende(1930331246072)
Divyanshu Naresh Bansod(1930331246075)
DEPARTMENT OF INFORMATION TECHNOLOGY
DR.BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY

CERTIFICATE

This is to certify that the report entitled "Hazardous Asteroid Prediction", submitted by Eknath
Dake (1930331246070), Ritik Arun Choudhari (1930331246071), Divyanshu Bansod (1930331246075)
and Amey Shende (1930331246072) to Dr. Babasaheb Ambedkar Technological University in
partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in
Information Technology, is a bonafide record of the project work carried out by them under my
guidance and supervision. This report, in any form, has not been submitted to any other University or
Institute for any purpose.

Dr. S. R. Sutar
Head, Department of Information Technology

Guide: Prof. S. V. Bharad


ACKNOWLEDGEMENT

We take this opportunity to express our gratitude and deep regard to our guide, Prof. S. V.
Bharad, for his exemplary guidance, monitoring and constant encouragement throughout the course
of this project. The blessings, help and guidance he gave from time to time shall carry us a long way
in the journey of life on which we are about to embark.

We are highly indebted to the Department of Information Technology, Dr. Babasaheb Ambedkar
Technological University, for its guidance and constant supervision, for providing the
necessary information regarding the project, and for its support in completing the project.

Lastly, we would like to thank the Almighty and our parents for their moral support, and our friends,
with whom we shared our day-to-day experience and received many suggestions that improved the
quality of our work.

Eknath Dake(1930331246070)

Ritik Choudhari(1930331246071)

Amey Shende(1930331246072)

Divyanshu Bansod(1930331246075)
Abstract

An impact by an asteroid is one of the very few natural disasters that could be mitigated, or even
entirely prevented, if accurate predictions are made early enough. Classifying the Near-Earth
Asteroid population to identify Potentially Hazardous Asteroids well before an impact is one of the
most important problems that must be solved if humanity is to avoid the fate of the
dinosaurs 65 million years ago. In this project we use algorithms such as XGBoost and logistic
regression. XGBoost is a decision-tree-based ensemble machine learning algorithm that
uses a gradient boosting framework. This study presents an approach to classify the Near-Earth
Asteroid population as potentially hazardous or non-hazardous by allowing machine learning to learn
the complex representations that exist in the distribution of available asteroid orbital data. We believe
that an automatic potentially-hazardous-asteroid detector would help speed up the characterization
of the rapidly growing volume of asteroid data made available through advances in science.
Contents

ACKNOWLEDGEMENT
1. INTRODUCTION
2. Literature Review
3. Data
4. Methodology
5. Implementation
6. Conclusion
7. References

1. INTRODUCTION

Asteroids are rocky celestial objects: the primeval remains of the early formation of our
Solar System, which occurred around 4.5 billion years ago. The majority of asteroids lie
in the space between the orbits of Mars and Jupiter. Earth has been struck many
times by asteroids whose orbits carried them into the inner Solar System; these are collectively
known as Near-Earth Asteroids (NEAs), and they remain a threat to Earth today. To simply depend
on our luck would be complacent, given the capabilities humanity has developed so
far. The impact of an asteroid could wipe out the human race, as happened to the
dinosaurs.

Earth is effectively safe from non-hazardous asteroids. Asteroids which have an absolute
magnitude of 22.0 or less (a diameter greater than about 140 metres) and an Earth Minimum
Orbit Intersection Distance (MOID) of 0.05 au (7,479,894 kilometres) or less are considered
Potentially Hazardous Asteroids (PHAs); that is, their orbits have a chance of intersecting
Earth's, with adequate scope to cause substantial and prolonged damage on impact. About
30% of the examined orbits have a MOID of less than 0.05 au. To forestall a grand-scale
turmoil caused by PHAs, we need years of planning; accordingly, we require an adequate
warning period to assemble a plan of action. There are numerous automated solutions for
detecting asteroids, such as Spacewatch's Moving Object Detection Program, which uses image
processing, and NASA's Goldstone Deep Space Communications Complex, which uses radar to
reveal particulars such as an asteroid's shape and size, and whether it is a single object or
a system of one small object revolving around a larger one.

Also, in the near future, impending technologies like the Large Synoptic Survey
Telescope could enhance and refine the search for NEOs with improved accuracy.
Through such methods, an enormous amount of data is collected every minute, and it is
difficult to process it all in real time with a high degree of accuracy. To face this challenge,
we apply machine learning and feature selection techniques to scrutinize the data.
Machine learning algorithms like Support Vector Machines, XGBoost and Random Forest
have shown promising results in other fields, and may be able to deal with the enormous
amounts of data that must be scrutinized.

2. Literature Review

In one paper, the authors tried to predict the combinations of orbital parameters for
as-yet-undiscovered potentially hazardous asteroids using machine learning techniques.
For this they used the Support Vector Machine algorithm with the RBF kernel, an approach
with which the boundaries of the potentially hazardous asteroid groups in 2-D and 3-D can be
easily understood. With this algorithm they achieved an accuracy of over 90%.

In another paper, the authors classified asteroids observed by the VISTA-VHS survey. They
used statistical techniques to classify 18,265 asteroids into V, S and A types, comparing a
probabilistic method, the KNN method and a statistical method; the test set consisted of over
6,400 asteroids. They then compared the accuracy of the algorithms.

A third paper classified asteroids that are near the Earth and can easily cross the orbit of
Mars, using a perceptron-type network with three output classes: Apollo, Amor and Aten. The
authors also found that the semi-major axis and the focal distance are the two most important
features, by which the asteroids can be linearly separated.

Another study identified asteroid families with the help of a supervised hierarchical
clustering method, based on the distance between an asteroid and chosen reference objects.
Compared with the traditional hierarchical clustering method, their approach gave a better
accuracy of 89.5%, and they found 6 new families and 13 clumps.

A further paper used KNN, gradient boosting, decision trees and logistic regression to
classify asteroid families, identifying main-belt asteroids in three-body resonances; 404 new
asteroids were identified. They found that the gradient boosting method gives the highest
accuracy among all the methods applied, 99.97%, for the identification task. Another paper
likewise classified asteroids into families such as Apollo, Amor and Aten, restricted to
asteroids that cross the Mars orbit; since these are easily separable by semi-major axis and
focal distance, a perceptron-type artificial neural network is enough to classify the three types.

In another paper, the authors used random forest techniques to classify 48,642 asteroids into
8 classes (C, X, S, B, D, K, L and V), with the division made according to their SDSS
magnitudes in the g, r, i and z wavebands. They reached a higher accuracy than many proposed
methods. Finally, one paper used a supervised learning algorithm to detect asteroids in a
large dataset, the vetted NEOWISE dataset (E. L. Wright et al., 2010). According to the
authors, the metrics they used can be computed easily, since they are directly associated with
the extracted sources. They used the Python scikit-learn package, reported on the reliability,
feature-set selection and suitability of the various machine learning algorithms, and compared
the results at the end.

3. Data

3.1 Data collection:

The data set is taken from Kaggle. It consists of 4,688 rows and 40 columns, all sourced from
NASA's Near Earth Object Program (http://neo.jpl.nasa.gov/).

3.2 Data Visualization:

For data visualization we mainly use a heat map built from the Pearson correlation
coefficient. From it we can easily identify highly correlated columns and take the necessary
steps to remove the dependency.
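As a sketch of this correlation check (with made-up miniature data and assumed column names), the Pearson matrix behind such a heat map can be computed with pandas:

```python
import pandas as pd

# Hypothetical miniature of the asteroid table; column names are assumptions.
df = pd.DataFrame({
    "semi_major_axis":   [1.1, 2.3, 1.8, 2.9],
    "aphelion_distance": [1.5, 3.0, 2.2, 3.8],
    "inclination":       [5.0, 12.0, 7.5, 3.2],
})

# Pearson correlation matrix; a heat map is simply this matrix drawn as colours
# (e.g. with seaborn.heatmap(corr)).
corr = df.corr(method="pearson")

# Flag highly correlated pairs (|r| > 0.9) so one column of each pair can be dropped.
high = [(a, b) for a in corr.columns for b in corr.columns
        if a < b and abs(corr.loc[a, b]) > 0.9]
print(high)
```

Here `semi_major_axis` and `aphelion_distance` come out strongly correlated, so one of the two could be dropped to remove the dependency.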

Fig 1. Heat map plot

After checking for duplicate and missing values, we counted the hazardous and non-hazardous
asteroids: only 755 asteroids are hazardous, while the remaining 3,932 are non-hazardous.

Fig 2. Count plot



3.3 Features selection:

For this classification we selected 15 features, including: absolute magnitude, minimum orbit
intersection distance, ascending node longitude, orbit uncertainty, perihelion time,
inclination, semi-major axis, anomaly, perihelion argument, relative velocity, perihelion
distance, eccentricity, aphelion distance and the Jupiter Tisserand invariant. These are the
important features, ordered from most to least important. Some of the less important features
also depend on other, more important independent features; the Jupiter Tisserand invariant,
for example, depends on the semi-major axis, orbital eccentricity and inclination.

3.4 Split dataset:

The dataset is then divided into training and test sets: the test set has 938 samples, while
the training set comprises 3,749. The whole dataset is now suitable for applying machine
learning techniques. We split with a random-state value of 10.

4. Methodology

For all methods mentioned in this section, except for gradient boosting, the implementations
from the scikit-learn package (Pedregosa et al., 2011) were used.

4.1 Classification Methods

Multinomial logistic regression (Bishop, 2006) models the probability that a sample belongs
to each class by constructing a linear function of the input data and applying the softmax to
obtain the probabilities. Specifically, the probability of class i given sample (feature
vector) x is modeled as

p(y = i | x) = exp(w_i^T x) / Σ_j exp(w_j^T x),

where the w_i are weight vectors for each class. These vectors are trained during the
learning stage by minimizing the cross-entropy between the empirical class-label distribution
and the model distribution. Multinomial logistic regression is a relatively simple yet
effective method for modeling the data, very well studied and popular among statisticians, so
it was a natural choice of algorithm to include in our experiment.
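A brief illustration of this model with scikit-learn (toy data; `predict_proba` returns the softmax probabilities described above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the asteroid feature table.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# LogisticRegression minimises the cross-entropy between the empirical labels
# and the model distribution; predict_proba applies softmax to the class scores.
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:3])
print(proba)  # each row of class probabilities sums to 1
```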

4.2 XGBoost

It is nearly impossible for traditional algorithms to deal with the immense amount of data
involved in observational astronomy. The XGBoost model comes with built-in multithreaded
parallel computing, good generalisation ability, a tree structure and the ability to handle
missing values, features which make it a much faster and more suitable method for our data.
XGBoost is used extensively in many fields to tackle data challenges, and it is a highly
effective, scalable machine learning system for tree boosting. It is among the most
efficacious methods thanks to features such as easy parallelism and high prediction
accuracy [5].

4.3 SVM

SVM is a memory-efficient algorithm, since it uses only a subset of the training points as
support vectors, and it is proven to be effective in high-dimensional spaces. SVMs reduce
most machine learning problems to optimization problems, and optimization lies at the heart
of SVMs. During the training phase, the SVM reads in a set of data points whose categories
are known. The goal of the algorithm is to separate the data into two groups when the dataset
is plotted in space; to do this, it attempts to draw a hyperplane between the two clusters of
points.
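A short RBF-kernel sketch with scikit-learn (toy data; scaling is included because the orbital features have very different ranges):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=15, random_state=0)

# The fitted SVC stores only the support vectors, which is why SVMs are
# memory efficient.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)
n_support = len(svm.named_steps["svc"].support_)
print(n_support, "support vectors out of", len(X), "training points")
```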

4.4 Random Forest

Random Forest is appropriate for modeling high-dimensional data because it can handle
missing values as well as continuous, categorical and binary data. Accurate predictions and
better generalization are achieved through ensemble strategies and random sampling. Random
Forest uses decision trees as the base classifier; this ensemble learning method is used for
both classification and regression. Random Forest generates accurate results but, on the
other hand, is time-consuming compared with other techniques. Accurate classification of the
dataset can be achieved by generating a forest of decision trees via supervised learning.
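A minimal sketch with scikit-learn (toy data); the ensemble also exposes per-feature importance scores, which is one reason it pairs well with feature selection:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# 100 decision trees, each grown on a bootstrap sample with random feature
# subsets -- the ensemble strategy described above.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.feature_importances_)  # one importance score per feature, summing to 1
```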

4.5 Feature Selection

To evaluate the importance of individual features for classification, we decided to verify
whether a successful classification can be achieved using only a small subset of these
features. To this end, we employed the sequential forward feature selection method
(Whitney, 1971). This procedure iteratively composes a feature subset by analyzing the
contribution of each feature to the performance of a given machine learning model. The
procedure starts with an empty set of features. At each iteration, every feature not yet
included in the selected subset is, one by one, tentatively added to the subset in order to
train and evaluate the model on the extended subset. The feature that yields the highest
increase in prediction performance is then permanently added to the selected subset, and the
next iteration follows. This process continues until the chosen maximal size of the feature
set is reached (set to 20 in the experiment). The training and evaluation of the machine
learning model in each trial is performed by cross-validation. The whole process is
computationally intensive, since each iteration requires training and evaluating as many
models as there are candidate (not yet selected) features.
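scikit-learn's `SequentialFeatureSelector` implements this greedy forward procedure; a small sketch (toy data, and 4 target features instead of the report's cap of 20 to keep it fast):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# Start from an empty set and add the single best feature per iteration,
# scoring each candidate subset by 3-fold cross-validation.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=4,
                                direction="forward", cv=3)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features
```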

4.6 Evaluation Metrics



As an evaluation metric to compare the models, we used the prediction accuracy, defined as
follows:

Acc = (1/N) Σ_i TP_i,

where N is the total number of samples and TP_i is the number of correctly classified objects
("true positives") from class i. Moreover, we also report two additional evaluation metrics,
which are known to be more robust against class imbalance (i.e., when some of the classes are
much less represented in the data set than others): the balanced prediction accuracy (BAcc)
and the F1 measure (Kelleher et al., 2015), as follows:

BAcc = (1/K) Σ_i TP_i / (TP_i + FN_i),

F1 = (1/K) Σ_i 2·TP_i / (2·TP_i + FP_i + FN_i),

where K is the total number of classes, while FP_i and FN_i are, respectively, the number of
objects incorrectly classified into class i ("false positives") and the number of incorrectly
classified objects from class i ("false negatives"). The balanced prediction accuracy is thus
the average recall obtained on each class, while the F1 score is the average harmonic mean of
recall and precision for each class. Last, the Matthews correlation coefficient (MCC) is
reported due to its better performance on imbalanced problems compared with other balanced
metrics; in the binary case,

MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).
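All four metrics are available in scikit-learn; a tiny worked example on a deliberately imbalanced label set:

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, matthews_corrcoef)

# 8 non-hazardous (0) vs 2 hazardous (1) objects, with two mistakes.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)            # 8/10 correct = 0.8
bacc = balanced_accuracy_score(y_true, y_pred)  # mean recall = (7/8 + 1/2)/2
f1 = f1_score(y_true, y_pred, average="macro")  # mean per-class F1
mcc = matthews_corrcoef(y_true, y_pred)
print(acc, bacc, f1, mcc)
```

Note how the plain accuracy (0.8) looks far better than the balanced accuracy (0.6875), because the majority class dominates it.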

5. Implementation

Previous research calculations, observations, simulations and detections show that there is
scope for improvement in areas such as sensitivity, accuracy and processing time. In this
project, preliminary steps are taken for image pre-processing.

5.1 Dataset

The proposed system takes as input periodic space images of the night sky of a fixed area
with a certain time lapse. For the parallax method, two datasets are required: either from
the same position with a 6-month gap, or collected at the same time from positions 180
degrees apart in the Earth's revolution. A separation of 2 astronomical units (AU) applies
when the two observations of the same asteroid are made on opposite ends of the Earth's
orbit. Finally, to determine whether a detected asteroid is already known, it is compared
with a catalogue containing the necessary parameters.

Fig. 1. Data splitting ratio

5.2 Data Pre-processing

If the input is a sequence of night-sky images, the Python library Pillow is used to remove
background noise and to adjust the scale of each image via image registration, which also
checks that the sequential images are of the same area. These pre-processed images are then
ready for the next step, moving-object detection.
5.3 Architecture

Fig. 2. Block Diagram of Proposed Methodology

5.4 Detection of the Asteroid

After obtaining the scaled, processed images from the previous stage, detection and
classification between stationary and moving objects is carried out using YOLO. Once an
asteroid is successfully detected, astrometric calculations are performed to work out
parameters such as its distance from Earth, its velocity, its size and the time to collision.

Next, the trajectory of the detected asteroid is calculated to determine whether it
intersects the Earth's orbit. If it does not, then according to the proposed system it poses
a negligible risk to the Earth; if it does, the asteroid is a threat, and we predict the risk.

We performed the experiment on Google Colab using the free GPU, and imported the Darknet
repository from GitHub to download pre-defined weights [12]. Fig. 3 shows the flow chart of
the system setup.

a. Make a custom dataset

• Identify the type of images needed for training the models.

• Collect images from various sites and save them locally in one folder.

• Remove any irrelevant images from the folder.

• Make sure that each image in the dataset has the same file extension as the others.

b. Training

• Use the dataset made in the previous module to train the model.

• If the model is already trained, follow module 4 in the next step.

• If the model is not trained yet, train it as specified in module 5.

c. Backup file

• Create a backup folder to store the weights that have been trained.

• After every 100 iterations a new file is written to the backup folder.

• The backup weights can be used after the model has been trained successfully.

d. Train the model on the custom dataset

• The model fetches the configuration file and trains on the images according to the given
values.

• It unzips the dataset and reads the text file generated in the labelling section.

• It is recommended to train the model for a high number of iterations, though the time
required increases accordingly.

5.5 Astrometric Calculations

In the astrometric calculations, we find the risk factors: the distance from Earth, the
velocity of the moving object, the collision time and the size of the asteroid.

Fig. 6. Flow of Astrometric Calculations

5.5.1 Distance Calculation

To calculate the distance factor, the distance of the asteroid from the Earth, we use the
parallax method. Parallax is the displacement, or change in the apparent position, of an
object when viewed from two different points of view; measuring the distance to an object by
examining how it appears to move against a background from two observation points is known
as the parallax method. Observe the same asteroid from each location over time, calculate
the asteroid's position relative to the field stars, and determine the shift in the
asteroid's apparent position between the two locations. Call that angular shift Φ, the
parallax angle.

Fig. 7. Parallax method

This gives us a measurable angle Φ between the asteroid's two apparent locations. Since the
asteroid is very far away, we may assume tan Φ ≈ Φ, which simplifies the parallax formula
for the distance of the asteroid from the Earth to:

d_e = b / Φ

Here,

d_e – distance of the asteroid from the Earth

b – baseline between the two observation points

Φ – parallax angle (in radians)
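A small numeric sketch of this small-angle formula (the baseline and angle below are made-up example values, not measured data):

```python
import math

def parallax_distance(baseline_km, phi_rad):
    """Small-angle parallax: d_e ≈ b / Φ, with Φ in radians."""
    return baseline_km / phi_rad

# Hypothetical observation: two stations 6,400 km apart see the asteroid
# shift by 1 arcsecond against the background stars.
phi = math.radians(1.0 / 3600.0)  # 1 arcsecond in radians
d_e = parallax_distance(6400.0, phi)
print(f"distance ≈ {d_e:.3e} km")
```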

5.5.2 Velocity Calculations

To find the velocity of an object, we need the relative displacement travelled by the object
across our input images. To find the relative displacement we use a coordinate system: for
each pair of images, we fix the bottom-left corner as the origin and calculate the position
of the object as coordinates with respect to it. The Euclidean distance between the two
points then gives the relative displacement in pixels, and the scaling supplied with the
images converts this into the actual distance travelled by the object. For the time, we use
the gap between the periodic images given with the dataset. With both the displacement and
the time consumed in that displacement, the velocity formula, displacement divided by time,
gives the actual velocity of the asteroid.

Fig. 8. Asteroid Movement

Here,

P1 – position of the asteroid at time t1

P2 – position of the asteroid at time t2

t1 – time at which image 1 was captured

t2 – time at which image 2 was captured

Sx – actual distance travelled by the asteroid in time t2 − t1

Sy – assumed distance travelled by the asteroid in time t2 − t1

V – velocity of the asteroid over time t2 − t1
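The steps above can be sketched as follows (pixel positions, plate scale and time gap are hypothetical example values):

```python
import math

def asteroid_velocity(p1, p2, km_per_pixel, dt_seconds):
    """Velocity from two image positions: Euclidean pixel displacement,
    converted to km via the image scaling, divided by the time gap."""
    displacement_km = math.dist(p1, p2) * km_per_pixel
    return displacement_km / dt_seconds

# Hypothetical: the asteroid moves from P1 to P2 (a 50-pixel shift),
# with 1,000 km per pixel and 3,600 s between the two exposures.
v = asteroid_velocity((120, 340), (160, 370), 1000.0, 3600.0)
print(f"velocity ≈ {v:.2f} km/s")
```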

5.5.3 Time Calculations

The time taken by an asteroid to reach the Earth's surface is estimated as the total
distance, calculated using the parallax method in the previous section, divided by the
velocity of the asteroid.
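Combining the two previous quantities gives the crude estimate (the numbers below are hypothetical):

```python
def time_to_impact(distance_km, velocity_km_s):
    """Parallax distance divided by the measured velocity."""
    return distance_km / velocity_km_s

# Hypothetical: 7.5 million km away, closing at 15 km/s.
seconds = time_to_impact(7.5e6, 15.0)
print(f"≈ {seconds / 86400:.1f} days")  # about 5.8 days
```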

Project Screenshots

• Using Decision Tree Classifier

• Using SVM

• Using XG Boost

6. Conclusion

This study analyses feature selection strategies in dimensionality reduction at different


levels, from using all features to removing a large number of them. The greatest results
came from XGBOOST with XGBCLASSIFIER, followed by XGBOOST with MI and Random
Forest with MI. When compared to other algorithms, SVM performed badly, but when
combined with XGBCLASSIFIER for feature selection, It showed a considerable improvement
in performance.

7. References

1. https://www.cambridge.org/core/books/abs/mitigation-of-hazardous-comets-and-asteroids/role-of-radar-in-predicting-and-preventing-asteroid-and-comet-collisions-with-earth/76A2AEB400D2A0557818EA2448169849

2. https://www.science.org/doi/abs/10.1126/science.1068191

3. https://www.kaggle.com/datasets/basu369victor/prediction-of-asteroid-diameter

4. https://www.frontiersin.org/articles/10.3389/fspas.2021.767885/full

5. https://www.javatpoint.com/machine-learning-algorithms
