Product Proposal 1

ISM I

PRODUCT PROPOSAL
MARCH • 15 • 2021

VANISHA SWABHANAM
ASTROPHYSICS
INTRODUCTION AND STATEMENT OF PURPOSE

The product I wish to create is a model that takes a bright point in space and
classifies it as either a star or a galaxy. The output will be the probability,
expressed as a percentage, that the point belongs to each class. Because this
project uses many of the Python libraries and databases that working
astrophysicists use, I will gain experience in astronomical data analysis. It will
also build a foundation for a future job, since many astrophysicists spend most of
their time at a computer programming data analysis functions. In doing so, I hope
to gain a broader understanding of the data science and analysis side of
astrophysics. This project will be an application of the data skills I learn.

This could serve as the foundation for a useful tool in the astrophysics world.
A machine learning model that distinguishes between two similar classes of
objects in a dataset could be applied to many other research areas in astronomy.

LEARNING FOCUS FOR SPRING SEMESTER

My learning focus for this semester is to understand the link between
computer science, data science, and astrophysics. The astrophysics field requires a
strong foundation in computer science because astrophysicists rely on advanced
computer programs to run their simulations. They use a wide range of programming
languages, such as C, C++, and Python, to conduct observational and theoretical
studies of the universe as well as astronomical data analysis. Python's strength is
that it is an extremely versatile, easy-to-use, high-level language that combines
traditional programming with data reduction and analysis tools. For this reason, I
want to focus my original work on programming this semester, as it resembles
real-life astronomical applications.
REVIEW OF SKILLS AND RESEARCH

The research component of this project allows for a broader educational
experience in which I can explore the effects of applying new thought processes
through study and testing. It will also build a foundation for a future job, since
many astrophysicists spend most of their time at a computer programming data
analysis functions. Python is a good choice for implementing this project because
it is flexible, I already know it, and, with its huge number of libraries, it is
now the default choice for many astrophysics analysis codes.
This project will require a solid understanding not only of the Python
language but also of how computer science methods work in general. A few
machine learning techniques, such as data cleaning and one-hot encoding, are
crucial to this project's success. Along with that, it will require knowing how
to use libraries such as scikit-learn (which provides the Random Forest
classifier), NumPy, and pandas to load the data and classify it.
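
As a minimal sketch of the two techniques named above, the example below cleans a
tiny table and one-hot encodes one of its columns with pandas. The table and its
column names (mag_r, size_arcsec, band) are hypothetical and chosen only for
illustration; they do not come from any real survey.

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalog; the columns are illustrative, not from a real survey.
raw = pd.DataFrame({
    "mag_r": [18.2, 18.2, np.nan, 21.5, 19.9],
    "size_arcsec": [0.9, 0.9, 1.4, 2.8, 1.1],
    "band": ["g", "g", "r", "i", "r"],
})

# Data cleaning: drop exact duplicate rows, then rows with missing values.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

# One-hot encoding: replace the categorical "band" column with one 0/1
# indicator column per category (band_g, band_i, band_r).
encoded = pd.get_dummies(clean, columns=["band"], dtype=int)
print(encoded)
```

One-hot encoding matters here because a Random Forest in scikit-learn expects
numeric inputs, so any text-valued column has to be converted first.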

METHODOLOGY

I will carry out this project by loading stellar and galactic survey data from
public databases such as NASA.gov and CERN laboratories into a pandas DataFrame,
which makes the dataset easier to read and manipulate. Then I will clean the
dataset, that is, fix or remove incorrect, corrupted, incorrectly formatted,
duplicate, or incomplete records. I will build the program around a machine
learning model called a Random Forest, using Python libraries such as
scikit-learn and NumPy. 80% of the data will be used to train the model, and the
remaining 20% will be used to test it.
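
The steps above can be sketched roughly as follows. This is only an outline under
stated assumptions: the catalog is synthetic (generated with NumPy so the example
runs on its own), and the feature names mag_r and size_arcsec and the label column
is_galaxy are hypothetical stand-ins for whatever the real survey provides.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a survey catalog (an assumption, not real data):
# galaxies are given a larger apparent size than stars.
rng = np.random.default_rng(0)
n = 500
is_galaxy = rng.integers(0, 2, size=n)
catalog = pd.DataFrame({
    "mag_r": rng.normal(20.0, 1.5, size=n),
    "size_arcsec": rng.normal(1.0, 0.1, size=n) + 0.8 * is_galaxy,
    "is_galaxy": is_galaxy,
})

# Clean the table: remove duplicate and incomplete rows.
catalog = catalog.drop_duplicates().dropna()

X = catalog[["mag_r", "size_arcsec"]]
y = catalog["is_galaxy"]

# 80% of the rows train the model; the remaining 20% are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Fixing random_state makes the split and the forest reproducible, and stratify=y
keeps the star/galaxy ratio the same in the training and test sets.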

MATERIALS

The only tangible item I will need is my computer. On my computer, I will
download various libraries and packages that allow me to classify data for this
project, including:

Python 3.0      NumPy      Random Forest Classifier
Anaconda        pandas     AstroPy
scikit-learn    Seaborn    Jupyter Notebook

CONCLUSIONS

The outcome of my project will be code written in a Jupyter Notebook that takes
photometric input from a star/galaxy survey dataset and outputs the probability
that a bright point in space is a star or a galaxy. In addition, the notebook will
include a confusion matrix that evaluates the accuracy of my classification
model. This will give a clear summary of the model's performance to anyone who
wants to use it in the future.
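
As a rough illustration of those two outputs, the sketch below uses scikit-learn's
predict_proba for the per-class probabilities and confusion_matrix for the summary
table. The data is synthetic and the single "size" feature is a hypothetical
stand-in for real photometric inputs; label 0 means star and 1 means galaxy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): stars (0) have size near 1.0, galaxies (1) near 2.0.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 150)
sizes = 1.0 + labels + rng.normal(0.0, 0.2, size=labels.size)
features = sizes.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=1, stratify=labels
)
model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(X_train, y_train)

# predict_proba returns one row per object; the columns follow model.classes_,
# so column 0 is P(star) and column 1 is P(galaxy).
new_points = np.array([[0.9], [2.1]])
probabilities = model.predict_proba(new_points)
for (size,), (p_star, p_galaxy) in zip(new_points, probabilities):
    print(f"size={size:.1f}: star {p_star:.0%}, galaxy {p_galaxy:.0%}")

# Confusion matrix on the held-out set: rows are true classes, columns are
# predicted classes, both in the order [star, galaxy].
matrix = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
print(matrix)
```

The diagonal of the matrix counts correct classifications, and the off-diagonal
entries count stars mistaken for galaxies and galaxies mistaken for stars.
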
In this notebook, I will compile a variety of metrics, such as accuracy,
completeness, precision, and contamination, over a diverse array of star-galaxy
classifiers for the DES Y1 dataset. These tests can be ported or used as examples
for any other photometric dataset.
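
The four metrics named above can be computed from counts of true and false
positives. The sketch below assumes the commonly used definitions, with galaxies
as the "positive" class: completeness is the fraction of true galaxies recovered,
precision (purity) is the fraction of predicted galaxies that are real, and
contamination is one minus precision. The function name and the toy labels are
hypothetical.

```python
def classification_metrics(true_labels, predicted_labels, positive="galaxy"):
    """Return accuracy, completeness, precision, and contamination.

    Assumes at least one object is truly positive and at least one is
    predicted positive; otherwise a ratio below would divide by zero.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "completeness": tp / (tp + fn),   # also called recall
        "precision": tp / (tp + fp),      # also called purity
        "contamination": fp / (tp + fp),  # equals 1 - precision
    }


# Toy labels for eight objects (illustrative only).
true_labels = ["galaxy", "galaxy", "galaxy", "star", "star", "galaxy", "star", "star"]
predicted = ["galaxy", "galaxy", "star", "star", "galaxy", "galaxy", "star", "star"]

metrics = classification_metrics(true_labels, predicted)
print(metrics)
```

Swapping positive="star" gives the same metrics for the star sample, which is
useful because the two classes usually do not have equal completeness.
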
Star-galaxy classification remains a non-dominant but important source of
systematic error for cosmology, and it is critical for Milky Way structure
measurements and discoveries. Therefore, a model like this can serve as an
effective foundation for building better star-galaxy classification models.
VANISHA
ASTROPHYSICS
ISM I

Under the mentorship of:

MS. PRIYA LINGUTLA
NIELSEN CO
DATA SCIENTIST

CLICK TO VIEW:
FINAL PRESENTATION
DIGITAL PORTFOLIO
STAR/GALAXY CLASSIFICATION MODEL

Thank you Ms. Priya for everything you have done for me. The knowledge I
have obtained from you has been a great help and support throughout this
year. My success in my final product is due to your sincere support and
mentorship. I genuinely appreciate you guiding me through every step of the
way. Again, thank you so much.
