Project Proposal

AML 2203

Team Members:

Shweta Yadav C0854479


Sai Srikanth Raju C0846551

Thakshak Sunkara C0846100


Sandra Nicolas C0851356

Submitted to:
Professor Vahid Hadavi

Power Plant Analysis

Introduction
Environmental, chemical, and economic studies have demonstrated a strong evidence base for associations
between energy production and air pollution. Power generation can have adverse effects on air quality,
particularly by burning coal. A power plant can affect the environment by its construction and by its operation.
These effects, or impacts, can be either temporary or permanent. A power plant and its auxiliary components
(e.g., natural gas pipelines, water intakes and discharge, coal delivery and storage systems, new transmission
lines, and waste disposal sites) take up space on the ground and in the air, use water resources, and, in many
cases, emit pollutants into the air. The plant's footprint on the ground eliminates opportunities for others to
purchase or use the land. It can also affect the existing or future uses of adjoining and nearby land parcels. The
analysis of different power plant datasets can assist policymakers in evaluating future assumptions about
demand and what type of generation will be required to meet that demand, as well as in identifying
which types of power plants are most prevalent in countries where pollution keeps rising.

Data Sets
The dataset contains 34936 observations with 8 different attributes. The attributes describe information about
the power plant. Each power plant has columns related to its location, capacity, generation data, and owner.

Column Names:

Country Code - contains the country code, e.g. AFG.

Country - stores the name of the country, e.g. Afghanistan.

Name of Powerplant - e.g. Kajaki Hydroelectric Power Plant Afghanistan

Capacity in MW - stores the capacity in megawatts.

Latitude - latitude of the power plant

Longitude - longitude of the power plant

Primary_Fuel - e.g. Solar or Hydro

Geolocation_source - created by combining Latitude and Longitude

Estimated Generation_gwh_2020 - estimated generation in gigawatt-hours for 2020

Project Flow

First week:
To begin with, we will explore the power plant dataset and other related datasets this week. We will review some
research papers before we get started with our dataset. We will use Jupyter Notebook for the analysis and start by
cleaning the data after importing it into a data frame. We will also implement different methods to handle missing
values.
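
The cleaning step described above could look roughly like the following sketch. The column names (`country`, `primary_fuel`, `capacity_mw`, `latitude`, `longitude`) and all values are made up for illustration, and the choice of median and "Unknown" fills is only one possible strategy, not a decision taken in this proposal.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample that mimics the column layout listed in this proposal.
# The values are illustrative only and are not taken from the real dataset.
sample = pd.DataFrame({
    "country": ["AFG", "AFG", "ALB", "ALB"],
    "primary_fuel": ["Hydro", None, "Hydro", "Solar"],
    "capacity_mw": [33.0, 10.0, np.nan, 5.0],
    "latitude": [32.32, 31.67, 41.34, np.nan],
    "longitude": [65.12, 65.80, 19.80, 20.10],
})

# 1. Count the missing values in each column.
missing_counts = sample.isna().sum()

# 2. Drop rows that have no usable location.
cleaned = sample.dropna(subset=["latitude", "longitude"]).copy()

# 3. Fill a missing numeric value with the column median.
cleaned["capacity_mw"] = cleaned["capacity_mw"].fillna(cleaned["capacity_mw"].median())

# 4. Fill a missing category with an explicit placeholder label.
cleaned["primary_fuel"] = cleaned["primary_fuel"].fillna("Unknown")

print(missing_counts)
print(cleaned)
```

Dropping rows is used here only for the location columns, since a plant without coordinates cannot be mapped, while the other gaps are filled so that the rows remain available for analysis.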

Libraries used:

Pandas

Pandas is an open-source library made mainly for working with relational or labeled data easily
and intuitively. It provides various data structures and operations for manipulating numerical data and time
series. This library is built on top of the NumPy library. Pandas is fast and lets users work with data
productively.

Pandas provides two main data structures for manipulating data:
• Series
• DataFrame
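
As a minimal sketch of the two structures, a Series is a single labeled column and a DataFrame is a table of such columns sharing one index. The plant names and numbers below are hypothetical.

```python
import pandas as pd

# A Series: one-dimensional labeled data (hypothetical plant capacities).
capacity = pd.Series(
    [33.0, 10.0, 66.0],
    index=["Plant A", "Plant B", "Plant C"],
    name="capacity_mw",
)

# A DataFrame: a table whose columns are Series that share the same index.
plants = pd.DataFrame({
    "capacity_mw": capacity,
    "primary_fuel": pd.Series(["Hydro", "Solar", "Hydro"], index=capacity.index),
})

# Label of the plant with the largest capacity.
largest = capacity.idxmax()

# Total capacity per fuel type.
by_fuel = plants.groupby("primary_fuel")["capacity_mw"].sum()

print(largest)
print(by_fuel)
```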

NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional
array object, and tools for working with these arrays. It is the fundamental package for scientific computing
with Python.
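
A short sketch of what working with these arrays looks like, using made-up capacities and coordinates: operations apply to whole arrays at once, and a boolean mask selects rows without an explicit loop.

```python
import numpy as np

# Hypothetical capacities in megawatts (one-dimensional array).
capacity_mw = np.array([33.0, 10.0, 66.0, 1200.0])

# Hypothetical (latitude, longitude) pairs (two-dimensional array, one row per plant).
coords = np.array([
    [32.32, 65.12],
    [31.67, 65.80],
    [34.56, 69.48],
    [41.34, 19.80],
])

# Vectorized arithmetic: convert every value from MW to GW in one step.
capacity_gw = capacity_mw / 1000.0

# Boolean mask: True for plants above 50 MW.
large = capacity_mw > 50

# Use the mask to select the matching rows of the 2-D array.
large_coords = coords[large]

# Column-wise mean gives the average latitude and longitude.
mean_lat_lon = coords.mean(axis=0)

print(capacity_gw)
print(large_coords)
print(mean_lat_lon)
```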

Second week:
By this week, we will have a clean dataset ready for analysis, from which we can draw
valuable insights. We will present our findings using different visualizations.

Libraries used:

Matplotlib

Matplotlib is a plotting library for creating static, animated, and interactive visualizations in Python. It can be
used in Python scripts, the Python and IPython shells, web application servers, and various graphical user
interface toolkits like Tkinter, wxPython, etc.
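
A minimal sketch of a static chart we might produce, assuming made-up plant counts per fuel type. The non-interactive `Agg` backend is selected only so the script also runs without a display; in a notebook that line can be dropped.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up counts, for illustration only.
fuels = ["Hydro", "Solar", "Coal", "Gas"]
plant_counts = [12, 7, 5, 9]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(fuels, plant_counts)
ax.set_xlabel("Primary fuel")
ax.set_ylabel("Number of plants")
ax.set_title("Plants per primary fuel (illustrative data)")
fig.tight_layout()

# Render the figure to an in-memory PNG; a file path could be passed instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```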

Seaborn

Seaborn is a visualization library for statistical graphics plotting in Python. It provides
default styles and color palettes that make statistical plots more attractive. It is built on top
of the matplotlib library and is closely integrated with the data structures from pandas.

Seaborn aims to make visualization a central part of exploring and understanding data. It provides
dataset-oriented APIs, so that we can switch between different visual representations of the same variables for
a better understanding of the dataset.
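
The idea of switching between visual representations of the same variables can be sketched as follows: the same two columns are passed to two different plotting functions. The data frame and its values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample, for illustration only.
plants = pd.DataFrame({
    "primary_fuel": ["Hydro", "Hydro", "Hydro", "Solar", "Solar", "Coal", "Coal", "Coal"],
    "capacity_mw": [33.0, 66.0, 100.0, 10.0, 25.0, 600.0, 450.0, 1200.0],
})

fig, (ax_box, ax_strip) = plt.subplots(1, 2, figsize=(10, 4))

# Same variables, two representations: a summary view and a per-point view.
sns.boxplot(data=plants, x="primary_fuel", y="capacity_mw", ax=ax_box)
sns.stripplot(data=plants, x="primary_fuel", y="capacity_mw", ax=ax_strip)

ax_box.set_title("Box plot")
ax_strip.set_title("Strip plot")
fig.tight_layout()
```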

Third week:
Our goal this week is to make useful predictions by applying different models to our
dataset.

Libraries used:

Scikit-learn

Scikit-learn is an open-source data analysis library, widely regarded as the gold standard for Machine Learning (ML)
in the Python ecosystem. Key concepts and features include:
• Algorithmic decision-making methods, including:
o Classification: identifying and categorizing data based on patterns.
o Regression: predicting continuous values based on relationships learned from
existing data.
o Clustering: automatic grouping of similar data points into clusters.
• Algorithms that support predictive analysis ranging from simple linear regression to neural
network pattern recognition.
• Interoperability with the NumPy, pandas, and matplotlib libraries.
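
As a hedged sketch of the regression case, the example below fits a simple linear regression on synthetic data. The relationship between capacity and generation (a slope of 3.5 plus noise) is invented purely to have something to fit; it says nothing about the real dataset or about which model we will finally choose.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: an assumed linear relationship plus random noise.
rng = np.random.default_rng(0)
capacity_mw = rng.uniform(5, 500, size=200).reshape(-1, 1)
generation_gwh = 3.5 * capacity_mw.ravel() + rng.normal(0, 20, size=200)

# Hold out 25% of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    capacity_mw, generation_gwh, test_size=0.25, random_state=0
)

# Fit the model on the training split only.
model = LinearRegression().fit(X_train, y_train)

# R^2 on the held-out split measures how well the fit generalizes.
r2 = model.score(X_test, y_test)

print("slope:", model.coef_[0])
print("R^2 on test data:", r2)
```

Classification and clustering estimators in scikit-learn follow the same fit-then-predict pattern, so swapping in another model mostly means changing the estimator class.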

Future Goal:
Using the GSPatial air quality dataset, our team wants to learn more about the impact of power plants
on pollution and to identify the pollutant that power plants release most frequently, so that
governments can take action to reduce the level of that pollutant, helping society and the environment. These
issues are of the utmost importance globally.
