Project Proposal Advance Python - Edited
AML 2203
Team Members:
Submitted to:
Professor Vahid Hadavi
Power Plant Analysis
Introduction
Environmental, chemical, and economic studies have established a strong evidence base for the association
between energy production and air pollution. Power generation, particularly from burning coal, can have adverse
effects on air quality. A power plant affects the environment through both its construction and its operation,
and these impacts can be either temporary or permanent. A power plant and its auxiliary components
(e.g., natural gas pipelines, water intakes and discharge, coal delivery and storage systems, new transmission
lines, and waste disposal sites) take up space on the ground and in the air, use water resources, and, in many
cases, emit pollutants into the air. The plant's footprint on the ground eliminates opportunities for others to
purchase or use the land, and it can also affect the existing or future uses of adjoining and nearby land parcels.
Analyzing power plant datasets can help policymakers evaluate assumptions about future demand and the type of
generation required to meet it, and identify which kinds of power plants are most prevalent in countries where
pollution is steadily increasing.
Data Sets
The dataset contains 34,936 observations with 8 attributes. The attributes describe each power plant,
including its location, capacity, generation data, and owner.
Column Names:
Estimated Generation_gwh_2020
Project Flow
First week:
To begin, we will explore power plants and the available datasets this week. We will review some research
papers before working with our dataset. We will use Jupyter Notebook for the analysis, starting by importing
the data into a DataFrame and cleaning it. We will also apply different methods for handling missing
values.
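The missing-value handling described above could look roughly like the following sketch. The rows and the column names (`name`, `primary_fuel`, `capacity_mw`) are illustrative assumptions, not the actual schema of our dataset, and the two strategies shown (dropping rows, filling with the median) are only examples of the methods we may try.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumptions for illustration only.
plants = pd.DataFrame(
    {
        "name": ["Plant A", "Plant B", "Plant C", "Plant D"],
        "primary_fuel": ["Coal", "Solar", None, "Coal"],
        "capacity_mw": [500.0, np.nan, 120.0, 300.0],
    }
)

# Count the missing values in each column.
missing_counts = plants.isna().sum()

# Option 1: drop rows where a key categorical field is missing.
with_fuel = plants.dropna(subset=["primary_fuel"])

# Option 2: fill a missing numeric value with the column median.
median_capacity = plants["capacity_mw"].median()
filled = plants.assign(capacity_mw=plants["capacity_mw"].fillna(median_capacity))

print(missing_counts)
print(filled)
```

Which option is appropriate depends on the column: dropping rows loses observations, while filling with a median keeps them but can hide real gaps in the data.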
Libraries used:
Pandas
Pandas is an open-source library built mainly for working with relational or labeled data easily
and intuitively. It provides data structures and operations for manipulating numerical data and time
series. The library is built on top of NumPy and offers high performance and
productivity for users.
Pandas provides two main data structures for manipulating data:
Series
DataFrame
NumPy
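As a small sketch of the two pandas data structures named above and how they relate to NumPy, consider the following. The plant names and values are made up for illustration and do not come from our dataset.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array.
capacity = pd.Series(
    [500.0, 50.0, 120.0],
    index=["Plant A", "Plant B", "Plant C"],
    name="capacity_mw",
)

# A DataFrame is a two-dimensional table whose columns are Series
# sharing the same index.
plants = pd.DataFrame(
    {"capacity_mw": capacity, "primary_fuel": ["Coal", "Solar", "Hydro"]}
)

# Selecting a single column returns a Series.
fuel = plants["primary_fuel"]

# to_numpy() exposes a column as a NumPy array, so NumPy functions apply.
capacity_array = plants["capacity_mw"].to_numpy()
total_capacity = np.sum(capacity_array)

print(plants)
print(total_capacity)
```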
Second week:
By this week, we will have a clean dataset ready for analysis, from which we can extract
valuable insights. We will present our findings using different visualizations.
Libraries used:
Matplotlib
Matplotlib is a plotting library for creating static, animated, and interactive visualizations in Python. It can be
used in Python scripts, the Python and IPython shells, web application servers, and various graphical user
interface toolkits such as Tkinter and wxPython.
Seaborn
Seaborn is a visualization library for statistical graphics in Python. It provides
default styles and color palettes that make statistical plots more attractive. It is built on top
of Matplotlib and is closely integrated with pandas data structures.
Seaborn aims to make visualization a central part of exploring and understanding data. It provides
dataset-oriented APIs, so we can switch between different visual representations of the same variables
to better understand the dataset.
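One kind of visualization we might produce is a bar chart of total capacity per fuel type. The sketch below uses Matplotlib only, with made-up rows and assumed column names (`primary_fuel`, `capacity_mw`); a Seaborn version would start from the same DataFrame.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; column names are assumptions for illustration only.
plants = pd.DataFrame(
    {
        "primary_fuel": ["Coal", "Solar", "Coal", "Hydro", "Solar"],
        "capacity_mw": [500.0, 50.0, 300.0, 120.0, 30.0],
    }
)

# Aggregate capacity per fuel type, largest first.
capacity_by_fuel = (
    plants.groupby("primary_fuel")["capacity_mw"].sum().sort_values(ascending=False)
)

# Draw one bar per fuel type.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(capacity_by_fuel.index.tolist(), capacity_by_fuel.tolist())
ax.set_xlabel("Primary fuel")
ax.set_ylabel("Total capacity (MW)")
ax.set_title("Installed capacity by fuel type (illustrative data)")
fig.tight_layout()
```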
Third week:
Our goal this week is to make useful predictions by applying different models to our
dataset.
Libraries used:
Scikit-learn
Scikit-learn is an open-source data analysis library, widely regarded as the gold standard for machine learning (ML) in the
Python ecosystem. Key concepts and features include:
Algorithmic decision-making methods, including:
o Classification: identifying and categorizing data based on patterns.
o Regression: predicting continuous values from the relationships between input variables in
existing data.
o Clustering: automatic grouping of similar data points into clusters.
Algorithms that support predictive analysis, ranging from simple linear regression to neural
network pattern recognition.
Interoperability with the NumPy, pandas, and Matplotlib libraries.
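To illustrate the regression workflow listed above, here is a minimal sketch using scikit-learn's `LinearRegression`. The data is synthetic: we assume, purely for illustration, that generation grows roughly linearly with capacity. The actual model and features for the project have not been chosen yet.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption, not our dataset): generation is roughly
# 4 GWh per MW of capacity, plus random noise.
rng = np.random.default_rng(0)
capacity_mw = rng.uniform(10, 1000, size=200)
generation_gwh = 4.0 * capacity_mw + rng.normal(0, 50, size=200)

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = capacity_mw.reshape(-1, 1)
y = generation_gwh

# Hold out 25% of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows, and a prediction for a hypothetical 500 MW plant.
r2 = model.score(X_test, y_test)
predicted = model.predict(np.array([[500.0]]))

print(f"slope={model.coef_[0]:.2f}, R^2={r2:.3f}")
```

Holding out a test split keeps the evaluation honest: a score computed on the training rows alone would overstate how well the model generalizes.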
Future Goal:
Using the GSPatial air quality dataset, our team wants to learn more about the impact of power plants
on pollution and to identify the pollutant most frequently released by power plants, so that
governments can take action to reduce the level of that pollutant, helping society and the environment. These
issues are of the utmost importance globally.
References: