
Solar Photovoltaic Analysis

Case for Renewable Energy in the USA


Hemendra Pal - “ALTREICH THINKTANK” NCLE
87% of the Machine Learning Projects Fail
(High Risk Endeavor, Source - 2019 Study)

Solar Energy Factsheet

• Key Renewable Energy Source.
• Universal Accessibility and Free to use.
• Intermittent nature and potential in Distributed Systems.
• Solar Technology converts sunlight into electrical energy through PV Panels or Mirrors that concentrate Solar Rays.
• Potential for distributed Solar PV systems in residential / commercial / government / military installations / space technology etc.
• Solar Energy to hit Global Tera Watt Target in 2022.

AI - M/L Factsheet

• Only 6% of industry CEOs have developed trust in AI/ML Operations.
• Forecasting modeling remains an important area of research in Solar AI / ML.
• Solar PV Power Forecasting based on power irradiance using weather data is important.
• Importance of using AI/ML for predicting photovoltaic power without irradiation data for remote grid integration and control.
• Potential to improve AI/ML forecasting for rooftop / horizontal solar installations.
Strategy Frameworks for the Solar Industry in the USA
SWOT Analysis

• Strengths
I. Driven by strong fundamentals and supportive policy towards the Solar Industry. 85% decline in PV cost in 10 years.
II. Potential to use battery storage, easily accessible and free to use, and more success with better USA infrastructure.
• Weaknesses
I. Difficulties in integration in a distributed energy environment due to intermittent nature.
II. Batteries and Solar Panels generate waste.
• Opportunities
I. Cities, states and utilities continue to move towards solar energy. AI/ML is in a nascent phase in Solar Applications.
II. 48 out of 55 large US utilities have committed to the 2050 carbon emissions reduction goal.
III. Accelerated support for floating PV and community solar.
• Threats
I. Supply chain constraints, increased shipping costs, rising prices of key commodities.
II. Over-reliance on a limited number of suppliers and manufacturers; China still leads the Solar Industry.

Porter’s Diamond

• Firm Strategy, Structure and Rivalry:
  • The USA is a highly entrepreneurial country with more than 14000 solar companies; the solar industry is flexing competition with new business models and new configurations.
  • The Solar Sector is going through mergers and acquisitions.
  • Pairing Solar with Energy Storage offers cost synergies, operational efficiencies, and solar tax credits on storage capital costs.
• Demand Conditions:
  • After an 85% Solar PV cost decline in a decade, the Solar Industry is one of the most competitive renewable energy sectors in the USA.
  • Renewable energy, including solar energy, is considered next generation.
• Related and Supporting Industries:
  • Most large energy and utility companies have committed, or are in the process of committing, to the 2050 carbon emissions goal.
  • If all the Solar projects co-located with large scale battery storage become operational by 2025, that would increase the US share from 24% to 50%.
• Factor Conditions:
  • 22 of the 50 states and Washington D.C. have agreed to provide Community Solar Power. 50% of the US population can’t install Solar, which creates a big opportunity for Floating Solar PV (FSPV).
  • There is growing employment and demand for skilled labor in the sector.
Solar Energy Key Trends/ Price Advantage
Solar Energy Market Leaders / Net Renewable Energy Additions
Data Loading, Data Check and Data Description
(21045 Instances, 17 Columns, Solar PV Dataset Downloaded from Kaggle)

• Descriptive Statistics for Variables
• Count of Missing Values = 0 / 2 Categorical Variables
• Hour_Min = 10, Hour_Max = 15
• Month_Min = 1, Month_Max = 12
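
The checks listed on this slide can be sketched with pandas. This is a minimal illustration on a few made-up rows, not the author's notebook: the helper name `audit` and the sample values are hypothetical, and the column names (Location, Season, Month, Hour, PolyPwr) are taken from the variable names used elsewhere in the deck.

```python
import pandas as pd

def audit(df: pd.DataFrame) -> dict:
    """Summarise shape, missing values, categorical columns and Hour/Month ranges."""
    # Treat every non-numeric column as categorical (an assumption of this sketch).
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return {
        "shape": df.shape,
        "missing": int(df.isna().sum().sum()),
        "categorical": categorical,
        "hour_range": (int(df["Hour"].min()), int(df["Hour"].max())),
        "month_range": (int(df["Month"].min()), int(df["Month"].max())),
    }

# Hypothetical sample rows; the real dataset has 21045 rows and 17 columns.
sample = pd.DataFrame({
    "Location": ["SiteA", "SiteB", "SiteA", "SiteC"],
    "Season": ["Winter", "Summer", "Fall", "Winter"],
    "Month": [1, 7, 10, 12],
    "Hour": [10, 12, 15, 11],
    "PolyPwr": [3.2, 14.8, 9.1, 4.4],
})

report = audit(sample)
print(report)
print(sample.describe())  # descriptive statistics for the numeric variables
```

On the full dataset the same call would be expected to report zero missing values and two categorical columns, matching the figures above.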
Distribution plots of Input and Output Variables

[Figure: histograms of Date, Time, Latitude, Longitude, Altitude, Month, Hour, Humidity, AmbientTemp, PolyPwr, Wind.Speed, Visibility, Pressure, Cloud.Ceiling]
Top Features Correlation HeatMap
Cyclicity of hour and month / Cyclicity of Sine and Cosine Functions

• Hour, Month, Date


• Hour and Month are encoded with Sine and Cosine functions. The cyclic nature of Sine and Cosine is shown.
• The two categorical variables, Season and Location, are one-hot encoded.
• After encoding and plotting, both hour and month are cyclic.
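
A minimal sketch of this encoding step, assuming a period of 24 for Hour and 12 for Month (the deck does not state the periods used). The function name `encode_features` and the sample rows are hypothetical; the output column names follow the feature names that appear in the importance tables.

```python
import numpy as np
import pandas as pd

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add sine/cosine columns for Hour and Month and one-hot encode the categoricals."""
    out = df.copy()
    # Assumed periods: 24 hours per day and 12 months per year.
    hour_angle = 2 * np.pi * out["Hour"] / 24
    month_angle = 2 * np.pi * out["Month"] / 12
    out["Hour_Sine"] = np.sin(hour_angle)
    out["Hour_Cos"] = np.cos(hour_angle)
    out["Sine_Month"] = np.sin(month_angle)
    out["Cos_Month"] = np.cos(month_angle)
    # One-hot encode the two categorical variables (one indicator column per category).
    return pd.get_dummies(out, columns=["Season", "Location"])

# Hypothetical rows: December, January and February at two made-up sites.
sample = pd.DataFrame({
    "Hour": [10, 12, 15],
    "Month": [12, 1, 2],
    "Season": ["Winter", "Winter", "Winter"],
    "Location": ["SiteA", "SiteB", "SiteA"],
})

encoded = encode_features(sample)
print(encoded.filter(like="Month"))
```

The point of the sine/cosine pair is that December and January end up as close to each other as January and February, which a raw month number (12 versus 1) does not express.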
Encoded Feature Variables / Heat Map with Cyclic Encoded Variables
Top 10 Features (Decision Tree Regression)

• Ambient Temp, Humidity, Cos_Month, Pressure, Cloud.Ceiling

Feature_Variable    Contr.    Score
Ambient Temp.       39.92     100.00
Humidity            10.61     26.58
Cos_Month           9.09      22.77
Pressure            9.02      22.60
Cloud.Ceiling       8.5       21.29
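
The Score column appears to be the contribution rescaled so that the top feature equals 100. That relationship is inferred from the numbers, not stated in the deck, so the following is only a worked check using the Decision Tree values above.

```python
# Contribution values copied from the Decision Tree table above.
contributions = {
    "Ambient Temp.": 39.92,
    "Humidity": 10.61,
    "Cos_Month": 9.09,
    "Pressure": 9.02,
    "Cloud.Ceiling": 8.5,
}

# Assumed rule: score = 100 * contribution / largest contribution.
top = max(contributions.values())
scores = {name: round(100 * value / top, 2) for name, value in contributions.items()}

for name, score in scores.items():
    print(f"{name:<15} {score:>7.2f}")
```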
Top 10 Features (AdaBoost Regression)

• Ambient Temp, Cos_Month, Hour_Cos, Humidity, Cloud.Ceiling

Feature_Variable    Contr.    Score
Ambient Temp.       37.19     100.00
Cos_Month           14.24     38.29
Hour_Cos            7.30      19.63
Humidity            6.58      17.69
Cloud.Ceiling       5.92      15.92
Pressure            5.51      14.82
Location_IDMT       5.50      14.79
Location_Hill       4.92      13.23
Latitude            4.36      11.72
Altitude            3.09      8.31
Top 10 Features (Random Forest Regression)

• Ambient Temp, Humidity, Pressure, Cos_Month, Cloud.Ceiling

Feature_Variable    Contr.    Score
Ambient Temp.       40.31     100.00
Humidity            10.33     25.63
Pressure            8.86      21.98
Cos_Month           8.55      21.95
Cloud.Ceiling       8.00      19.85
Wind.Speed
Hour_Cos
Sine_Month
Hour_Sine
Latitude
M/L Algorithms Performance - RMSE, R Square and MAE

• Decision Tree Regression
• M/L Algorithms Performance

Model                      RMSE     R Squared    MAE
Decision Tree              5.715    0.3535       3.6498
Linear Regression          4.654    0.5712       3.554
Random Forest Regressor    4.141    0.6599       2.748
AdaBoost Regressor         5.390    0.4249       4.605
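
For reference, the three metrics can be computed as follows. This is a plain-Python sketch on made-up numbers; the function name `regression_metrics` is hypothetical and the values are not taken from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }

# Made-up observed and predicted power values.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

Lower RMSE and MAE and higher R squared indicate a better fit, which is why Random Forest ranks first in the table above.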
M/L Algorithms Performance - RMSE, R Square and MAE

• AdaBoost Regression / Random Forest Regression

Metric       AdaBoost Regressor    Random Forest Regressor
RMSE         5.390                 4.141
R Squared    0.4249                0.6599
MAE          4.605                 2.748
CRISP-DM Process Methodology

• Data Collection on identified data points: the Solar PV data was downloaded from Kaggle.
• Data Audit and Data Cleaning for missing and duplicate values: there were no missing values in the Solar PV dataset. Location and Season were treated as categorical variables, while Hour and Month were encoded with sine and cosine functions to capture cyclicity.
• Exploratory Analysis: descriptive statistics, heat map and correlation studies on the independent feature variables, and hypothesis tests on the input and output variables.
• Feature Selection: Ambient Temperature, Humidity, Pressure, Cos_Month and Cloud.Ceiling are the most important feature variables. The R_Square value is 0.6599 for Random Forest and 0.5712 for Linear Regression.
• Training and Test Data: the data is divided into 80% training data and 20% test data.
• Modeling: Linear Regression, AdaBoost, Decision Trees and Random Forest Regression were used, and Random Forest was chosen as the best model for this M/L exercise.
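
The modeling steps above can be sketched with scikit-learn. The data below is synthetic and the hyperparameters are library defaults, both assumptions of this sketch, so the printed numbers will not match the results reported in the deck.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded feature matrix and the power output.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=500)

# 80% training data, 20% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
        "MAE": float(mean_absolute_error(y_test, pred)),
    }

# Pick the model with the highest R squared on the test set.
best = max(results, key=lambda name: results[name]["R2"])
print(best, results[best])

# Tree ensembles expose impurity-based importances; scaled to percent here.
importance_pct = 100 * models["Random Forest"].feature_importances_
print(importance_pct.round(2))
```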
System Architecture and M/L Validation Methods

• Holdout Set Validation Method.


• Cross Validation Set for Models.
• Random Subsampling Validation Model.
• Bootstrapping ML Validation Methods, Teach and Test Method, Running AI Model Simulations.
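
The first four validation methods differ mainly in how row indices are split into training and test sets. The sketch below shows one simple way to generate those splits with the standard library; the function names and default parameters are hypothetical, not taken from the study.

```python
import random

def holdout(n, test_fraction=0.2, seed=0):
    """Single shuffled split into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]

def k_fold(n, k=5, seed=0):
    """Yield k (train, test) splits; every row is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def random_subsampling(n, repeats=3, test_fraction=0.2, seed=0):
    """Repeat the holdout split with a different shuffle each time."""
    for r in range(repeats):
        yield holdout(n, test_fraction, seed + r)

def bootstrap(n, seed=0):
    """Sample n rows with replacement; rows never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test

train, test = holdout(10)
print(len(train), len(test))
for fold_train, fold_test in k_fold(10, k=5):
    print(sorted(fold_test))
```

A model would be fitted on each training index list and scored on the matching test list, with the scores averaged across splits for the repeated methods.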
Use Case for M/L Solar PV System

• Output Prediction for Solar Energy Integration in Community Solar
- Edge Connectivity, Low Latency, High Throughput
- Users can be sent a message when to switch to solar
- Cloud based
- Offline Prediction based on daily data, 24 hours ahead
- Offline Learning
- Model Retraining
- Batch Predictions daily
Recommendations and Limitations - Validation of the Study

• The feature selection for the top 5 features matches 2 studies conducted in the past; both references have been provided.
• The R_Squared value for the Random Forest is in alignment with a study listed in the references.
• The data used is not based on Irradiance. Irradiance is an important parameter for estimating solar power output.
• Many of the Python tree-based algorithms can require more computational capacity than a personal laptop provides.
• This analysis used data from 12 geographical locations around the USA in a controlled environment. Irradiance data can be very time-consuming to collect and can carry significant uncertainty.
• The study did not test any Deep Learning techniques or Neural Networks; that can be done in the next exercise on this data set.
Recommended Datasets for Future M/L Studies in Solar / Energy Sector

1 An open-source tool to simulate the performance of photovoltaic (solar) energy systems.
https://github.com/pvlib/pvlib-python
2 Battery optimization modelling is a huge challenge as energy storage resources proliferate to
maximize renewable energy generation. Open Source https://github.com/RyanCMann/OSESMO
3 Provides access to wind data sources for generation forecasting and other applications
https://github.com/cigroup-ol/windml
4 PYISO - Open Source Library by a Non Profit Organization WattTime, which collects real time
ISO data and enables energy consumers to use energy in times when percent of renewable energy is
the highest on the grid. https://github.com/WattTime/pyiso
5 Catalyst Cooperative - Public Utility Data Liberation Project.
https://github.com/catalyst-cooperative
6 The Power Genome Project https://github.com/gschivley/PowerGenome
References
• https://towardsdatascience.com/predicting-solar-power-output-using-machine-learning-techniques-56e7959acb1f

• Pasion, C.; Wagner, T.; Koschnick, C.; Schuldt, S.; Williams, J. & Hallinan, K.
Machine Learning Modeling of Horizontal Photovoltaics Using Weather and
Location Data. Energies 2020, 13, 2570; doi:10.3390/en13102570.

• Scikit-learn User Guide (Python library). https://scikit-learn.org/stable/user_guide.html
