Professional Documents
Culture Documents
Predicting The Wind: Data Science in Wind Resource Assessment
Predicting The Wind: Data Science in Wind Resource Assessment
Se f- d e i
F ia R chec , Da a Scie i
So rce Code:
I e ded a die ce: Da a anal s s and da a scien is s ho are in eres ed in learning abo ho da a science
echniq es are applied in he ind energ domain.
A e da
Wha i i d e ce a e me ?
H mea e he i d
P edic i g l g- e m i d eed
P edic i g i d bi e e
Wha is Wind Reso rce Assessmen ?
S nd g d, b ...
H m ch e i in he ind?
A e he ne 25 ears?
Wind resource assessment to the rescue!
P edic l g- e beha i f he i d
P edic e f i d bi e
K if ill ake a !
Wind Resource Assessment =
B i di g de f he h ica d i e ci i g
U ce ai i j af e
M de da a cie ce i ea i e e i he e d
Red ce e i i a d g ba a i g!
P edic i g he Wi d: A Da a Scie ce P b e !
Ge i g i d da a
C ea i g i d da a
A a i g i d da a
B i di g a de f he i d
P edic i g he i d a d f he i d fa
I [1]: impor a da as d
da a. d(2). ad()
(The e da a a e a a ca a d I ge e a ed he h eb .)
Le ' ge a feel f he i d da a b l i g he !
O t[3]:
... , ha l k e e !
Le ' a i g e da ee e de ai !
O [4]:
O [5]:
Wind Data Analysis Takeaways
Wi d da a = g i e e ie
Wi d da a e
The e a e d ai - eci c c ea de e i d da a (e.g. f e e c e )
F om Mea emen o P edic ion: B ilding a Model
We ant to predict ho much energ a ind farm ill likel produce in 25 ears of operation.
P ble :
Onl 2 ears of met mast ind measurements to predict 25 ears of ind
With that little data e reall don't kno enough about ho the ind ill beha e!
S l i :
Get more, longer-term data from other sources, co ering as much time as possible
Ra i ale:
The more e kno about the past, the better e can predict the future.
Sec ion O e ie
H a d he e ge el g- e da a
B ild a i le del edic i d eed
U ea e ad a ced del f ciki -lea edic i d eed
I e del i h i de e g d ai k ledge
I e iga e h c ea dc a e i d eed del
Ge ing More, Long-Term Da a
P a da a ce : G ba c ae de , ea e e b d a e
C e e Sa db (b e): 4 c ae de d ( a e), 3 a ea e e a ( e)
Le ' ad c i a e de (ERA5) a d ai ai da a!
I [6]: f om a b impo Pa
impo a da a d
fo a , in _da a. ():
_da a[ a ] = d. ad_ a (Pa ('./da a/'). a ( ))
O [6]: pd di mp
ime
1999-12-31 16:00:00 4.548827 249.010856 12.370013
1999-12-31 17:00:00 3.981960 251.738388 11.915547
1999-12-31 18:00:00 2.607753 249.791620 11.245783
1999-12-31 19:00:00 1.559933 233.880179 9.782310
1999-12-31 20:00:00 1.359054 231.334928 10.454369
Airport measurements are taken and climate models are calculated far from our site (the Sandbo ).
The have measurements in greater intervals than our met mast.
How much can we trust them to tell us about the wind characteristics at the Sandbo ?
Le ' plo me ma ind peed da a again he da a from he ERA5 clima e model!
I [7]: l _da a = d.c ca ([da a[' d_58']. e am le('1W').mea (), l _da a['e a5_0'][' d']. e am le('1W').mea ()], a i =1)
l _da a.c l m = ['Mea eme ', 'LT Refe e ce']
O [7]:
Good ne : De pi e being ph icall far a a , here eem o be grea imilari ie be een he clima e model
and he me ma da a. (Thi i no al a he ca e, b i i here o make hi orial f n and ea .)
P blem: Ho do e e ploit the similarities bet een references and met mast data to get an idea of hat
our met mast data ould ha e looked like if e had measured for 25 ears?
S l i n:
# Re a le dail
da a_1D = da a. e a e('1D'). ea ()
_da a_1D = _da a['e a5_0']. e a e('1D'). ea ()
I look like e ha e a good amo n of da a (7k+ poin ) and a e pec able of 0.89. Le ' plo o line of
be !
In [9]: l _m del. l ()
O [9]:
S l i n:
import as
def ( d c , ac a ):
return . ((( d c -ac a )**2). a ())
a _ d c =
a _ c =
Le ' c e he im le h g nal lea a e m del ing RMSE!
In [11]: edic ion = (ol _model. a am [' lo e']*l _da a_1D[' d']+ol _model. a am ['off e '])
The RMSE i 11% f he ind eed. Thi i n eall a g d n mbe . If e ld b ild he jec a ming
11% fa e ind han e ld ac all ha e, e ld ha e made a e e en i e mi ake.
S :H can e im e he m del?
Le ' lea n ab Ph ic !
Topography vs. Wind
I ag e a dg e e c e ef a a e ca e.
A g e e ah f ef gh , acce e a e a d he a d dece e a e a d he
b ( e ca e bec e d ag a ).
P fh ea e e a e ce e ec ed d eed .
Ele a ion Map of A ea A o nd he Sandbo
Le ' b d g de f d ffe e d d ec !
Ra a e: We a d ha he e e a d eed be ee a a d efe e ce e
d ec ha he .
A Be e M del: Binned O h g nal Lea S a e
O r simple model j st binned in 12 direction sectors.
Q ick remark: We se SciP for b ilding o r model here instead of Bright ind, since I ant to sho o that
o can e en anal e ind data ith more common data science tools.
In [12]: # B da a b d ec
def e _da a_ _d _b (d _b ):
_da a_ _b = _da a_1D[' d'][ _da a_1D['d _b '] == d _b ]
da a_ _b = da a_1D[' d_58'][da a_1D['d _b '] == d _b ]
ret rn _da a_ _b , da a_ _b
b _ a =
for b _ , d _b in e e a e(d _b ):
b _ a [b _ ] = ' _ a e ': None, 'be a ': None, ' e': . a , ' ed c ': None
_da a_ _b , da a_ _b = e _da a_ _d _b (d _b )
if not c c e _ :
contin e
RMSE f b ed de : 0.289
Score o far
Backgro nd: A random forest generates an estimate from multiple decision trees. Each tree only gets
trained on a subset of samples and features. This means that each tree has some "specialized" knowledge
about the data and can see a particular aspect of the data better than other trees. By combining the
predictions of all trees, a random forest can provide a relatively robust prediction.
Approach:
Let's come up ith some (simple) features that the random forest can feed on.
# Ge c c e ime e
conc_inde = o ed(li ( e ( .inde ).in e ec ion(l _da a_1D.inde )))
# R lli g mea
def make_ olling(da a, indo _ id h):
olling = da a[' d']. olling( indo _ id h).mean()
olling = olling.fillna( olling.mean())
olling.name = ' d_ olling_ '.fo ma ( indo _ id h)
return d.conca ([da a, olling], a i =1)
X = make_ olling(X, 3)
X = make_ olling(X, 5)
# Da e-ba ed fea e ca e em al a e
X.loc[conc_inde ,'d'] = X.inde .da
X.loc[conc_inde ,'m'] = X.inde .mon h
X.head(3)
The RMSE is not too high and de nitel ithin the range of the other models. Let's compare the models in
more detail.
Com a ing he 3 Wind S eed Model
I [16]: from a ib impor a
. ege d()
. h ()
All 3 models follo he arge mas nicel . The fores model some imes cap res peaks be er han o her
models (J n 18, Ma 6), b also occasionall has bigger misses (Apr 12, Ma 30).
Time for a direct score comparison! Remember: The lo er the RMSE, the better the model.
Despite all the fanciness of the random forest model, it does not reach the score of our binned model. This
being said an RMSE of 0.3 is still relativel high hen measured in terms of ind speed.
An In-De h L k a he Binned M del
W e eb eb ed de , e d d e da e ec . Le ' d a . We a
de a d b de . F , e ' d a e be f a e de ed e b .
6b a e de 25 a e .F ed c e eb , e de bab e e ab e.
I ld be be e e im le m del f he e ca e , ince e kn ha i ha been ained n a l f
da a and e f m ea nabl ell.
I [19]: be a = []
I [20]: b _ e = []
b _ ed c = []
f d _b , be a in (d _b , be a ):
_da a_ _b = _da a_1D[' d'][ _da a_1D['d _b '] == d _b ]
da a_ _b = da a_1D[' d_58'][ _da a_1D['d _b '] == d _b ]
b _ ed c = de _ c (be a, _da a_ _b )
b _ ed c .a e d(b _ ed c )
b _ e = e(b _ ed c , da a_ _b )
b _ e .a e d(b _ e)
RMSE b ed - e de : 0.403
O [21]: b ed 0.289
e 0.300
e 0.315
b ed_ _ e 0.403
d e: a 64
It looks as if lling the gaps in the binned model had a er bad impact on o r RMSE, making the ne model
the orst-performing one. What is going on here?
We sho ld look at the RMSE per bin to get a better pict re of ho hich binned model is to blame for the
increase in error.
In [22]: pd.DataFrame( 'rmse': bin_rmses, 'n_samples': [stat['n_samples'] for stat in bin_stats. al es()] ,
inde =dir_bins).plot.bar(s bplots=Tr e,figsi e=(15,5));
We are getting the highest errors in the bins here e inserted the simple model.
B t, if there is no ind in those bins (= directions), the errors in those bins are not important!
What e reall need is a bin- eighted scoring metric!
I ed Sc i g Me ic: Bi -Weigh ed RMSE
Fi , e i ca c a e he eigh e gi e each bi . Thi i i a he be f a e i
each bi .
N e ca b i d c i gf ci !
I [25]: a _ c e _b ed =
f de _ a e, ed c _ d in a _ ed c . e ():
da a_1D[' d_58'][da a_1D['d _b '] == d _b ]
a _ c e _b ed[ de _ a e] = e_b ed( ed c _ d, _da a_1D[' d'], _da a_1D['d '])
O [25]: U e g ed B -We g ed
b ed_ _ e 0.4026 0.1328
b ed 0.2888 0.1347
e 0.3152 0.1416
f e 0.2996 0.1807
O bi ed de ha i i g he i e de ' i f ai i ga c e be
O i e de e f e ha he bi ed de
O f e de e f
S , af e a , e d e b b ed de ed c g- e d eed a a !
P edic ing he Long-Te m Wind S eed
No e can predict the long-term ind speed at our site. This helps us to get a good idea of ho much ind
energ e can potentiall harvest if e build a ind turbine there.
Remember: We assume the ind in the future ill behave like the ind in the past. To get to the long-term
ind speed, e take all predictions from our bin_fill_ im le model. To ma imi e accurac , e
substitute it ith actual measurements from our mast herever possible.
R a 761 - a a a a (10.3%).
S a : 1999-12-31, L : 20 a , A . 2.86 /
A fe o d of ca ion abo he long- e m ind eed ime e ie
W d eed a a a e he da . The ef e, d b e d ce e e g
h a f ac f ha 24 h e d.
A a e , e ge ea ce e g d c be h h e e e .
Ac a d e ce a e e ch ec ca ed ha h he e.
F m Wind S eed Mea emen P edic i n: Takea a
I i ea b i da i e i d eed edic i de
M e ad a ced de e f a a be e
D ai k edge ca he a ih di g he igh de
...a d i h a e i g de e f a ce!
From Mas Heigh o T rbine Heigh
We have: Long-term wind speed time series at 58 m height (height of the wind speed sensor on the
mast)
We want: Long-term wind speed time series at height of turbine
We need:
Information about turbine height
Information about how wind speed behaves with height
Shea e onen :
Le ' nd he hear e ponen b ing brigh ind' S ea .A e age() me hod.
l _ eed_a _ bine l _ eed_a _ma *(h_ bine/h_ma )**a e age_ hea .al ha
in ('On a e age, he ind i :.1% fa e a he bine heigh han a he mea ed heigh .' \
.fo ma (l _ eed_a _ bine.mean()/l _ eed_a _ma .mean()-1))
Obser ations: T rbine onl starts to prod ce po er at abo t 2 m/s ind speed, po er o tp t is stead
bet een ca. 12 and 25 m/s.
Po er C r e s. Predicted Wind Speed
No that e kno ho much po er the V112 turbine produces b ind speed, let's see ho our predicted
long-term ind speed ts into the picture.
Obser ations: Long-term ind speed is er small in comparison to hat the turbine can handle, the V112
turbine is completel o ersi ed for this site!
Calculating Po er Output
De i e he V112 being e i ed, le ' la a nd i h he ene g d c i n n mbe e ld ge if
e e e b ild hi bine. We an ge a "feel" f he e and i in he c n e f he
c mm ni a nd he Sandb , i e.
I [78]: _ e _ . e ( _ eed_a _ b e, e _c e. de , e _c e. a e , ef 0, 0)
_ e _ d.Se e ( _ e _ , de _ eed_a _ b e. de )
e_ e e _d a _ ea _ e _ . a e[0]/(365.25)
_ e _ ea _ e _ . ()/ e_ e e _d a _ ea
Mea b e e ea W : 24,925
I [79]: (' T a ab e a e
da f 1 ea : :.0f '.f a ( _ e _ ea /(3.5/60*1.2)/365.25))
(' F Te a M de S c a e e ea : :.0f '.f a ( _ e _ ea /100))
T a ab e a e da f 1 ea : 975
F Te a M de S c a e e ea : 249
We , a e a d eadf ce a . B , ad , d ea e e a ea c a .T ea : We
d ' ea a b e d a e e Sa D e e d .
C ea , e ea c a da a, b d a d b ec e e Sa db d e a e e e.
N Ca ac Fac (NCF)
E e mea eh ell a bine a ind eed di ib i n and elec ici g id en i nmen in e m
f ne ca aci fac (NCF).
Blue: Demand / Orange: Demand minus solar and ind (Sell energ hen this alue is high at slightl
cheaper prices than fossil fuel po er plants to make good pro t.)
Let's plot our diurnal pro le to see if we would produce a good amount of electricty during these pro table
hours.
Unfortunately, it looks as if we produce power right when a lot of solar power is in the grid, pushing
electricity prices down. Not every wind project is like this – sometimes wind speeds are high just as energy
demand peaks.
...b ha o ld be a good po for a ind rbine?
Al h gh Sa Dieg bei g e f he , he e a e eg d f i d bi e i Calif ia.
Refe e ce
( le ed ab e)
Analy ing Wind Data
M : .
ASOS : .
F : :// . . / ? =-GIT N - 4M
E : :// . . / / .
Power Curve
Toas er po er cons mp ion: energ secalc la or.com, ass med 1200 W for 3.5 min es
Tesla Model S 100 kWh ba er : Wikipedia
H "Va ab " W P b ?
...b a b a a b ?