Publication#3

Coastal Engineering Journal
ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tcej20
An artificial neural network based system for wave height prediction
Elad Dakar, José Manuel Fernández Jaramillo, Isaac Gertman, Roberto Mayerle & Ron Goldman
To cite this article: Elad Dakar, José Manuel Fernández Jaramillo, Isaac Gertman, Roberto
Mayerle & Ron Goldman (2023): An artificial neural network based system for wave height
prediction, Coastal Engineering Journal, DOI: 10.1080/21664250.2023.2190002
To link to this article: https://doi.org/10.1080/21664250.2023.2190002
Published online: 21 Mar 2023.
Elad Dakar^a, José Manuel Fernández Jaramillo^b, Isaac Gertman^a, Roberto Mayerle^b and Ron Goldman^a
^a Israel Oceanographic & Limnological Research, National Institute of Oceanography, Haifa, Israel; ^b Forschungs- und Technologiezentrum Westküste (FTZ), Christian-Albrechts-Universität zu Kiel, Kiel, Germany
ABSTRACT

We present a system for predicting the hourly significant wave height at a specific wave measurement station in the middle of Israel’s Mediterranean coast (Hadera). Our system uses an artificial neural network (ANN) composed of two sub-networks. We evaluate the importance of different inputs to the system. The input includes wind forecast data from the SKIRON atmospheric modeling system, wave forecast for the station’s location given by the SWAN wave model, and observed wave data. Our system pre-processes the wind data using a spatial filtering scheme and then enters it into the first sub-network in the form of a multidimensional tensor. We take special care to interconnect the tensor elements through a dimensional permutation that leads the ANN to sum elements along all the tensor’s dimensions. Our system groups the output of the first sub-network with the rest of the input and feeds it to the second sub-network that gives the prediction. Our ANN system outperforms the SWAN wave model in estimating wave heights over 1.5 meters. We obtain the best performance when either all input components are used or just wind and observations. Reimplementation of the system at Ashkelon yields smaller improvements due to insufficient training data.

ARTICLE HISTORY
Received 20 July 2022
Accepted 8 March 2023

KEYWORDS
Wave forecast; Artificial neural network; Artificial intelligence
CONTACT Elad Dakar edakar2k@gmail.com
This article has been republished with minor changes. These changes do not impact the academic content of the article.
© 2023 Japan Society of Civil Engineers

1. Introduction

Improving the accuracy of wave characteristic predictions is beneficial for the safety of maritime traffic, nearshore constructions, and offshore platforms. Correctly forecasting high wave conditions is particularly important as they pose a risk to small boats (e.g. patrol boats, fishing boats) and may hinder air-sea rescue operations. The current standard method to make such predictions is to use numerical models. A numerical model uses the initial state of the sea and the wind forcing as input data. Current numerical models explicitly represent the physics relevant to the development of the sea state, including nonlinear wave interactions and two-way coupling between wind and waves.

In recent years, ANNs have proven to be a valuable tool for wave prediction. An ANN is a computing system based on a collection of interconnected computational units called artificial neurons. By optimizing the neurons’ parameters, it is possible to make the network of neurons emulate complex relations between input and output. Most ANN wave-prediction systems receive input of time series from observational systems. For example, López, López, and Iglesias (2015), Castro et al. (2014), Tsai, Lin, and Shen (2002), Fan, Xiao, and Dong (2020), and Miky et al. (2021) use observations from buoy stations to predict wave parameters. Of the five articles, Fan et al. are unique in using a long short-term memory (LSTM) neural network. They also use input from a single grid point of SWAN (see Section 2.1) in a complementary calculation. Sadeghifar et al. (2022) also use observations from buoy stations but implicitly include the effects of wind in processing wave data during both the training and testing stages. Their article presents various soft computing methods in addition to ANN. Tsuda, Kim, and Matsumi (2016) used atmospheric observations in addition to observations from buoy stations, and Kim, Pan, and Mase (2019) used observations and derived parameters of typhoon systems. Gómez-Orellana et al. (2022) used similar types of input data but differentiated their work by using weighted means of the four reanalysis nodes located around each buoy station. Using these means enables them to gain atmospheric data that represents the conditions at each buoy. The training algorithm their article presents is the Multi-Task Evolutionary Artificial Neural Network (MTEANN), which is substantially different from the backpropagation and gradient descent algorithm used in our work. Yang et al. (2021) also use similar types of input data but employ the time series decomposition method of STL and the input data pre-processing method of positional encoding. Because their input data have the form of a 2D tensor, their system uses a convolutional neural network (CNN). Similarly, Guan (2020) uses a CNN together with an LSTM neural network. Makarynskyy et al. (2005) have suggested using a wave model as input along with observed data; however, the primary approach is to use ANN as a replacement for the high computational costs of the numerical spectral wave model. Shamshirband et al. (2020) have extracted meteorological time series from a reanalysis of meteorological data as input to their system.

In contrast to these works, the system developed by Tom et al. (2019) uses gridded meteorological time series as inputs instead of a single or several stations’ time series. This meteorological grid covers the Sea of Japan. To process this data type, the paper uses a CNN, which takes advantage of the spatial relation to reduce the number of parameters to be determined.

In this paper, we present an implementation of an artificial neural network (ANN) that aims to improve the wave height forecasts given by a numerical model. Our system takes input from observed wave conditions at an offshore station, the numerical forecast of a spectral wave model at a single grid cell near that station, and the gridded forecast data from the atmospheric model in a region that covers the station. Our approach is rather innovative in combining different inputs from different sources and data structures. This is done by spatially pre-processing the multivariate wind atmospheric data and inserting the result into a point-wise convolutional ANN. The output of this ANN enters into a fully connected ANN along with the rest of the input. Our motivation is to take advantage of each input’s strengths to compensate for the others’ shortcomings. The atmospheric forecast input we use provides information on the local synoptic system but only covers a part of the fetch that influences the output wave height. The numerical wave model does simulate the whole wave response to storm events, but it is not anchored in actual observations and tends to overestimate the wave height. The observational data indicate the current state of the waves but do not carry information on the future since the station is located downwind in most synoptic conditions.

2. Material and methods

2.1. System and datasets

The SWAN (Simulating WAves Nearshore) numerical wave model generates the numerical wave forecasts that are used in this paper. SWAN is a third-generation wave model, developed at Delft University of Technology, that computes the wind-generated wave field in coastal regions and inland waters (Holthuijsen 2007). In 2009, Israel Oceanographic and Limnological Research (IOLR) set up a wave forecasting system. This system uses a 10 m wind forecast from the SKIRON forecasting system (Kallos et al. 1997). The SWAN wave model runs on the same grids that an older system used (Gertman et al. 2000): a coarse grid covers the entire Mediterranean at a spatial resolution of 0.5 degrees, and a finer-resolution grid nested within it covers the eastern Mediterranean east of longitude 22°E at a spatial resolution of 0.125 degrees. The directional spectral grid in both regions is the same: 25 logarithmically distributed frequencies between 0.063 Hz and 1 Hz and 24 directions. The main issue within the system has been the underestimation of the wind forcing. At the time of its setup, Gertman et al. (2005) performed a simple calibration procedure and found that the system works best with a factor of 1.29 applied to the 10 m wind field. Nevertheless, using a single coefficient to correct the wind resulted in gross overestimations in stormy events.

As in some of the works above, we try to estimate wave characteristics in one specific location nearshore by using observed wave data as input data. Our system uses the wave height at the IOLR Hadera Meteomarine Monitoring Station (Global Sea Level Observing System Station #80), whose location is near the center of the Israeli Mediterranean coastline, 2 km off the shore, at the end of a coal terminal. The station contains an RDI workhorse ADCP instrument that provides hourly wave statistics where the water depth is about 26 m (https://isramar.ocean.org.il/isramar2009/Hadera/). The wave statistics at this station provide a good representation of the wave field along the rest of the Israeli coastline. In Section 2.2, we re-implement the system at a second monitoring station in Ashkelon. Since the station at Ashkelon has been active for a shorter duration than the one at Hadera, we only use the Ashkelon station data to examine the robustness of the system structure.

The Israeli coastline lies in the southeastern corner of the Mediterranean. Cyclones passing over the Mediterranean Sea during the winter are the leading cause of high waves. The wave heights during the storm depend on the orientation and strength of the cyclone as it makes landfall. In particular, the strong southerlies to northwesterlies are part of the cold front of the cyclone. Moreover, cyclones passing over the eastern Mediterranean generate swell waves that add to the wave height. The trajectories of cyclones can vary significantly (Karas and Zangvil 1999), which leads to variations in the intensity of the storms. Our observations show that storms with significant wave heights greater than 5 m occur yearly on average. Other synoptic systems (see Alpert et al. 2004 for their characterization), such as the Red Sea Trough (RST) and locally formed thermal cyclones, may also produce significant wave heights above 1.5 m due to locally intensified winds.

In this work, we use data from the winter seasons (November to February), in which high wave conditions are common. This is motivated by our desire to provide predictions at times when the wave model error is most prominent and the importance of the prediction to decision makers is highest. Moreover, the synoptic variability in the winter is higher than in the summer, so the training set contains a more extensive variety of synoptic conditions.
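The spectral grid quoted for the SWAN setup in this section (25 logarithmically distributed frequencies between 0.063 Hz and 1 Hz, and 24 directions) can be made concrete with a few lines of Python. This is only an illustrative sketch: it assumes that both end frequencies are included and that the direction bins are uniformly spaced starting at 0°, neither of which is stated in the text. The function names are hypothetical.

```python
def log_spaced_frequencies(f_min=0.063, f_max=1.0, n=25):
    """Return n frequencies (Hz) with a constant ratio between neighbours.

    Assumes both f_min and f_max belong to the grid.
    """
    ratio = (f_max / f_min) ** (1.0 / (n - 1))
    return [f_min * ratio ** k for k in range(n)]


def direction_bins(n=24):
    """Return n direction-bin centres in degrees (uniform spacing from 0 is an assumption)."""
    step = 360.0 / n
    return [k * step for k in range(n)]


freqs = log_spaced_frequencies()
dirs = direction_bins()

print(len(freqs), freqs[0], freqs[-1])   # number of frequencies and the two end values
print(freqs[1] / freqs[0])               # constant ratio between neighbouring frequencies
print(len(dirs), dirs[1] - dirs[0])      # number of directions and their spacing in degrees
```

Under these assumptions each frequency is roughly 12% higher than the previous one, and the directional resolution is 15°.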
ANN is a computational model inspired by biological neural networks (Hassoun 1995). The model comprises a network of mathematical operators, with each operator functioning as a single neuron. The network arrangement is a chain of neuron layers in which the input for a single layer comes from the output of the previous layer in the chain. The first and last layers of the chain are the input and output layers, respectively. The structure of the layers is usually predetermined, and a process of optimization known as training determines the parametrization of individual neurons. The training data set contains batches of samples of both the input data and the desired outcome. During the training of a neural network, a single predefined function computes the cost for each batch of samples. The cost function represents the difference between the processed and the desired output of the network. The training process searches for the optimal parametrization of the neurons that will minimize the cost. In this way, the trained system represents a statistical best fit to the training dataset.

Among the different architectures we have tested, the system we denote as ANN0 obtains the best results in predicting the significant wave heights at Hadera station. In this section, we will describe the design of this system, which is the basis for the prediction process. Section 2.2 will discuss alternative system configurations to show the effect of changing the amount and type of inputs applied to the prediction system.

We created the ANN using the TensorFlow free and open-source software library in the Python programming language (Abadi et al. 2015). We trained the ANN at the system’s core to minimize the errors of its output in comparison to the observed data of Hadera station at the predicted times.

In deciding what type of ANN to use in our system, we opted for a feedforward neural network. We decided not to use an LSTM neural network because the usage of this type of neural network is mostly for dealing with the vanishing gradient problem (Hochreiter 1998). This problem exists for deep neural networks, i.e., networks with many hidden layers. We believe that the network we use is not deep enough to justify the use of LSTM: we use 11 layers between our input and output, and part of our input only goes through 3 layers (these numbers include the input and output layers). The absence of the vanishing gradient problem also led us to avoid batch normalization. Figure 1 presents the temporal scope of the inputs and outputs of our ANN system.

Figure 1. Temporal scope of the inputs and outputs of the ANN system. Forecasts and observations from the three days d-2 to d are presented. Each row represents the temporal coverage of the forecast/observation. A wavy pattern indicates the data that functions as input for the ANN forecast of day “d”; a diagonal-line pattern indicates the data that the same ANN forecast is compared to during training, validating, and testing sessions.

Our system takes input from the SWAN forecast for the unmasked grid cell (of the east Mediterranean grid) closest to the Hadera station (34.75°E, 32.375°N; see Figure 2); the time series is taken for the 24 h coinciding with the ANN prediction period. Due to operational considerations at IOLR, historically, the starting points of the SWAN forecasts have been 12 h after those of the SKIRON forecasts. Because of this, our system takes the SWAN time series from two consecutive forecasts (see Figure 1).

In order to use the wind data, the system takes the data on a 65 × 65 spatial sub-grid of the SKIRON 0.05° resolution grid, whose center is at the Hadera measurement station. The sub-grid covers a square with a side length of 3.25° (approximately 325 km). Our reasoning for this selection is to provide the ANN with sufficient spatial information needed for deducing the synoptic system orientation relative to Hadera. We would specifically like to have an indication of the passing of warm fronts during storm events and RST orientation (see Alpert et al. 2004 for a description of the synoptic system in Israel).

For each grid cell, our system constructs a concatenation of four time series taken from recent SKIRON forecasts (see Figure 1): the first 24 h of the forecasts of the latest 3 days (d-2, d-1, and d) and the forecast of the (d-1) day for the 24 h coinciding with the ANN prediction period. The resulting time dimension has a length of 96 h. The reasoning for taking overlapping time series is to allow the ANN to use different versions of the forecast for the same period.

Our system applies a sequential coarsening process to the wind velocity. We use this process to achieve a spatial smoothing of the wind data, thus reducing the noise in the data (Plokhenko and Menzel 2001). An additional benefit is reducing the number of grid cells we insert into the ANN and, hence, the computational burden. Before the coarsening process, we decompose the velocity vector into two components. One component points to the center of the grid (i.e. to the “Hadera” cell), and the second is orthogonal to it (see Figure 2). Averaging these components of the wind data provides two independent vector components. We chose this decomposition, assuming that each component has its own particular contribution to the waves’ properties at the prediction point. For the Hadera grid cell, our system takes into account the magnitude of the wind velocity and the eastward component of the wind.

Our system does the coarsening by replacing 2 × 2 grid cells off the Hadera Meridian or Hadera line of latitude with a single cell containing the averages of the components in the four grid cells. On the Hadera Meridian (the Hadera line of latitude), our system replaces the 2 × 1 (1 × 2) grid cells with a weighted average of the 2 × 3 (3 × 2) grid cells with weights

$$\frac{1}{8}\begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 1 \end{pmatrix} \qquad \left( \frac{1}{8}\begin{bmatrix} 1 & 1 \\ 2 & 2 \\ 1 & 1 \end{bmatrix} \right).$$

In the Hadera cell, the values of the wind velocity magnitude and eastward component remain the same in the new smaller grid.

Figure 2. Schematic description of the 65 × 65 wind-data grid around Hadera and the decomposition and coarsening processes. The red arrows are the wind vectors. The black arrows are Hadera-oriented components. The red dots mark locations of non-masked cells of the SWAN east Mediterranean grid that are closest to their respective station (Hadera/Ashkelon). Our system takes the SWAN forecasts from these cells.

Our system uses the coarsening process twice, reducing the spatial grid size from 65 × 65 to 33 × 33 and then to 17 × 17. The two wind components are arranged along a separate dimension, resulting in a 96 × 17 × 17 × 2 tensor for each daily pre-processed SKIRON forecast.

After coarsening, the total number of inputs for the system is 55,560, so the number of parameters in a simple 3-layer feedforward ANN would be over $3 \times 10^9$. Instead, the ANN comprises two subnetworks (see Figure 3). The first subnetwork (denoted wind-ANN) reduces the dimensionality and size of the wind input before its integration with the rest of the inputs. The wind-ANN replaces two fully connected layers with a sequence of point-wise convolutions, iterating over all dimensions of the tensor twice. As a result, the number of parameters in the wind-ANN is only 30,117. The second ANN (denoted forecast-ANN) takes as input the concatenation of the output of the wind-ANN, the deterministic SWAN forecast from a single cell of the SWAN east Mediterranean grid, and observed wave data from the Hadera station. The second ANN processes this input to produce the final prediction via a simple fully connected 3-layer ANN.
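The coarsening step described earlier in this section (2 × 2 averaging away from the station's meridian and line of latitude, 1-2-1 weighting across them, and an unchanged station cell) can be sketched for a single scalar field as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names are hypothetical, and the index bookkeeping assumes a square grid whose side length is 1 modulo 4 (65, 33, 17), with the station at the central cell.

```python
def axis_stencil(p, n):
    """Input indices and weights that feed coarse index p along one axis of length n."""
    c = n // 2    # index of the station row/column on the fine grid
    pc = c // 2   # index of the station row/column on the coarse grid
    if p < pc:
        return [2 * p, 2 * p + 1], [0.5, 0.5]
    if p > pc:
        return [2 * p - 1, 2 * p], [0.5, 0.5]
    # On the station row/column: 1-2-1 weights across the line
    return [c - 1, c, c + 1], [0.25, 0.5, 0.25]


def coarsen(field):
    """Coarsen an n x n field (list of lists) to (n + 1) // 2 cells per side."""
    n = len(field)
    c, pc = n // 2, n // 4
    m = (n + 1) // 2
    out = [[0.0] * m for _ in range(m)]
    for p in range(m):
        for q in range(m):
            if p == pc and q == pc:
                out[p][q] = field[c][c]   # the station cell keeps its value
                continue
            rows, row_w = axis_stencil(p, n)
            cols, col_w = axis_stencil(q, n)
            # Products of the 1D weights give 1/4 off-axis and (1, 2, 1)/8 on-axis
            out[p][q] = sum(
                a * b * field[r][s]
                for r, a in zip(rows, row_w)
                for s, b in zip(cols, col_w)
            )
    return out


# Synthetic test field, for illustration only
fine = [[float(r * 65 + s) for s in range(65)] for r in range(65)]
mid = coarsen(fine)
coarse = coarsen(mid)
print(len(fine), len(mid), len(coarse))   # 65 33 17
```

Writing the 2D weights as products of 1D weights reproduces both the plain four-cell average and the (1, 2, 1)/8 weighting along the central lines, so only the station cell needs a special case.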
Figure 3. A schematic chart of the ANN structure and data flow. Each block denotes a 3D tensor of the wind dataset; in reality, the
wind dataset is a 4D tensor. Each line inside a face of a block represents a vector within the tensor. Each layer sums along a specific
dimension of the tensor. The forecast-ANN input is the concatenation of the Wind-ANN output, Observation data time series, and
SWAN data time series.
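The parameter count quoted for the wind-ANN (30,117) can be reproduced from the dimension sizes listed in Table 1. The sketch below assumes that each layer holds one weight matrix and one bias vector that are shared across all positions of the other three dimensions, which is how a point-wise convolution along a single dimension behaves. The axis order of layers 6 and 7 is also an assumption, although it does not affect the count. The variable and function names are illustrative.

```python
# Shape of one pre-processed wind sample entering the wind-ANN
INPUT_SHAPE = {"time": 96, "latitude": 17, "longitude": 17, "component": 2}

# (axis the layer acts on, number of neurons) for the eight layers, read off Table 1.
# The axis order of layers 6 and 7 is assumed to repeat that of layers 2 and 3.
LAYERS = [
    ("component", 3), ("longitude", 25), ("latitude", 25), ("time", 216),
    ("component", 3), ("longitude", 25), ("latitude", 25), ("time", 32),
]


def count_pointwise_params(shape, layers):
    """Count weights and biases when every layer is shared across the other axes."""
    shape = dict(shape)
    total = 0
    for axis, n_out in layers:
        n_in = shape[axis]
        total += n_in * n_out + n_out   # weight matrix plus one bias per neuron
        shape[axis] = n_out
    return total, shape


wind_params, final_shape = count_pointwise_params(INPUT_SHAPE, LAYERS)

# For comparison: one dense layer with as many neurons as there are inputs
n_inputs = 55_560
dense_params = n_inputs * n_inputs + n_inputs

print(wind_params)          # 30117
print(final_shape)          # time 32, latitude 25, longitude 25, component 3
print(dense_params > 3e9)   # True
```

The fourth layer (96 to 216 along time) accounts for roughly two thirds of the total, so the size of the time dimension dominates the wind-ANN's parameter budget.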
The activation function of each neuron in both subnetworks is given by (Karlik and Vehbi 2010)

$$f_a(WS + B) = \mathrm{LReLU}(WS + B;\ \alpha = 0.1), \tag{1}$$

where $WS$ is the weighted sum, $B$ is the bias and

$$\mathrm{LReLU}(x;\ \alpha) = \begin{cases} x & \text{for } x > 0 \\ \alpha x & \text{else.} \end{cases} \tag{2}$$

For training purposes, we arrange all the 4-dimensional wind datasets along a fifth dimension. The rest of this section will ignore this dimension, as it is non-essential for the system description.

Each neuron layer in the wind-ANN is a fully connected one (Ramsundar and Zadeh 2018) operating along a specific dimension in the process of point-wise convolution: for every neuron, a weighted sum is computed (each neuron with its weights) with the input-tensor elements along this dimension. For example, the output tensor of the 1st layer is given by

$$T^{(1)}_{ijkl} = f_a\left( \sum_{\tilde{l}} W_{l\tilde{l}}\, T^{(0)}_{ijk\tilde{l}} + B_l \right) \tag{3}$$

where $i$ is the time index, $j$ is the reduced-latitude index, $k$ is the reduced-longitude index, $l$ is the neuron index of the first layer, $\tilde{l}$ is a bound index along the same dimension as the “velocity component,” and $W_{l\tilde{l}}$, $B_l$ are the weights and biases of the first layer, respectively. The resulting output tensor has the same size as the input tensor along every dimension except the one on which the layer operates. The number of elements along that dimension is the number of neurons in the layer.

Since each layer operates along its dimension, it does not create an interaction between elements along the other dimensions. To create interactions between all the tensor elements, the layers operate along all the different dimensions sequentially twice (see Figure 3). For example, the output of the 2nd layer is given by

$$T^{(2)}_{ijkl} = f_a\left( \sum_{\tilde{k}} W_{k\tilde{k}}\, T^{(1)}_{ij\tilde{k}l} + B_k \right) \tag{4}$$

Note that $k$ is now the neuron index in the 2nd layer (going from 1 to 25). In this manner, the ANN performs a point-wise convolution along every dimension of the input tensor, two times along each dimension.

The first four layers increase the sizes of the tensor by factors of (1.5, 1.5, 1.5, 2.25) along each dimension. Our system post-processes the resulting tensor of the wind-ANN and takes a 1D tensor with a length of 32 from it. The numbers of neurons in the eight layers of the wind-ANN are 3, 25, 25, 216, 3, 25, 25, and 32. Table 1 shows the shape and size of the tensor as it passes through the different layers.

Table 1. Size of the tensor dimensions through the layers of Wind-ANN. Light gray indicates the input dimension of the next layer. Dark gray indicates the output dimension from the layer.

Dimension size   Input   Layer 1   Layer 2   Layer 3   Layer 4   Layer 5   Layer 6   Layer 7   Layer 8
                         output    output    output    output    output    output    output    output
“Time”           96      96        96        96        216       216       216       216       32
“Latitude”       17      17        17        25        25        25        25        25        25
“Longitude”      17      17        25        25        25        25        25        25        25
“Component”      2       3         3         3         3         3         3         3         3

The input to the forecast-ANN is the concatenation of the result of the post-processing of the wind-ANN output with Hadera observed wave data and with the SWAN model forecast near Hadera. The forecast-ANN comprises three fully connected layers. The number of neurons in the first and second layers equals the size of the input vector (i.e. 104). The number of neurons in the final output layer is 24. The forecast-ANN produces a significant wave height ($H_{m0}$) prediction for 0–23 h, with which the system calculates the cost function value that is used for the entire ANN training. Eq. (5) gives the cost function that the system uses, which is a weighted sum of several norms.

$$\mathrm{cost} = W_1 \left[ \frac{W_2}{N} \sum_{i=1}^{N} \left| P_i - O_i \right| + \frac{W_3}{N-1} \sum_{i=1}^{N-1} \left| P_{i+1} - P_i \right| \right] + \frac{W_4}{13} \sum_{i=i_{\max}-6}^{i_{\max}+6} \left| P_i - O_i \right| \tag{5}$$

where $P_i$ and $O_i$ are the ANN prediction and observation at the $i$-th hour, respectively. $N$ is the number of hours (i.e. 24). The hour $i_{\max}$ is the hour of the largest misfit between prediction and observation. The $W_l$ parameters are the weights for the terms of the loss function, with values $W_1 = 0.93$, $W_2 = 0.7$, $W_3 = 0.3$, $W_4 = 0.07$. The second term is for producing a smoother forecast. The third term serves to minimize the maximum error given by the system, which is a critical requirement for a prediction system.

2.2. System training and sensitivity testing

Using the 2-Fold Cross-Validation method (Burman 1989), we have trained the ANN0 system with data from the seven winters between 2009–2016. The winter of 2016–2017 was the validating period. The winter of 2017–2018 was the testing period in which we compared the performance of different optimized ANN systems. The metric for the optimization was the cost function. We set the $W_l$ parameters in the cost function (see Eq. (5)) by minimizing the root mean square of the prediction errors of the ANN for the winter of 2017–2018. This approach could be suboptimal in minimizing the RMSE of the prediction errors in the testing period. However, every iteration with new $W_l$ parameters results in a well-trained ANN, so the under/over fitting problem of the ANN parameters is solved regardless of how close the system is to the RMSE minimum. We trained the ANN with the ADAM algorithm (Kingma and Ba n.d.). The ADAM algorithm is a type of backpropagation and gradient descent algorithm. It differs from the “classical” backpropagation and gradient descent algorithm by having a specific learning rate for each weight in the network. Each specific learning rate is also adaptive and changes during the training process.

When the cost function had descended to a certain value during the training process, our system saved the state (i.e. the weights and biases) of the ANN for the first time. After that point, the system saved the state of the ANN whenever the cost function value descended to a value smaller by a certain constant than the previously saved value. Thus, the training process resulted in 31 ANN candidates that ranged in cost values from 0.4 m to 0.1 m. Through the evaluation of the candidates’ performances in the validating session, we found the ANN corresponding to the cost value of 0.12 m (for the seven winters 2009–2016) to be the optimal one for predictions, i.e., the one that minimizes the cost function for the winter of 2016–2017. This ANN is different from the one that minimizes the cost function for the training session, averting the possibility of overfitting. For that reason, there is no need to use the dropout method. Our system thus uses this ANN candidate for predictions. We observed approximately the same range of $H_{m0}$ values during the training (0.01–6.62 m) and validating (0.04–6.49 m) periods, as well as during the winter of 2017–2018 (0.02–6.71 m).

To test the system’s sensitivity to the starting hour, we conducted a series of 23 ANN training and validating sessions, resulting in what we denote as “time-shifted” ANNs. In each session, both the input data and the observed data at the predicted times are shifted in time by an integer number of hours, emulating a prediction at a starting hour different than 0 h. For each shift, there is thus a different optimized state of the ANN.

To estimate the reliability of the ANN systems with different types of wind filtration schemes, we compare the performance of ANN0 to that of two other ANN
systems with different types of averaging. The system Figures 5–7 (pp. 18–20) give the (predictions) scat
denoted as ANN1 applies averaging to the original ter plots for the SWAN, ANN0, and ANN1 systems,
eastward and northward wind-velocity components respectively. One can see the improved performance
provided by SKIRON. The system denoted as ANN2 of ANN0 over SWAN. Whereas SWAN can overestimate
takes only the component in the direction of the the wave height by as much as 100%, with a difference
Hadera cell. ANN2 takes the wind velocity magnitude of up to 5 m, ANN0 predictions stay within a range of 1
in the Hadera grid cell as the data for this cell. The m from the observations except for some underestima
structure of the wind-ANN in ANN1 remains the same tion (about 2 m) for events in which the wave heights
as in ANN0; however, the removal of wind components are over 5 m. The scattering of the ANN1 system is
in ANN2 reduces the number of dimensions in the greater than that of ANN0, with a more significant
wind-ANN tensor. Thus, the wind-ANN in ANN2 has underestimation of high waves (see also Table 3, p. 21).
six layers processing a 3-dimensional tensor. To compare the performance of different systems at
To test the system’s sensitivity to a reduction in the stormy events, we computed statistical values only for
number of input data types (wind, SWAN, and observa observations greater than or equal to 1.5 m; we denote
tions), we conducted six ANN training and validating these values with the subscript “ > 1:5m.” Table 3
sessions. In three of the sessions, we omitted one (p. 21) summarizes the values of the root-mean-
element of the input data (e.g. the SWAN data). In square-error (RMSE > 1:5m ), the coefficient of determina
the other three, we omitted two elements of the tion (R2 ), and the cost function for the testing period
input data. For each omittance, we changed the num (2017–2018 winter). Figure 8 (p. 22) is a Taylor diagram
ber of neurons in the forecast-ANN to fit it to the (Taylor 2001) of the different systems and displays the
structure of the remaining data. standard deviation (σ > 1:5m ) of the predictions, Pearson
To further estimate the system robustness and correlation coefficient, and the centered root-mean-
the usability of the technique in places other than square-error (CRMSE > 1:5m ), all for the observations
Hadera station, we have implemented a copy of greater or equal 1.5 m. The Taylor diagram evaluates
the ANN0 system for IOLR Ashkelon Meteomarine the degree of correspondence between observations
Monitoring Station, which we denote as ANN0- and predictions. This allows the viewer to determine
Ashkelon. This station is located in the southern the model that achieves the highest correlation,
part of the Israeli Mediterranean coastline, 2 km off Lowest CRMSE > 1:5m , and best agreement with the
the shore, at the end of a coal terminal. The station contains an RDI Workhorse ADCP instrument that provides hourly wave statistics where the water depth is about 26 m. Unfortunately, the station has been active for fewer years than the station at Hadera. The available data covered the three years 2012–2015 and the two years 2017–2019. To implement ANN0-Ashkelon, the training period was the 2012–2015 winters, the validating period was the 2017–2018 winter, and the testing period was the 2018–2019 winter. The values of Hm0 reached 6.21 m during the training period, 6.37 m during the validating period, and 4.51 m during the testing period. Figure 4 exhibits the distribution of Hm0 for the training periods of Hadera and Ashkelon. As in ANN0, SWAN data was obtained from the nearest grid point of the SWAN model (34.375°E, 31.625°N; see Figure 2, p. 9). Wind data were obtained for a 65 × 65 spatial sub-grid of the SKIRON 0.05° resolution grid whose center, as in ANN0, is at the measurement station (here, Ashkelon). Table 2 shows the ANNs that we have trained and validated.

3. Results and discussion

In this section, we present the results of predictions by all the ANN systems discussed in section 2.2 for the testing period, i.e., the winter of 2017–2018.

observed standard deviation. One should note that $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ and $\mathrm{CRMSE}_{\geq 1.5\,\mathrm{m}}$ are related by the formula

$$\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}^{2} = \mathrm{CRMSE}_{\geq 1.5\,\mathrm{m}}^{2} + \mathrm{MBE}_{\geq 1.5\,\mathrm{m}}^{2} \tag{6}$$

where $\mathrm{MBE}_{\geq 1.5\,\mathrm{m}}$ is the mean bias error for observations greater than or equal to 1.5 m.

The observed $\sigma_{\geq 1.5\,\mathrm{m}}$ for the testing period (denoted by $\sigma^{o,v}_{\geq 1.5\,\mathrm{m}}$) is 1.03 m. This value is similar to the values of $\sigma_{\geq 1.5\,\mathrm{m}}$ for the training and validating periods, which are 0.88 m and 0.98 m, respectively. This indicates that the wave conditions during the testing period are similar to those of the training and validating periods.

The wave height forecasts of the SWAN system during stormy events are worse (in terms of $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ and $\mathrm{CRMSE}_{\geq 1.5\,\mathrm{m}}$) than the predictions of ANN0 by a factor of about three. The Taylor diagram confirms that averaging the Hadera directed and orthogonal components (ANN0) gives better predictions than those produced by averaging the original northward and eastward components (ANN1). The ANN2 system performs worse than ANN0.

Table 3 (p. 21) and Figure 8 (p. 22) reveal how critical each input data type is to the ANN’s ability to predict wave height. Using only the observations as input data gives worse results than using any other configuration. The other configurations give results that are not drastically different from one another. The $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ difference between two
Table 2. Details of the authors' various ANNs trained and validated.

System name                          Wind data [1]                                               SWAN forecast   Observations data
ANN0                                 ✓                                                           ✓               ✓
ANN1                                 Uses the eastward and northward wind-velocity components    ✓               ✓
ANN2                                 Uses only the directed component                            ✓               ✓
No-winds ANN                         ✗                                                           ✓               ✓
No-SWAN ANN                          ✓                                                           ✗               ✓
No-observations ANN                  ✓                                                           ✓               ✗
Only winds ANN [2]                   ✓                                                           ✗               ✗
Only SWAN ANN                        ✗                                                           ✓               ✗
Only observations ANN                ✗                                                           ✗               ✓
23 “time-shifted” versions of ANN0   ✓                                                           ✓               ✓
ANN0-Ashkelon                        ✓                                                           ✓               ✓

[1] If an ANN's input data does not include the wind data, there is no wind-ANN subnetwork in its structure.
[2] In this system, the wind-ANN has a 24-neuron output layer and there is no forecast-ANN subnetwork in its structure.
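The wind representations that distinguish ANN0, ANN1 and ANN2 in Table 2 can be illustrated with a small sketch. It assumes that the "directed" component is the projection of the wind vector onto the direction from a grid point toward the station, and that the "orthogonal" component is the projection onto the perpendicular direction; the exact definition is the one given in the methods section and may differ. The function names, the bearing convention and the two grid points are hypothetical and serve only as an illustration.

```python
import math


def project_wind(u_east, v_north, bearing_deg):
    """Return (directed, orthogonal) components of a wind vector.

    bearing_deg is a hypothetical reference direction in degrees, measured
    clockwise from north (here: from the grid point toward the station).
    """
    theta = math.radians(bearing_deg)
    # Unit vector of the reference direction in (east, north) coordinates.
    d_east, d_north = math.sin(theta), math.cos(theta)
    # Component along the reference direction (dot product).
    directed = u_east * d_east + v_north * d_north
    # Component along the reference direction rotated 90 degrees clockwise.
    orthogonal = u_east * d_north - v_north * d_east
    return directed, orthogonal


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Illustrative, made-up grid points: (u_east, v_north, bearing toward station).
# Both winds blow straight toward the station, from the west and from the south.
grid_points = [(10.0, 0.0, 90.0), (0.0, 10.0, 0.0)]

projected = [project_wind(u, v, b) for u, v, b in grid_points]
mean_directed = mean(d for d, _ in projected)
mean_orthogonal = mean(o for _, o in projected)
mean_east = mean(u for u, _, _ in grid_points)
mean_north = mean(v for _, v, _ in grid_points)

print(mean_directed, mean_orthogonal)  # averages of the re-projected components
print(mean_east, mean_north)           # averages of the original components
```

In this toy case both winds blow at 10 m/s toward the station, and the averaged directed component retains that value, whereas the averaged eastward and northward components are 5 m/s each. This is only meant to show why the order of projecting and averaging can matter.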
Figure 4. Distribution of significant wave heights for the training period of Hadera (2009–2016, left) and Ashkelon (2012–2015, right).
Figure 5. Winter of 2017–2018, scatter plot of the SWAN system at Hadera.

Figure 6. Winter of 2017–2018, scatter plot of the ANN0 system.
Figure 7. Winter of 2017–2018, scatter plot of ANN1 system.
such systems is less than 25 cm, much smaller than the variability of the wave height in storm events. In addition, the $R^2$ values we obtained for these systems can be considered advantageous compared to other soft computing wave prediction systems (e.g. see Table 1 in (Miky et al. 2021)).
Table 3. Cost function, $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$, and $R^2$ values for the SWAN system and various ANN systems described above.

No.   System name             Cost function [m]   $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ [m]   $R^2$
1     SWAN at Hadera          0.66                1.67                                         0.3
2     ANN0                    0.19                0.49                                         0.93
3     ANN1                    0.23                0.58                                         0.89
4     ANN2                    0.24                0.59                                         0.9
6     No-winds ANN            0.21                0.67                                         0.88
7     No-SWAN ANN             0.2                 0.48                                         0.92
8     No-observations ANN     0.21                0.53                                         0.90
9     Only winds ANN          0.29                0.6                                          0.85
10    Only SWAN ANN           0.24                0.72                                         0.86
11    Only observations ANN   0.34                1.09                                         0.7
12    SWAN at Ashkelon        0.5                 1.3                                          0.2
13    ANN0-Ashkelon           0.27                0.62                                         0.77
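The storm-event scores reported in Table 3 and related by Equation (6) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the scores are computed only over hours whose observed wave height is at least 1.5 m, that the error is defined as prediction minus observation, and that CRMSE is the RMSE of the errors after removing their mean. The function name and the sample values are made up for the example.

```python
import math


def storm_scores(observed, predicted, threshold=1.5):
    """Return (RMSE, CRMSE, MBE) over samples with observed >= threshold."""
    errors = [p - o for o, p in zip(observed, predicted) if o >= threshold]
    if not errors:
        raise ValueError("no observations at or above the threshold")
    n = len(errors)
    # Mean bias error: average of (prediction - observation).
    mbe = sum(errors) / n
    # Root mean square error of the raw errors.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Centered RMSE: RMSE of the errors after removing the mean bias.
    crmse = math.sqrt(sum((e - mbe) ** 2 for e in errors) / n)
    return rmse, crmse, mbe


# Made-up hourly wave heights in metres, for illustration only.
observed = [0.8, 1.6, 2.4, 3.1, 1.2, 4.0]
predicted = [0.9, 1.9, 2.2, 3.6, 1.0, 3.7]

rmse, crmse, mbe = storm_scores(observed, predicted)
# Equation (6): the squared RMSE splits into a centered part and a bias part.
print(rmse ** 2, crmse ** 2 + mbe ** 2)
```

The two printed numbers agree up to floating-point rounding, which is the content of Equation (6): a system can lower its RMSE either by reducing its bias or by reducing the scatter around that bias.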
Although the difference is not significant, one can see that some systems perform slightly better than others in terms of $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ and $\mathrm{CRMSE}_{\geq 1.5\,\mathrm{m}}$ values. Using only the wind data as an input gives better results than using both the observations and the SWAN forecast as input data. Within the systems that take two types of input data, the system that omits the wind data performs the worst, whereas the system that omits SWAN performs as well as ANN0.

To estimate the sensitivity of the ANN to the prediction period starting hour, we compared a series of 23 ANN pairs. The first ANN in all the pairs was ANN0, receiving shifted input data and predicting for a shifted period. The second ANN in each pair had the same structure as ANN0 but was optimized (by training and validating processes) for a specific time shift in both the input and desired output datasets. Figure 9 presents the $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ results for each time shift (winter 2017–2018).

Figure 8. Taylor diagram of the SWAN system and various ANN systems described above.
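For readers less familiar with the Taylor diagram (Taylor 2001), the relation below is the standard one on which such diagrams are built. We assume it is also the relation underlying Figure 8. The symbols $\sigma_f$ (standard deviation of the forecast), $\sigma_o$ (standard deviation of the observations) and $R$ (correlation coefficient between them) are introduced here for illustration only.

```latex
\mathrm{CRMSE}^{2} = \sigma_f^{2} + \sigma_o^{2} - 2\,\sigma_f\,\sigma_o\,R
```

Because this has the form of the law of cosines, a single point on the diagram encodes all three statistics at once: its radial distance gives $\sigma_f$, its angle gives $R$, and its distance from the observation point gives the CRMSE.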
Figure 9. $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ as a function of the shift forward from midnight for winter 2017–2018 for shift-optimized and non-optimized ANNs.
Figure 10. Taylor diagrams of the time-shifted ANNs series. Left: optimized; Right: non-optimized.
Figure 11. Hourly significant wave height at Hadera station - winter 2017–2018 versus ANN0 and SWAN predictions.
Figure 12. Hourly significant wave height at Hadera station - winter 2017–2018 versus ANN1 and SWAN predictions.
Figure 13. Zoom in on the hourly significant wave height at Hadera station - winter 2017–2018 versus ANN0 and SWAN
predictions.
Figure 10 (p. 24) presents the Taylor diagrams of the optimized and non-optimized time-shifted ANNs. When looking at Figure 9 (p. 24) and Figure 10, one can see that applying the optimization process to each time shift does not give a significant benefit in forecasting storm events when compared to using ANN0 with shifted inputs. The maximum absolute difference between the $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ of the optimized and the non-optimized systems is 0.18 m, which is small compared to $\sigma^{o,v}_{\geq 1.5\,\mathrm{m}}$. Figure 10 shows that the results of the non-optimized systems are relatively close together, with almost the same correlation coefficient and a general overestimation of $\sigma^{o,v}_{\geq 1.5\,\mathrm{m}}$. The results for the optimized systems are more scattered over the Taylor diagram and exhibit worse correlation coefficient values. Examination of the cost function values suggests that the scattering of the results is within the variability of the optimization process.

Figures 11 and 12 (pp. 25–26) show the performances of the SWAN, ANN0, and ANN1 systems during the stormier period in the winter of 2017–2018. Figures 13 and 14 are their respective zoom-ins. This helps to visualize the performance improvement in the wave heights and the timing of a storm evolution.

The most apparent insight one can have from Figures 11 and 12 is that the ANN systems are free from the gross overestimation events that the SWAN system has. The ANN systems’ underestimations of
Figure 14. Zoom in on the hourly significant wave height at Hadera station - winter 2017–2018 versus ANN1 and SWAN
predictions.
Figure 15. Winter of 2018–2019, scatter plot of the SWAN system at Ashkelon.
Figure 16. Winter of 2018–2019, scatter plot of the ANN0-Ashkelon system.
events are also better than those of the SWAN system. Although both ANN0 and ANN1 systems vastly outperform SWAN, we can see that ANN0 has some advantage over ANN1 concerning the timing of the storm events: one can see this on Jan. 4th, 14th, and Dec. 31st.
Figure 17. Hourly significant wave height at Hadera station - winter 2018–2019 versus ANN0-Ashkelon and SWAN predictions.
Figures 15 and 16 give scatter plots for the SWAN and ANN0-Ashkelon systems, respectively. One can see the improved performance of ANN0-Ashkelon over SWAN. Figure 17 shows the performance of the ANN0-Ashkelon system as well as the performance of SWAN at Ashkelon during the stormier period in the winter of 2018–2019. As in Figure 11, we can see that SWAN overestimates high wave conditions in Ashkelon. The overestimation can reach 250%, with a prediction of about 5 m when a storm of only 2 m was actually observed. The ANN0-Ashkelon system improves the prediction, so that the error of ANN0-Ashkelon remains below 2.5 m. However, the reduction in error is not as strong as the reduction brought about by ANN0. Similarly, the performance of ANN0-Ashkelon shown in Table 3 in terms of $\mathrm{RMSE}_{\geq 1.5\,\mathrm{m}}$ and $R^2$ is better than the performance of SWAN but not as good as that of the other ANN systems in Hadera. The poorer performance of ANN0-Ashkelon is most likely the result of insufficient training data. Specifically, Figure 4 shows that for high wave heights (above 1.5 m), the distribution of events is monotonically decreasing in Hadera, whereas the data from Ashkelon does not show this behavior. As the probability of extreme events should decrease with their intensity, we infer that the number of samples (winter storms) in Ashkelon is too low.

4. Conclusions

In this paper, we show that it is possible to train an ANN system using available forecasts and observations to predict the hourly significant wave height at a specific wave measurement station. The prediction given by such a system outperforms the forecast of the operational SWAN system at IOLR. Adding a trained ANN to the existing operational chain does not require a high computational cost.

We propose a scheme in which we re-project the wind components before averaging. Using this scheme on the input data causes the trained ANN to give better predictions than using the typical eastward and northward components.

Our results also show that the wind input data is the most valuable to the successful predictions by the ANN and the observations are the least valuable data. The wind’s more significant impact may result from the larger number of degrees of freedom that the wind information contributes to the system. The smaller contribution of the observational wave information to the prediction quality of high wave events is reasonable because the rise of wave height is related to short-term synoptic events. One can also conclude that none of the system’s inputs is crucial to the improved performance during stormy events.

Although the ANN system that did not use SWAN performed best, the much simpler ANN system that used SWAN and observations still provided a useful forecast. The fact that the ANN can perform as well without SWAN may also indicate that the additional information that the SWAN forecast carries (e.g. the contribution of swell waves from outside of the wind domain) is still reproducible from the wind information. Future examination of summertime synoptic conditions, namely the Persian Trough, in which swells are more critical, may assign more significance to SWAN input. The fact that observations contribute less than other inputs means that the system has the operational benefit of being quite robust to missing observational data.

The results show that the ANN system is also robust to a time shift of the prediction period
starting hour, considering the negligibility of the differences between the system optimized for time shifts and the regular system. This means there is no need to train the system for each time shift to run it hourly.

Our experience with the reimplementation of the system for the Ashkelon wave station suggests that having a statistically representative observation data set is a prerequisite for training similar systems. The configuration presented in this paper has been created in a trial and error process, and future studies and implementations of the system may automate the configuration process or explore different network configurations such as LSTM.

Acknowledgments

The authors warmly acknowledge the team of the IOLR Physical Oceanography Department that maintains permanent wave observations at IOLR stations. Wind data used in this work was provided by the group of atmosphere modeling headed by Prof. Kallos of the University of Athens. This was done in the framework of international collaboration in the Mediterranean Operational Network for the Global Ocean Observing System (MONGOOS).
The authors would like to thank the reviewers for all of their constructive and insightful comments in relation to this work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This research was supported by the Ministry of Innovation, Science & Technology, Israel & The Federal Ministry of Education and Research [BMBF], Germany. [MOST, project number 3-15218; BMBF, project number 03F0823A].

ORCID

Elad Dakar http://orcid.org/0000-0002-4671-4418

References

Abadi, M., A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, et al. 2015. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.” https://www.tensorflow.org/.
Alpert, P., I. Osetinsky, B. Ziv, and H. Shafir. 2004. “Semi-Objective Classification for Daily Synoptic Systems: Application to the Eastern Mediterranean Climate Change.” International Journal of Climatology 24 (8, June): 1001–1011. doi:10.1002/joc.1036.
Burman, P. 1989. “A Comparative Study of Ordinary Cross-Validation and the Repeated Learning-Testing Methods.” Biometrika 76 (3): 503–514. doi:10.1093/biomet/76.3.503.
Castro, A., R. Carballo, G. Iglesias, and J. R. Rabuñal. 2014. “Performance of Artificial Neural Networks in Near Shore Wave Power Prediction.” Applied Soft Computing 23 (October): 194–201.
Fan, Shuntao, Nianhao Xiao, and Sheng Dong. 2020. “A Novel Model to Predict Significant Wave Height Based on Long Short-Term Memory Network.” Ocean Engineering 205 (107298): 107298.
Gertman, Isaac, Alexey Murashkovsky, Victor Levin, George Kallos, and Dov S. Rosen. 2005. “Wave Monitoring and Wind Input as Key Issues in Operational Wave Forecasting Systems.” In European Operational Oceanography: Present and Future, edited by H. Dahlin, N. C. Flemming, P. Marchand, and S. E. Petersson, 743–749. Brest: EuroGoos office.
Gertman, Isaac, Dov S. Rosen, S. Kariel, and Lazar Raskin. 2000. “Comparison of Two Years of Wind and Wave Hindcasts via WAM Based Operational Forecasting System versus Field and Other Model Data.” In 6th International Workshop on Wave Hindcasting and Forecasting, 91–98. Monterey: Meteorological Service of Canada.
Gómez-Orellana, A. M., D. Guijo-Rubio, P. A. Gutiérrez, and C. Hervás-Martínez. 2022. “Simultaneous Short-Term Significant Wave Height and Energy Flux Prediction Using Zonal Multi-Task Evolutionary Artificial Neural Networks.” Renewable Energy 184: 975–989.
Guan, X. 2020. “Wave Height Prediction Based on CNN-LSTM.” In 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), 10–17. Taiyuan, China: IEEE.
Hassoun, M. H. 1995. Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press.
Hochreiter, Sepp. 1998. “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6 (2): 107–116. doi:10.1142/S0218488598000094.
Holthuijsen, Leo H. 2007. Waves in Oceanic and Coastal Waters. Cambridge, United Kingdom: Cambridge University Press.
Kallos, G., S. Nickovic, A. Papadopoulos, D. Jovic, O. Kakaliagou, N. Misirlis, L. Boukas, et al. 1997. “The Regional Weather Forecasting System SKIRON: An Overview.” In International Symposium on Regional Weather Prediction on Parallel Computer Environments, 109–122. Athens, Greece: University of Athens.
Karas, Svetlana, and Abraham Zangvil. 1999. “A Preliminary Analysis of Disturbance Tracks over the Mediterranean Basin.” Theoretical and Applied Climatology 64 (December): 239–248.
Karlik, Bekir, and Ahmet Vehbi. 2010. “Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks.” International Journal of Artificial Intelligence and Expert Systems 1 (4): 111–122. December.
Kim, Sooyoul, Shunqi Pan, and Hajime Mase. 2019. “Artificial Neural Network-Based Storm Surge Forecast Model: Practical Application to Sakai Minato, Japan.” Applied Ocean Research 91 (October): 101871.
Kingma, Diederik P., and Jimmy Lei Ba. n.d. “Adam: A Method for Stochastic Optimization.” ICLR 2015.
López, Mario, Iván López, and Gregorio Iglesias. 2015. “Hindcasting Long Waves in a Port: An ANN Approach.” Coastal Engineering Journal 57 (4): 1550019-1–1550019-20.
Makarynskyy, O., A. A. Pires-Silva, D. Makarynska, and C. Ventura-Soares. 2005. “Artificial Neural Networks in Wave Predictions at the West Coast of Portugal.” Computers & Geosciences 31 (4): 415–424.
Miky, Yehia, Mosbeh R. Kaloop, Mohamed T. Elnabwy, Ahmad Baik, and Ahmed Alshouny. 2021. “A Recurrent-Cascade-Neural Network-Nonlinear Autoregressive Networks with Exogenous Inputs (NARX) Approach for Long-Term Time-Series Prediction of Wave Height Based on Wave Characteristics Measurements.” Ocean Engineering 240 (15): 109958. November.
Plokhenko, Youri, and W. Paul Menzel. 2001. “Mathematical Aspects in Meteorological Processing of Infrared Spectral Measurements from the GOES Sounder. Part I: Constructing the Measurement Estimate Using Spatial Smoothing.” Journal of Applied Meteorology 40 (March): 556–567.
Ramsundar, Bharath, and Reza Bosagh Zadeh. 2018. TensorFlow for Deep Learning: From Linear Regression to Reinforcement Learning. CA, USA: O’Reilly Media.
Sadeghifar, T., G. F. C. Lama, P. Sihag, A. Bayram, and O. Kisi. 2022. “Wave Height Predictions in Complex Sea Flows Through Soft-Computing Models: Case Study of Persian Gulf.” Ocean Engineering 245 (February): 110467.
Shamshirband, Shahaboddin, Amir Mosavi, Timon Rabczuk, Narjes Nabipour, and Kwok-wing Chau. 2020. “Prediction of Significant Wave Height; Comparison Between Nested Grid Numerical Model, and Machine Learning Models of Artificial Neural Networks, Extreme Learning and Support Vector Machines.” Engineering Applications of Computational Fluid Mechanics 14 (1): 805–817.
Taylor, Karl E. 2001. “Summarizing Multiple Aspects of Model Performance in a Single Diagram.” Journal of Geophysical Research 106 (D7): 7183–7192. April.
Tom, Tracey H. A., Ai Ikemoto, Hajime Mase, Koji Kawasaki, Masahide Takeda, and Sooyoul Kim. 2019. “Wave Prediction in the Sea of Japan by Deep Learning Using Meteorological Data.” Journal of Japan Society of Civil Engineers 75 (2): I_145–I_150. October.
Tsai, Ching-Piao, Chang Lin, and Jia-N Shen. 2002. “Neural Network for Wave Forecasting Among Multi-Stations.” Ocean Engineering 29 (13): 1683–1695. October.
Tsuda, Muneo, Sooyoul Kim, and Yoshiharu Matsumi. 2016. “Wave Forecasts Using an Artificial Neural Network for Port Construction Works Management.” PIANC-COPEDEC IX. Rio de Janeiro, Brazil: PIANC-COPEDEC.
Yang, Shaobo, Z. Deng, X. Li, C. Zheng, L. Xi, J. Zhuang, Z. Zhang, and Z. Zhang. 2021. “A Novel Hybrid Model Based on STL Decomposition and One-Dimensional Convolutional Neural Networks with Positional Encoding for Significant Wave Height Forecast.” Renewable Energy 173: 531–543.


CONTACT Elad Dakar edakar2k@gmail.com
