
Marine Pollution Bulletin 184 (2022) 114132


Marine Pollution Bulletin


journal homepage: www.elsevier.com/locate/marpolbul

Oil spills: Detection and concentration estimation in satellite imagery, a machine learning approach
Rubicel Trujillo-Acatitla a, José Tuxpan-Vargas a, d, *, Cesaré Ovando-Vázquez b, c, d, **
a División de Geociencias Aplicadas, Instituto Potosino de Investigación Científica y Tecnológica A.C., Camino a la Presa de San José No. 2055, Colonia Lomas 4ta Sección, San Luis Potosí, San Luis Potosí C.P. 78216, Mexico
b División de Biología Molecular, Instituto Potosino de Investigación Científica y Tecnológica A.C., Camino a la Presa de San José No. 2055, Colonia Lomas 4ta Sección, San Luis Potosí, San Luis Potosí C.P. 78216, Mexico
c Centro Nacional de Supercómputo (CNS), Instituto Potosino de Investigación Científica y Tecnológica A.C., Camino a la Presa de San José No. 2055, Colonia Lomas 4ta Sección, San Luis Potosí, San Luis Potosí C.P. 78216, Mexico
d Cátedras-CONACYT, Consejo Nacional de Ciencia y Tecnología, CDMX 03940, Mexico

ARTICLE INFO

Keywords: Landsat; Machine learning; Spectral response; Oil concentration; Oil spill

ABSTRACT

Developing methods to detect oil spills and to monitor oil concentration in marine environments is essential in emergency response. To develop a classification model, this work was based on the spectral response of surfaces using reflectance data, and machine learning (ML) techniques, with the objective of detecting oil in Landsat imagery. Additionally, oil data at different concentrations were used to obtain a concentration-estimation model. In the classification, K-Nearest Neighbor (KNN) obtained the best approximations in oil detection using the Blue (0.453–0.520 μm), NIR (0.790–0.891 μm), SWIR1 (1.557–1.717 μm), and SWIR2 (1.960–2.162 μm) bands for 2010 spill images. In the concentration model, the mean absolute error (MAE) was 1.41 and 3.34 for training and validation data, respectively. When testing the concentration-estimation model in images where oil was detected, the estimated concentration was between 40 and 60 %. This demonstrates the potential use of ML techniques and spectral response data to detect oil spills and estimate their concentration.

1. Introduction

The oil industry has experienced spills of great magnitude throughout its history, causing damage and losses to environmental, social, economic, and even political sectors. These spills occur at different stages of the industry process chain (exploration, production, refining, storage, and/or distribution), in either marine or terrestrial environments (Burgherr, 2007; Daly et al., 2016; Ivshina et al., 2015; Li et al., 2016; Yang et al., 2021).

The marine environment suffers the greatest impact, with an estimated two million tons of oil released annually due to ship strikes, oil well blowouts, pipeline ruptures, and explosions at storage facilities; as a result, there is constant damage to the ecosystem, as well as economic losses (Burgherr, 2007; Daly et al., 2016; Ivshina et al., 2015). The use of remote sensing tools in the detection and monitoring of oil spills began in the 1980s (Hooper, 1982), but it was not until April 2010, when one of the worst spills documented to date (over 4.9 million barrels of crude oil) occurred at the Deepwater Horizon platform in the Gulf of Mexico, that its use increased exponentially (Garcia-Pineda et al., 2017). During this event, human observations from the air were used to detect and monitor oil trajectories (Garcia-Pineda et al., 2017, 2013; Leifer et al., 2012), which resulted in high costs due to the extent of the slick (~180,000 km²) and the duration of the spill (Clark et al., 2010; Leifer et al., 2012; Topouzelis et al., 2015).

Therefore, better methods were sought for spill detection and monitoring, returning to remote sensing tools that are low-cost, cover large areas, and are more efficient, making them useful for emergency response. This started a new era in the application of remote sensing to oil spill detection, in which various agencies, institutions, and governments began contributing their spatial assets

* Correspondence to: J. Tuxpan-Vargas, División de Geociencias Aplicadas, Instituto Potosino de Investigación Científica y Tecnológica A.C., Camino a la Presa de
San José No. 2055, Colonia Lomas 4ta Sección, San Luis Potosí, San Luis Potosí C.P. 78216, Mexico.
** Correspondence to: C. Ovando-Vázquez, División de Biología Molecular, Instituto Potosino de Investigación Científica y Tecnológica A.C., Camino a la Presa de
San José No. 2055, Colonia Lomas 4ta Sección, San Luis Potosí, San Luis Potosí C.P. 78216, Mexico.
E-mail addresses: jose.tuxpan@ipicyt.edu.mx (J. Tuxpan-Vargas), cesare.ovando@ipicyt.edu.mx (C. Ovando-Vázquez).

https://doi.org/10.1016/j.marpolbul.2022.114132
Received 18 April 2022; Received in revised form 8 September 2022; Accepted 9 September 2022
Available online 26 September 2022
0025-326X/© 2022 Elsevier Ltd. All rights reserved.

during these types of events (Bessis, 2003; Ivshina et al., 2015; Leifer et al., 2012).

With the use of remote sensing, several studies have been conducted on the detection of spills in the marine environment using SAR sensors (Arslan, 2018; Cantorna et al., 2019; Konik and Bradtke, 2016; Svejkovsky et al., 2016; Temitope Yekeen and Balogun, 2020; Topouzelis and Psyllos, 2012). However, these sensors can have problems in detecting spills because the oil's roughness on the water surface redistributes the reflection of sunlight, resulting in slicks appearing darker than the surrounding water and generating backscatter patterns similar to algae and areas with no wind, resulting in false positives (Clark et al., 2010; Fingas and Brown, 2018; Garcia-Pineda et al., 2013; Leifer et al., 2012; Lu et al., 2019; Sun et al., 2015).

Because of these SAR detection problems, work has been done on the spectral characterization of oil slicks and their detection with multispectral sensors such as MODIS, MERIS, and Landsat (Al-Ruzouq et al., 2020; Arslan, 2018; Balogun et al., 2020; Chowdhury et al., 2021; De Kerf et al., 2020; Hu et al., 2018; Lu et al., 2020, 2019; Luciani and Laneve, 2018; Mohammadi et al., 2021; Ozigis et al., 2019). However, due to weathering processes, the oil in the water can emulsify and/or change its concentration, altering its spectral response and making detection with passive sensors difficult (Bonn Agreement Accord de Bonn (BAOAC), 2007; Ivshina et al., 2015). Therefore, field and laboratory research has been performed to understand the degradation process, emulsion formation, changes in oil concentration, and its detection by passive satellites (Daling et al., 2014; Hu et al., 2018; Ivshina et al., 2015; Lu et al., 2020, 2019; Sun et al., 2015; Svejkovsky et al., 2016). Some of these studies have demonstrated a relation between reflectance and oil concentration, as well as spectral differences among emulsions; these results enable detection and monitoring over time (Clark et al., 2010; Daling and Strøm, 1999; Garcia-Pineda et al., 2013; Lu et al., 2020, 2019).

These variations in the spectral response of the oil create a need for marine spill management that covers both detection and monitoring. The first step is to collect spectral characteristics and variations of oil data, as well as optical imaging data, and then develop a reliable computational technique based on these data (Ivshina et al., 2015; Leifer et al., 2012; Mohammadiun et al., 2021; Svejkovsky et al., 2016). In this sense, models based on computation and artificial intelligence (AI) have gained popularity in the treatment and management of large amounts of data because they are capable of discriminating features, thus increasing the accuracy of description, prediction, classification, and segmentation tasks in various fields of knowledge (James et al., 2013; Lary et al., 2018; Mitchell, 1997; Simon, 1996; Singha et al., 2013).

In the Earth observation field, machine learning (ML) and AI techniques have been applied to aspects such as water, soil, and vegetation monitoring (Bar et al., 2020; Cao et al., 2020; Ghorbanian et al., 2020; Lin et al., 2021; Singh et al., 2021; Zhou et al., 2021). Some models have already been developed to detect oil spills for passive sensors (Al-Ruzouq et al., 2020; Cococcioni et al., 2012; De Kerf et al., 2020; Kubat et al., 1998; Liu et al., 2016; Mohammadi et al., 2021; Ozigis et al., 2019). However, only a few have tried to develop techniques to discriminate emulsion, thickness, and/or concentration in satellite imagery (Clark et al., 2010; Lu et al., 2020, 2019; Svejkovsky et al., 2016).

Therefore, the search continues for a remote sensing method that integrates the optical properties of surfaces with computational tools, that can map the location and extent of oil slicks, and that is also capable of determining their concentration to analyze their degradation over time (Burgherr, 2007; Chen et al., 2019; De Padova et al., 2017; Fingas and Brown, 2018).

For this reason, in this study, we explore the possibility of using spectral response data of oil and other materials to train and validate different ML classification methods. The aim is to find the most reliable method for the detection and segmentation of oil, and to identify the bands that contribute most to discriminating oil from other materials present in the optical images. Additionally, we seek to estimate the oil concentration of marine spills using oil reflectance data with different concentrations, and thus monitor the degradation of the oil from the point of origin of the spill, as well as the evolution of the slick over time. Therefore, we developed this methodology to increase the existing knowledge in this field of expertise and to be useful in assisting emergency response plans in future spills.

2. Data and methods

The general scheme of the method is described in Fig. 1, showing the main stages of the work: a) optical imaging acquisition and preprocessing, b) processing to detect oil and its concentration via ML paradigms using spectral data, and c) model outputs.

2.1. Data

2.1.1. Spectral data for surface detection

A total of 175 spectral response data were obtained from five materials/surfaces and used for training and validation of the ML models. These materials were plastic, water, soil, vegetation, and oil, each with 35 observations. Data were obtained from the USGS Spectral Library Version 7 (Kokaly et al., 2017), the ECOSTRESS Spectral Library version 1.0 (Meerdink et al., 2019), the ASTER Spectral Library version 2.0 (Baldridge et al., 2009), and the Spectral Signature Library generated and provided by the National Commission on Space Activities (CONAE (Comision Nacional de Actividades Espaciales), n.d.).

The data in the libraries are based on information obtained through laboratory techniques, field spectroscopy, and satellite data, with the objective of mapping minerals, soil, liquids, organic compounds, man-made compounds, and vegetation. The spectral range of measurements covers the ultraviolet, near infrared, and short-wave infrared (from 200 to 2500 nm).

The spectral response data from the libraries showed continuous variations in different spectral ranges, so a resampling was performed considering the spectral ranges of six bands, covering the visible (Blue: 0.453–0.520, Green: 0.528–0.892, Red: 0.635–0.682 μm) and infrared ranges (NIR: 0.790–0.891, SWIR1: 1.557–1.717, SWIR2: 1.960–2.162 μm), from the Landsat 4–5 TM, Landsat 7 ETM+, and Landsat 8 OLI satellites.

The resampled spectral data were stored in a database where each column is the lexicographic representation of each band (X features) and the rows correspond to the observations of the selected materials with their respective label (Y). These data were used in the machine learning classification models.

2.1.2. Spectral data for concentration estimation

Oil spectral response data, obtained as described in Section 2.1.1, were used to estimate the concentration. Forty oil data were used, which were found at different concentrations and thicknesses as described in the USGS Spectral Library Version 7 (Kokaly et al., 2017). For this paper, the oil concentrations used were 1, 23, 40, 60, 75, and 92 %, with water accounting for the remaining percentage.

2.1.3. Satellite data

The satellite images used were mainly Landsat 5 TM, since its temporal coverage provides images of the spill that occurred at the Deepwater Horizon platform in 2010. Landsat 8 OLI images of oil-free areas were also used to corroborate the functionality of the ML model. These images are freely available for download on the USGS Earth Explorer platform; dates and image identification details are given in supplementary material Table S1.

The images obtained were preprocessed as described by Chander et al. (2009) to obtain the TOA reflectance values (unitless). To mask clouds and minimize the errors caused in ML models, cloud detection was applied using the CloudMasking implementation of QGIS 3.10.6-A Corunna (QGIS Development Team, 2018), based on Zhu et al. (2015),


Fig. 1. General workflow of the method. In step 1, the images were obtained from the USGS Earth Explorer platform, then corrected to TOA reflectance, stacked, cloud masked, and restructured. In step 2, the restructured image was input into the classification model, previously trained and validated with public spectral response data, and the model output was restructured again to obtain the classified image. If, and only if, oil was present in the image, it was passed to the concentration estimation model (step 3), which was also previously trained and validated. Finally, the output of this model was restructured to obtain the final result image with the presence of oil and the concentration estimation.
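The workflow in Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`restructure`, `run_pipeline`), the integer label chosen for oil, the model settings, and the random stand-in data are all hypothetical. It only shows the three ideas in the caption: flattening an m × n × 6 stack into one row per pixel, classifying every pixel, and running the regressor only when oil is detected.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
N_BANDS = 6
OIL = 0  # hypothetical integer label for the oil class


def restructure(stack):
    """Flatten an (m, n, 6) band stack into an (m*n, 6) table, one row per pixel."""
    m, n, bands = stack.shape
    return stack.reshape(m * n, bands)


def run_pipeline(stack, classifier, regressor):
    """Classify every pixel, then estimate concentration only where oil was found."""
    m, n, _ = stack.shape
    pixels = restructure(stack)
    labels = classifier.predict(pixels)
    concentration = np.full(m * n, np.nan)  # NaN marks pixels without oil
    oil_mask = labels == OIL
    if oil_mask.any():  # the regressor runs if, and only if, oil is present
        concentration[oil_mask] = regressor.predict(pixels[oil_mask])
    # Inverse restructuring: back to the image grid
    return labels.reshape(m, n), concentration.reshape(m, n)


# Random stand-ins for the spectral libraries (NOT real reflectance data)
X_cls = rng.random((100, N_BANDS))
y_cls = rng.integers(0, 5, size=100)
X_oil = rng.random((40, N_BANDS))
y_oil = rng.uniform(1, 92, size=40)

classifier = KNeighborsClassifier(n_neighbors=3).fit(X_cls, y_cls)
regressor = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_oil, y_oil)

image = rng.random((3, 3, N_BANDS))  # the 3 x 3 x 6 example from the text
class_map, conc_map = run_pipeline(image, classifier, regressor)
```

Restricting the regressor to oil pixels is one reading of the statement that the concentration result lies in the same area as the detected slick; other masking strategies would also be consistent with the caption.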

Zhu and Woodcock (2012).

The calibrated bands were stacked, generating a stack of m pixels × n pixels × 6 bands (i.e., m × n × 6; e.g., 3 pixels × 3 pixels × 6 bands). This band stack was restructured to obtain a lexicographic representation in an array of dimension m′ rows × 6 columns, with m′ = m × n (e.g., 9 rows × 6 columns), where each column corresponds to a band or feature, and each row is the pixel value, i.e., each row corresponds to the spectral response of the material or surface within that pixel (Fig. S1). This database was used as the unknown data input for the ML models.

The images were manipulated as NumPy arrays to facilitate their handling; therefore, the NumPy 1.20.2 (Harris et al., 2020), Pandas 1.2.3 (McKinney, 2010), and Matplotlib 3.4.1 (Hunter, 2007) libraries were used, and since they were images with geospatial aspects, the GDAL 3.1.4 (GDAL/OGR contributors, 2021) library was also used, all in the Anaconda 4.10.0 environment (Distribution, 2020) and Python 3.7.10 (Van Rossum and Drake, 2009).

2.2. Exploratory and statistical analysis

Some exploratory analyses and statistical tests were performed to observe differences and similarities in the spectral response between surfaces. To determine the difference in the means between the materials studied, an analysis of variance (ANOVA) was performed (Fisher, 1992) using stats package version 4.0.4 (R Core Team, 2021). Then, if the result was statistically significant, a Tukey post hoc analysis was performed to determine which groups were different (Tukey, 1977). This was performed using the agricolae package version 1.3.3 (de Mendiburu, 2020). Additionally, a correlation analysis was performed using Pearson's method to determine the relationship between materials (Freedman et al., 2007). This correlation analysis was performed using the stats package version 4.0.4 (R Core Team, 2021). The results of the correlations were plotted on a heatmap, and the dendrogram was drawn on the rows and columns using the heatmap.2 function of the gplots version 3.1.1 library (Warnes et al., 2020).

To analyze the clustering of the classes using unsupervised methods, principal component analysis (PCA) (Pearson, 1901) and t-Distributed Stochastic Neighbor Embedding analysis (T-SNE) (van der Maaten and Hinton, 2012) were performed to determine the behavior of the data using linear and nonlinear transformations. These analyses were carried out using stats package version 4.0.4 (R Core Team, 2021) and Rtsne package version 0.15 (Krijthe, 2015). To assess the separation distance between clusters, silhouette analysis was performed using the cluster package version 2.1.0 (Maechler et al., 2015). Silhouette coefficients close to 1.0 indicate that the sample is far from neighboring clusters. A value of 0.0 indicates that the sample is at or close to the decision boundary between two neighboring clusters, and negative values indicate that those samples have been assigned to the wrong cluster (Pedregosa et al., 2011; Rousseeuw, 1987).

For the concentration data, a linear regression model was fitted (Everitt, 1992) to observe the relationship between the concentration and the spectral response of the oil. This analysis was performed in general and for each band, using stats package version 4.0.4 (R Core Team, 2021).

2.3. Machine learning models

Machine learning models are trained by looking for a relationship between input and output. For our classification problem, the inputs were the reflectance data, and the outputs were the classes to which each spectral response belongs. This type of approach fits a model directly to the data, where each model is adjusted to minimize the prediction error and decrease overfitting (Verrelst et al., 2012).

Compared to parametric or statistical models, which define an input-output function and depend on a fixed set of parameters, machine learning models are nonparametric, nonlinear, and flexible (Verrelst et al., 2012; Wolanin et al., 2019). Thus, the search for a relationship between input and biophysical parameters is possible (Verrelst et al., 2012).
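The input-output framing of the classification problem can be made concrete with a short scikit-learn sketch. It combines the one-vs-rest strategy (Section 2.3.1), the 67 %/33 % split and grid search (Section 2.3.4), and some of the scores from Section 2.3.5. The data are synthetic, and the parameter grid, the stratified split, the fold count, and the random seeds are assumptions made for illustration; the hyperparameters actually selected are reported in Table S2.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, f1_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
classes = ["oil", "plastic", "soil", "vegetation", "water"]

# Synthetic stand-in: 35 observations per class, 6 bands each.
# Each class is centred on a different (made-up) reflectance level.
centers = np.linspace(0.1, 0.9, len(classes))
X = np.vstack([rng.normal(c, 0.03, size=(35, 6)) for c in centers])
y = np.repeat(classes, 35)

# 67 % training / 33 % validation (stratification is an assumption)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0)

# One binary KNN classifier per class; the grid below is hypothetical
search = GridSearchCV(
    OneVsRestClassifier(KNeighborsClassifier()),
    param_grid={"estimator__n_neighbors": [1, 3, 5]},
    cv=3,
)
search.fit(X_train, y_train)
y_pred = search.predict(X_val)

scores = {
    "accuracy": accuracy_score(y_val, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_val, y_pred),
    "kappa": cohen_kappa_score(y_val, y_pred),
    "f1_macro": f1_score(y_val, y_pred, average="macro"),
}
```

Swapping `KNeighborsClassifier` for `SVC`, `DecisionTreeClassifier`, or `RandomForestClassifier` (with a matching parameter grid) gives the other model families named in the paper.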


2.3.1. Surface detection with six features

The first part of the processing focused on the detection of five surfaces (oil, plastic, vegetation, soil, and water). Therefore, a supervised classification approach was applied and, since there were more than two classes, a multi-class approach was used, where each sample can only have a single label (Pedregosa et al., 2011). For this task, the one-vs-the-rest classification strategy was used, which fits one classifier per class, and for each classifier, the class is fitted against all other classes. It is a widely used approach for multi-class classification tasks and has the advantage of computational efficiency (Pedregosa et al., 2011).

In this study, the supervised learning algorithms Support Vector Machine with linear (SVM-linear) and radial (SVM-RBF) kernel, K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF) were tested using the Scikit-learn 0.24.1 library (Pedregosa et al., 2011).

2.3.2. Surface detection with important features

The second part of the processing consisted of the selection of the most relevant variables, which are calculated from the accumulated decrease of impurity (Gini importance) in each tree of the Random Forest model (Pedregosa et al., 2011). After choosing the most important variables, the supervised learning and one-vs-rest approaches were used, and the same models as in Section 2.3.1 were tested using the Scikit-learn 0.24.1 library (Pedregosa et al., 2011).

2.3.3. Concentration estimation

For the third part of this work, and after detecting oil in the image, concentration estimation was performed from the data mentioned in Section 2.1.2. Since this task had a continuous outcome, it was treated as a regression problem, to which a regression model was fitted with the Random Forest approach using the Scikit-learn 0.24.1 library (Pedregosa et al., 2011). Note that this step runs if, and only if, the presence of oil is detected, so its result is confined to the area of the detected oil slick.

2.3.4. Training and hyperparameter search

The data sets were divided into a training set and a validation set, with 67 % and 33 % of the data, respectively. Since there is no way to know the best values for the hyperparameters in advance, a search of all possible combinations of predefined parameter values was performed, selecting those that maximize the score (Pedregosa et al., 2011). This search was conducted for the classification and concentration models using the grid search method (GridSearchCV) of the model_selection package of the Scikit-learn 0.24.1 library (Pedregosa et al., 2011).

2.3.5. Accuracy assessment

Accuracy assessment, in ML models related to classification, is a fundamental part of evaluating the performance of the algorithms (Bar et al., 2020). In this study, the training and validation sets were used to evaluate the accuracy of each proposed model. To evaluate the classification models, the accuracy score, balanced accuracy score, Cohen's Kappa, precision, recall, and F1 score were calculated. Using these metrics, models with higher scores are better than those with lower scores. All metrics were calculated with the metrics package of the Scikit-learn 0.24.1 library (Pedregosa et al., 2011).

For the linear regression model and the ML model for concentration estimation, the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R², and Max Error metrics were used to evaluate the performance of the model; these metrics were calculated for the whole data set. For the ML model, we also calculated these metrics for the training and validation sets. Furthermore, to test the ML model for concentration estimation, a synthetic image (matrix) was created with six bands matching the Landsat wavelength ranges. The data were sorted from the highest to the lowest concentration so that the concentration gradient was evident.

2.4. Post-processing

The result obtained for the classification task was a vector, where each observation was assigned the label of the class with the highest probability of class membership according to the spectral response values contained in the image. Afterward, and only if the oil class was present in the image, a result was obtained for the regression task, where each observation contained the estimated concentration value.

Once the result vectors were obtained, inverse restructuring was performed to generate a square matrix, thus producing a classified image in which each of the five surfaces with which the model was trained is located. The same was done for the result of the regression model, obtaining the distribution and the estimated concentration gradient in the oil slick, but only if the presence of oil was detected.

3. Results

3.1. Statistical approach

3.1.1. Spectral differences

The general spectral response distributions of the five materials analyzed are shown in Fig. 2A. It is observed that water has the lowest reflectance values compared to plastic, which has a wide variation, with values close to 0 and 1. Materials with a similar dispersion are oil, vegetation, and soil. However, the ANOVA analysis showed statistical differences (p-value = 9.15e-140) among all materials. Also, in accordance with the Tukey post hoc analysis, the similar materials are oil and soil (p-value = 6.05e-2), and soil and vegetation (p-value = 8.58e-01), the rest of the materials being different from each other.

The spectral response boxplots for each material broken down by band (Fig. 2B) showed that the material with the highest dispersion in all bands is plastic (0.0046 to 0.8884). Water had the lowest values (0.0009 to 0.1443), which is more evident in the infrared (NIR, SWIR1, and SWIR2). A closer look at the blue, green, and red bands showed that the materials have similar dispersion, compared to the infrared bands, where a separation of the materials begins to be noticed. In the NIR band, differences between soil, vegetation, and water were observed, whereas vegetation, plastic, and oil show similar dispersion. The most marked difference between oil, plastic, and vegetation was observed in the SWIR bands. In the SWIR1 and SWIR2 bands, soil and water also showed this behavior.

3.1.2. Correlations and clustering

The correlation of each datum was analyzed by Pearson's method and plotted on a heatmap (Fig. 3). We noted 6747 observations with positive correlation (r > 0), 8450 with a negative correlation (r < 0), and 28 uncorrelated data (r = 0). Most of the positively correlated data belonged to the same material, as in the case of water data, which present high values of positive correlation (r > 0.96) and are grouped together. Similar behavior is observed in some soil and vegetation data (r > 0.74 and r > 0.70, respectively), thus forming groups of data correlated with data of the same category.

There are samples correlated with samples from other classes, such as plastic, oil, vegetation, and soil, which show positive correlations with water data (r > 0.70). Oil, plastic, soil, and vegetation data are correlated with each other (r > 0.70). Some samples from the oil and plastic classes showed positive correlations with other materials (r > 0.75) (Fig. 3), although there are cases in which they are grouped within the same class. In the case of negative correlations, it can be observed that vegetation and water data are the ones that present correlation values closer to -1.0. This is also observed in some oil data with water (r < 0.80), plastic (r < 0.70), and soil data (r < 0.71), and likewise in plastic data with vegetation (r < 0.78) and soil data (r < 0.72).

In the correlation analysis for the bands (Fig. 4), it was observed that the SWIR bands formed a group (r = 0.90) and the visible bands formed


Fig. 2. A) Reflectance distributions of the five materials studied. Statistical differences were obtained in the ANOVA analysis (p-value = 9.15e-140). According to the
Tukey post hoc analysis, similar materials were oil and soil (p-value = 6.05e-2), and soil and vegetation (p-value = 8.58e-01). B) The spectral response distributions of
each material in each band. Black represents the oil, yellow is plastic, brown is soil, green is vegetation, and blue is water. (For interpretation of the references to
color in this figure legend, the reader is referred to the web version of this article.)
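The ANOVA followed by a Tukey post hoc test, as summarized in the caption above, was run by the authors in R (stats and agricolae packages). A rough Python equivalent, assuming SciPy 1.8 or later for `scipy.stats.tukey_hsd`, might look like the sketch below. The three groups, their means, and the sample size are invented for illustration; the "soil" group deliberately reuses the "oil" values in shuffled order so that the two share the same mean by construction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic reflectance samples per material (NOT the paper's data)
oil = rng.normal(0.30, 0.05, 200)
groups = {
    "oil": oil,
    "soil": rng.permutation(oil),          # same values, same mean as oil
    "water": rng.normal(0.05, 0.02, 200),  # clearly lower reflectance
}
names = list(groups)

# One-way ANOVA: is at least one group mean different?
f_stat, p_value = stats.f_oneway(*groups.values())

# Tukey HSD: which pairs of groups differ?
tukey = stats.tukey_hsd(*groups.values())
pairwise_p = {
    (names[i], names[j]): float(tukey.pvalue[i, j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
}
```

A pair with a large p-value (here oil and soil) would be reported as "similar" in the sense used in the caption, while a small p-value (oil and water) marks a significant difference in means.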

another group (r > 0.88). The correlation coefficient values between these groups were low (r < 0.68), and the NIR band obtained low values in correlation with both the visible (r < 0.75) and SWIR bands (r < 0.77).

PCA and T-SNE plots were obtained to further analyze the clustering of the data from the five classes (Fig. 5). For the PCA (Fig. 5A), using two components that together contain 90 % of the variability, it can be observed that there is no clear separation, as most of the data are found together despite belonging to different classes. The water data are the ones that remain grouped together with little overlap with the others. The plastic data showed great variability and dispersion in the PCA space, being found together with data from other classes. The oil and vegetation data showed the same direction in the PCA space, and the soil data are found with plastic, vegetation, and oil data.

For the T-SNE analysis (Fig. 5B), the data showed greater clustering and separability (average silhouette score = 0.38) compared to PCA (average silhouette score = 0.16); however, some oil data were near the water and vegetation data. The plastic data were observed in a single group with a few exceptions close to water, vegetation, soil, and oil data. The water data were the ones with the largest grouping and there were no data of this class close to others. Soil and vegetation data were closest to each other. In the two analyses, for this approach, we observed a poor


Fig. 3. Correlation between all observations (175) of materials, 35 for each one. The color gradient corresponding to the correlation values of the data is observed.
Data with positive correlations (1.0) are shown with red colors, data with negative correlations (−1.0) are shown in blue colors, and uncorrelated data (~0) are
shown in yellow colors. Classes for each sample are shown on column and row sides, black corresponds to oil class, yellow to plastic, brown to soil, green to
vegetation, and blue to water. The tree diagram present in the rows and columns corresponds to a dendrogram resulting from a hierarchical clustering representing
similarities or differences of each datum. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
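As a consistency check on the heatmap, the three counts reported in Section 3.1.2 (6747 positive, 8450 negative, 28 uncorrelated) sum to 15,225, which equals 175 × 174 / 2, the number of unordered pairs among 175 observations. This suggests, although the text does not say so explicitly, that each pair of observations was counted once. The sketch below reproduces that bookkeeping in NumPy on random stand-in spectra; the variable names are hypothetical and the counts it produces are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(7)
n_obs, n_bands = 175, 6

# Random stand-in for the 175 resampled spectra (NOT the paper's data)
spectra = rng.random((n_obs, n_bands))

# Pearson correlation between observations: rows are variables for corrcoef
corr = np.corrcoef(spectra)  # shape (175, 175)

# Keep each unordered pair once by taking the strict upper triangle
upper = corr[np.triu_indices(n_obs, k=1)]

n_pairs = upper.size
n_positive = int((upper > 0).sum())
n_negative = int((upper < 0).sum())
n_zero = int((upper == 0).sum())
```

With floating-point data an exact r = 0 is rare, so the 28 "uncorrelated" pairs in the paper presumably reflect rounding or a tolerance that is not stated in this section.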

Fig. 4. The correlation matrix between bands. The color gradient corresponding to the correlation values of the data is observed. Data with values of correlation
>0.85 show dark blue colors, data with values between 0.75 and 0.80 show light blue colors, and data with values <0.70 show colors almost white. The bands used
are shown in the columns and rows, the gray colors correspond to the infrared bands, the red, green, and blue colors represent the RGB bands. The tree diagram
present on the rows and columns corresponds to a dendrogram resulting from a hierarchical clustering representing similarities or differences of each datum. (For
interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

separation of the data because some observations were assigned to other clusters. However, the nonlinear method, T-SNE, showed better results than the linear method, PCA. This suggests that models with nonlinear approximations generate better results in the differentiation of materials.

3.1.3. Linear models of concentration data

A general linear model, constructed with the average reflectance of the six bands used, was used to estimate the oil concentration in the satellite images (Fig. 6A). The model showed a negative trend (slope = −270), where the concentration decreases as reflectance increases, i.e., data with higher concentrations have lower reflectance values, and


Fig. 5. PCA and T-SNE analyses of the data for the five classes studied. A) PCA, only two components were used since together they contain 90 % of the variability of the data. B) T-SNE, only two dimensions were used. In both graphics, the black dots and ellipses represent oil, yellow is plastic, brown is soil, green is vegetation, and blue is water. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
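The comparison between the two embeddings can be sketched in Python with scikit-learn, keeping in mind that the authors used R (stats, Rtsne, and cluster packages). The data below are synthetic, and the perplexity, initialization, and seeds are assumptions. Scoring the silhouette against the known class labels is also an assumption about how the average silhouette scores were obtained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)

# Synthetic stand-in: 5 classes x 35 observations x 6 bands
centers = np.linspace(0.1, 0.9, 5)
X = np.vstack([rng.normal(c, 0.05, size=(35, 6)) for c in centers])
labels = np.repeat(np.arange(5), 35)

# Linear projection onto two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # share of variance kept

# Nonlinear two-dimensional embedding
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X)

# Mean silhouette of the true classes in each 2-D space (range -1 to 1)
sil_pca = silhouette_score(X_pca, labels)
sil_tsne = silhouette_score(X_tsne, labels)
```

A higher mean silhouette in one embedding indicates that the classes sit in tighter, better separated groups there, which is the sense in which the paper compares 0.38 for T-SNE against 0.16 for PCA.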

Fig. 6. Linear models for oil concentration data. The X-axis is the reflectance and the Y-axis is the concentration. A) General model, constructed with the average reflectance of the six bands used; the slope for this model was statistically significant (p-value = 3.23e-11), with fit value R² = 0.67 and a negative trend (−270). B) The linear concentration model for each band used. A negative trend is observed for all bands, which is significant except for the SWIR2 band (p-value = 0.54). The R² fit values, as well as the p-value and line equation, are shown in the graphs for each band.
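A linear model of this kind, with reflectance on the X-axis and concentration on the Y-axis, can be sketched with `scipy.stats.linregress`. The reflectance values are simulated with a made-up negative relationship and noise level, and the five samples per concentration level are an assumption (the paper uses forty oil spectra but does not state here how they are distributed across the six levels), so the resulting numbers are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Six concentration levels from Section 2.1.2, five hypothetical samples each
concentration = np.repeat([1, 23, 40, 60, 75, 92], 5).astype(float)

# Simulated mean reflectance: falls as concentration rises, plus noise
reflectance = 0.35 - 0.003 * concentration + rng.normal(0, 0.02, concentration.size)

# Fit concentration = intercept + slope * reflectance
fit = stats.linregress(reflectance, concentration)
r_squared = fit.rvalue ** 2

# Error of the fitted line on the same data
predicted = fit.intercept + fit.slope * reflectance
mae = float(np.mean(np.abs(concentration - predicted)))
```

The sign of `fit.slope` gives the direction of the trend, `fit.pvalue` tests whether the slope differs from zero, and `r_squared` is the fraction of variance explained, which are the three quantities quoted for each panel of the figure.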

those with lower concentrations have higher reflectance.

The negative trend in the general linear model (Fig. 6A) is still present in the linear models of concentration and reflectance for each band studied (Fig. 6B). For the SWIR2 band the linear model was not significant (slope = −85.6, p-value = 0.54), but for the other bands it was (blue band p-value = 4.5e-16, green band p-value = 2.6e-14, red band p-value = 5e-13, NIR band p-value = 2.9e-10, SWIR1 band p-value = 0.004). The slope and R² values were higher for the blue band (slope = 486, R² = 0.81), followed by green (slope = 301, R² = 0.77), then red (slope = 189, R² = 0.73), NIR (slope = 121, R² = 0.63), and SWIR1 (slope = 128, R² = 0.19).

According to the linear models, a negative linear relationship is observed, which means that pure oil, or oil of higher concentration, has the lowest reflectance values. However, a linear model does not fully express this behavior since it only explains 67 % of the variation in the data. Therefore, a linear model with statistical insights is not applied to the concentration prediction.

3.2. ML models

The results of the ML models are presented in the following sections. The first result concerns the surface detection models using the six bands in the Landsat imagery; the second is the detection of surfaces with models generated with the important variables according to the Gini impurity method. Finally, the third result concerns the oil concentration estimation. For each result, we present the optimal values of the hyperparameters used, the accuracy scores obtained, and the tests performed with the selected models.

3.2.1. Surface detection models with six features

3.2.1.1. Accuracy scores. In the grid search of the hyperparameters used for optimization of the models (Section 2.3.4), the values that maximize each model were obtained; the optimal values according to the scores are shown in Table S2. To evaluate the accuracy of the ML models, six metrics, Accuracy, Balanced Accuracy, Kappa, Precision, Recall, and F1


scores, were used (Fig. 7).
Random Forest was the model with the highest performance in predicting the different classes, with the highest values in all metrics for both the training set (0.97, 0.97, 0.96, 0.97, 0.97, and 0.97) and the validation set (0.90, 0.90, 0.87, 0.90, 0.90, and 0.90), followed by SVM RBF (0.93, 0.93, 0.91, 0.93, 0.93, and 0.93 for the training set, and 0.88, 0.89, 0.85, 0.88, 0.88, and 0.88 for the validation set), and KNN (0.91, 0.92, 0.89, 0.91, 0.91, and 0.91 for the training set, and 0.88, 0.89, 0.87, 0.88, 0.88, and 0.88 for the validation set). The model with the lowest values was SVM (0.83, 0.85, 0.78, 0.83, 0.83, and 0.83 for the training set, and 0.85, 0.86, 0.81, 0.85, 0.85, and 0.85 for the validation set). Decision Tree, despite having high values in the training set (0.96, 0.96, 0.95, 0.96, 0.96, and 0.96), obtained lower values in the validation set (0.85, 0.86, 0.81, 0.85, 0.85, and 0.85). The latter model therefore shows a large difference between the training and validation metrics, in contrast to the KNN model, where the differences between both sets are closer to 0.

3.2.1.2. Surface detection. Random Forest, SVM RBF, KNN, and Decision Tree models were tested with the May 9, 2010, spill image from the DeepWater Horizon platform in the Gulf of Mexico (Fig. 8). The SVM model was omitted because it had the lowest values in the metrics (Section 3.2.1.1). In the RGB image (Fig. 8), oil is present and is seen in positive contrast with oil-free seawater, i.e., the areas where oil is present show a gray coloration compared to water without oil. Soil and vegetation were also known to be present in the Mississippi Delta area, and water in most of the scene; plastic was the only material not known to be present in the scenes.
The classification result for the Random Forest model (Fig. 8) assigned the plastic (yellow) and soil (brown) classes to the area corresponding to the oil slick, and the SVM RBF model gave a similar result (Fig. 8). The KNN model (Fig. 8) classified the oil zone as oil (black) and some zones as soil (brown). In the case of the Decision Tree model (Fig. 8), a part of the oil slick was classified as soil (brown).
In the area of the image that corresponds to water, the SVM RBF and KNN models classify it within this same class (blue), the Random Forest model classifies a part as vegetation (green) and another as oil (black), and the Decision Tree model assigns a large part of the scene to the oil class (black) and another portion to the water class (blue). In the results of the four models, the coastal area belonging to the Mississippi Delta was classified as vegetation (green), soil (brown) and some areas as oil (black), except in SVM RBF, which classifies the area as mostly vegetation (green).
Considering that the main objective of the model is to identify oil spills in water, although the Random Forest model obtained the best scores (Section 3.2.1.1), the model that presented better approximations for detecting oil in the image was KNN (Fig. 8), since the stain classified as oil (black) corresponds to the area of positive contrast. However, the areas of the stain that present a higher brightness are classified as soil (brown), since bare soil can present high brightness compared to other surfaces.
Observing that the KNN model performed the best in the oil detection task, tests were performed with some Landsat 8 OLI images of oil-free areas (Fig. 9). In the first image, from January 12, 2014, which pertains to the area near the Louisiana coast, the model did not detect oil, as the part of the Gulf of Mexico was classified as water (blue) by the model. In the second image, from December 08, 2017, which is in the Yucatan Peninsula, the same result was obtained, i.e., the part of the sea was classified as water (blue) by the model.
The results of testing the model on images with and without oil corroborate the accuracy of the model. Such tests minimize the possibility of false positives, as the model only succeeds in detecting oil in 2010 spill images, and in the non-oiled images the model does not classify the other surfaces as oil.

3.2.2. Surface detection models with important features

3.2.2.1. Select features and accuracy scores. In the selection of the important variables using the method described in Section 2.3.2, it was found that the order of the variables according to their value of mean decrease in impurity (Fig. S2) was SWIR2 (0.22), NIR (0.22), SWIR1 (0.19), Blue (0.17), Green (0.12), and Red (0.06). The Blue, NIR, SWIR1, and SWIR2 bands are the most important in the separation of the data and differentiation of the five classes, so these were selected to create the classification models, and a grid search was performed to optimize the hyperparameters. The optimal values obtained for each model are shown in Table S3.
To evaluate the performance of the models created with the

Fig. 7. Metrics calculated for the training and validation dataset, for the five models used. The blue line represents training data scores, and the orange line the
validation data scores. The highest values correspond to Random Forest, and the lowest to SVM. (For interpretation of the references to color in this figure legend, the
reader is referred to the web version of this article.)
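The six scores plotted in Fig. 7 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function name `classification_scores` and the toy labels are hypothetical, and Precision, Recall, and F1 are macro-averaged here, which may differ from the averaging used in the study.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Accuracy, balanced accuracy, Cohen's kappa, and macro P/R/F1."""
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    labels = sorted(true_counts)  # classes present in the reference labels

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = hits[label]
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label]
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        precisions.append(precision)
        recalls.append(recall)

    accuracy = sum(hits.values()) / n
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    k = len(labels)
    return {
        "accuracy": accuracy,
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": sum(recalls) / k,
        "kappa": kappa,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical labels: one oil sample is misclassified as soil.
y_true = ["oil", "oil", "water", "water", "soil", "soil"]
y_pred = ["oil", "soil", "water", "water", "soil", "soil"]
scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Computing the same scores on the training and validation sets separately gives the two curves compared in the figure; a large gap between them is the overfitting signal discussed for the Decision Tree model.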


Fig. 8. ML model classification results for the May 09, 2010, DeepWater Horizon spill image. In the first row is a location map of the obtained Landsat 5TM image, and the true-color composite of that image with the location of the DeepWater Horizon platform (marked with a black drop and the letters DWH). In the second and third rows are the results of the Random Forest, SVM RBF, KNN, and Decision Tree classification models. Clouds are shown in red, water in blue, oil in black, plastic in yellow, vegetation in green, and soil in brown. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

important variables, the same metrics were used as for the models with all variables (Section 3.2.1.1), which were Accuracy, Balanced accuracy, Kappa, Precision, Recall and F1 scores. As in Section 3.2.1.1, Random Forest was the model that obtained the highest scores for training (0.98 in all metrics) and validation (0.94, 0.95, 0.94, 0.95, 0.95, and 0.95), followed by SVM RBF (0.91, 0.91, 0.89, 0.91, 0.91, and 0.91 for training dataset; and 0.87, 0.87, 0.83, 0.86, 0.86, and 0.87 for validation dataset) and KNN (0.89, 0.9, 0.86, 0.89, 0.89, and 0.89 for training; and 0.88, 0.89, 0.85, 0.88, 0.88, and 0.88 for validation dataset). The Decision Tree model had high training values (0.98 in all metrics), but its validation scores were low, and the difference between the scores of both sets is large (between 0.08 and 0.13) compared to the other models; e.g., the difference between the scores obtained in the metrics of the training and validation sets of the KNN model is 0.01. Finally, there is the SVM model (0.77, 0.8, 0.72, 0.77, 0.77, and 0.77 for training; and 0.78, 0.81, 0.73, 0.78, 0.78, and 0.78 for validation), which has a similar behavior to the model with all variables (Fig. 10).

3.2.2.2. Surface detection. The four best scoring models described in Section 3.2.2.1, Random Forest, SVM RBF, KNN and Decision Tree, were tested on the May 09, 2010, scene. In Random Forest, it was observed that the oil spill slick was partially classified within the oil class (black), and another part was classified as plastic (yellow) and soil (brown). In this same model, we observed that the area belonging to the Gulf of Mexico was classified as mostly plastic (yellow), and the coastal area was classified as oil (black) and plastic (yellow). The SVM RBF model classified the oil slick as plastic (yellow), the Gulf of Mexico zone as water (blue), and the coastal zone as plastic (yellow), oil (black), and soil (brown) (Fig. 11).
In the case of the KNN model, we observed that the oil slick was classified in the oil class (black), the water-only part of the Gulf of


Fig. 9. Landsat 8 OLI test images. The first image shows the location of the two images selected for testing. The upper right image (2014/01/12) corresponds to an
area near Louisiana's coast, a location like the images in Fig. 8. The lower right image (2017/12/08) corresponds to an area near the Yucatan Peninsula in Mexico. In
both images, the clouds (red) and the land were masked, so the latter is presented as seen in the RGB image. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of this article.)
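The per-pixel KNN classification applied to test scenes such as these can be sketched as a majority vote among the nearest training spectra. Everything below is an assumption made for illustration: the function name `knn_predict`, the reflectance values (ordered as Blue, NIR, SWIR1, SWIR2), and k = 3; the hyperparameters actually selected by the grid search are those reported in the supplementary tables.

```python
from collections import Counter
import math


def knn_predict(train_spectra, train_labels, pixel, k=3):
    """Label a pixel by majority vote among its k nearest training spectra."""
    distances = sorted(
        (math.dist(spectrum, pixel), label)
        for spectrum, label in zip(train_spectra, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training spectra (Blue, NIR, SWIR1, SWIR2 reflectance).
train_spectra = [
    (0.08, 0.02, 0.01, 0.01), (0.09, 0.03, 0.01, 0.00), (0.07, 0.02, 0.02, 0.01),
    (0.12, 0.10, 0.08, 0.06), (0.13, 0.11, 0.09, 0.07), (0.11, 0.09, 0.07, 0.06),
    (0.20, 0.30, 0.35, 0.30), (0.22, 0.32, 0.36, 0.31), (0.19, 0.28, 0.33, 0.29),
]
train_labels = ["water"] * 3 + ["oil"] * 3 + ["soil"] * 3

# A tiny one-row "image" with a water-like and an oil-like pixel.
image = [[(0.08, 0.02, 0.01, 0.01), (0.12, 0.10, 0.08, 0.07)]]
class_map = [
    [knn_predict(train_spectra, train_labels, pixel) for pixel in row]
    for row in image
]
print(class_map)
```

An oil-free scene corresponds to a class map in which no sea pixel receives the oil label, which is the outcome reported for both test images.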

Fig. 10. Metrics calculated for the training and validation dataset, for the five models used with the 4 selected variables. The blue line represents training data scores,
and the orange line the validation data scores. The highest values correspond to Random Forest, and the lowest to SVM. (For interpretation of the references to color
in this figure legend, the reader is referred to the web version of this article.)
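The feature selection behind the four-variable models relies on the mean decrease in Gini impurity. As a minimal sketch of the underlying quantity, the code below computes the impurity decrease of a single split and then ranks the bands using the importance values reported in the text. The function names and the toy labels are hypothetical, and selecting a fixed top four is written here as a simple ranking rule for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical split that separates oil from water perfectly.
parent = ["oil", "oil", "water", "water"]
decrease = impurity_decrease(parent, ["oil", "oil"], ["water", "water"])

# Importance values as reported in the text (mean decrease in impurity).
importances = {
    "SWIR2": 0.22, "NIR": 0.22, "SWIR1": 0.19,
    "Blue": 0.17, "Green": 0.12, "Red": 0.06,
}
selected = sorted(importances, key=importances.get, reverse=True)[:4]
print(decrease, selected)
```

In a Random Forest, the importance of a band is the average of such decreases over all splits that use that band, across all trees.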


Fig. 11. Results of the classification models trained only with the parts of the spectrum belonging to the Blue, NIR, SWIR1, and SWIR2 bands. The image selected for this test was from Landsat 5TM on May 9, 2010, which was captured during the DeepWater Horizon spill (the platform is marked with a black drop and the letters DWH). In the second and third rows are the results of the Random Forest, SVM RBF, KNN, and Decision Tree classification models. Clouds are shown in red, water in blue, oil in black, plastic in yellow, vegetation in green, and soil in brown. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Mexico was assigned to the water class (blue), and the coastal zone was classified as mostly vegetation (green). Finally, the Decision Tree model classified the slick as plastic (yellow) and a large part of the Gulf of Mexico, as well as the coast, as vegetation (green); only a part of the Gulf of Mexico was classified as water (blue).
In the results of these classification models, using only the important variables, the model that obtained the best approximations in the identification of the oil slick in the Landsat image, despite not having obtained the highest scores, was KNN. Therefore, the test was performed using this model on images without oil, using the same images as in Section 3.2.1.2. Here, similar results were obtained to the model trained with the six variables. That is, no oil was detected in the water in any image; the part corresponding to the Gulf of Mexico was assigned only to the water class (blue) (Fig. 12).

3.2.3. Concentration estimation model

3.2.3.1. Hyperparameter values and accuracy scores. A Random Forest regressor was fitted; the grid search assigned the optimal values of 10, 2, 3, and 50 to the max_depth, max_features, min_samples_split, and n_estimators parameters, respectively. The scores to analyze the performance of this model were calculated using the training data, the validation data, and the whole data set (Table 1).
According to the R2 value of 0.90, the model generates good predictions, compared to the general linear model with an R2 value of 0.67 (Section 3.1.3). In the same way, MAE, MSE, RMSE and ME were lower than the linear model metrics (Table 1).

3.2.3.2. Concentration estimation. The random forest model was used as the final model for concentration estimation because of the scores obtained compared with the linear model (Table 1). Therefore, two tests were


Fig. 12. Landsat 8 OLI test images for the KNN model trained only with Blue, NIR, SWIR1, and SWIR2 bands. The first image shows the location of the two images selected for testing. The upper right image (2014/01/12) corresponds to an area near Louisiana's coast, a location like the images in Fig. 8. The lower right image (2017/12/08) corresponds to an area near the Yucatan Peninsula in Mexico. In both images, the clouds (red) and the land were masked, so the latter is presented as seen in the RGB image. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1
Metrics and scores obtained in the performance evaluation of the Random Forest regression model and statistical linear model. The first columns correspond to Random Forest metrics, and the last column corresponds to a linear model.

        Training         Validation       All data         All data
        Random Forest    Random Forest    Random Forest    linear model
MAE     1.414            3.346            2.329            11.34
MSE     5.422            21.239           15.18            217.44
RMSE    2.329            4.609            3.896            14.74
R2      0.99             0.97             0.977            0.67
ME      6.26             13.36            13.36            43.25

performed. The first test of the model used a synthetic image created with all data sets (Section 2.3.5). It can be observed that the results obtained from the model prediction correspond to the way the data of the created image were ordered, from lower to higher concentration (concentration value in percentage within each pixel), despite the variations in the spectral response values of the data (Fig. 13). In the 1 % concentration row, the model estimated a concentration of 6.3 % in all cases; for the 23 % row the estimate was between 19 % and 35 %, but most of the values were in the 22 % to 23 % range. In the 40 % row of the original data, the estimate was between 35 % and 40 %; for the 60 % row the estimate was in the range of 60 % to 63 %, with one value of 52 %. For the 75 % row, the estimate was identical to the original data, and for the last row, the 92 % row, the estimate was 88 % for all pixels.
In the second test of the model, the concentration estimation for the satellite images was performed. This estimation was calculated only if the presence of oil was detected in the image, and only in the area classified as oil in the surface detection model. For this test, we used the areas identified as oil by the KNN model for important variables in the May 9 and July 12 images of the 2010 oil spill (Fig. 14). Fig. 14 shows the segmented part of the zone corresponding to the oil class for the images used as a test. In the last column, the result of the concentration estimation for this zone can be observed. Both test images show that the

Fig. 13. The concentration estimation model test using a synthetic image created with the original data. A) Blue band of the synthetic image is shown, within each
pixel is the original concentration value of the data. The data were organized in a set of bands like a satellite image. B) Result of the model prediction, within each
pixel is the estimated concentration value (%). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of
this article.)
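To make the regression scores used to judge the concentration model concrete, the following sketch computes MAE, MSE, RMSE, R2, and ME from paired original and estimated concentrations. Two points are assumptions: ME is interpreted here as the maximum absolute error, and the function name `regression_scores` is hypothetical. The three value pairs are row-level estimates quoted in the text for the synthetic image (1 %, 75 %, and 92 % rows) and serve only as toy input, so the printed scores are not expected to reproduce the tabulated ones.

```python
import math


def regression_scores(y_true, y_pred):
    """MAE, MSE, RMSE, R2 and maximum absolute error (ME, assumed meaning)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": r2,
        "ME": max(abs(e) for e in errors),
    }


# Original vs. estimated concentration (%) for three synthetic-image rows.
original = [1.0, 75.0, 92.0]
estimated = [6.3, 75.0, 88.0]
reg_scores = regression_scores(original, estimated)
for name, value in reg_scores.items():
    print(f"{name}: {value:.3f}")
```

Evaluating the same function separately on training data, validation data, and the full data set yields one column of scores per subset, which is how the comparison against the linear model is organized.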


Fig. 14. Concentration estimation results using the Random Forest regressor. In the first column is the RGB image with the oil zoning identified by the KNN
classification model for important variables. The second column is the RGB image with the estimated concentration for the zone identified as oil. In both cases, the
clouds are masked and indicated in red. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
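The two-stage logic behind this figure, where concentration is estimated only for pixels that the classifier labels as oil, can be sketched as follows. The function `estimate_on_oil` and the `toy_regressor` are hypothetical: the toy regressor is a simple linear stand-in with invented coefficients, used in place of the trained Random Forest regressor only so that the example runs on its own.

```python
def estimate_on_oil(class_map, spectra, regressor, oil_label="oil"):
    """Apply the regressor only where the class map says oil; 0.0 elsewhere."""
    return [
        [regressor(pixel) if label == oil_label else 0.0
         for label, pixel in zip(label_row, pixel_row)]
        for label_row, pixel_row in zip(class_map, spectra)
    ]


def toy_regressor(spectrum):
    """Hypothetical stand-in for the trained regressor (not the study's model)."""
    mean_reflectance = sum(spectrum) / len(spectrum)
    # Invented linear rule, clamped to the valid 0-100 % range.
    return max(0.0, min(100.0, 100.0 - 270.0 * mean_reflectance))


# Hypothetical 2 x 2 scene: class labels and per-pixel band reflectances.
class_map = [["water", "oil"], ["oil", "soil"]]
spectra = [
    [(0.05, 0.01, 0.01, 0.01), (0.2, 0.2, 0.2, 0.2)],
    [(0.2, 0.2, 0.2, 0.2), (0.3, 0.3, 0.3, 0.3)],
]
concentration_map = estimate_on_oil(class_map, spectra, toy_regressor)
print(concentration_map)
```

Pixels outside the oil class are reported as 0 % in this sketch, which matches how the non-oil segments appear in the concentration profiles.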

Fig. 15. A) Transects plotted to observe the concentration profiles for the May 09 image. The blue lines represent the profiles, green squares represent the start of each profile and yellow ellipses the end. B) Concentration estimation profile for transect 1. C) Concentration estimation profile for transect 2. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
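A concentration profile along a transect can be extracted by sampling the concentration map at evenly spaced points between the start and end pixels. The sketch below uses nearest-pixel sampling; the function name `sample_transect`, the sampling scheme, and the 3 x 3 map are assumptions for illustration, since the text does not state how the profiles were drawn.

```python
def sample_transect(grid, start, end, n_samples):
    """Sample grid values at n_samples (>= 2) evenly spaced points on a line.

    start and end are (row, col) pixel coordinates; each sample takes the
    value of the nearest pixel.
    """
    (r0, c0), (r1, c1) = start, end
    profile = []
    for i in range(n_samples):
        t = i / (n_samples - 1)
        # Adding 0.5 before truncating rounds non-negative coordinates.
        row = int(r0 + t * (r1 - r0) + 0.5)
        col = int(c0 + t * (c1 - c0) + 0.5)
        profile.append(grid[row][col])
    return profile


# Hypothetical concentration map (%); 0.0 marks pixels not classified as oil.
concentration = [
    [0.0, 40.0, 42.0],
    [41.0, 45.0, 44.0],
    [43.0, 60.0, 0.0],
]
profile = sample_transect(concentration, (0, 0), (2, 2), 3)
print(profile)
```

A profile that starts and ends at 0 % with non-zero values in between has the same shape as the transects described in the text, where the line begins and finishes outside the detected slick.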

estimated concentration was between 40 % and 60 % for the areas identified as oil.
When we made two transects in the area where the concentration was estimated for the May 09 image, we observed the variation in the concentration (Fig. 15A). Transect 1 (Fig. 15A) starts with a concentration of 0 % because the classification model did not identify oil in that zone; later in the same profile, the model found a concentration between 40 % and 45 %, after which the concentration returns to 0 % because the model did not detect oil in that part of the image. In the second profile (Fig. 15A) the concentration estimated using the model was between 35 % and 45 %; later, at approximately pixel 700, the concentration increases to values close to 70 %, and finally, at pixel 900, the concentration reaches 0 % because no oil was detected at that site (Fig. 15C).
With the concentration values estimated for the images, oil stability over time was observed, since in the scenes the concentrations varied


between 30 % and 70 % approximately; in no case did the model estimate concentrations lower or higher than this interval.

4. Discussion

4.1. Materials differentiation

A general analysis of the spectral data for the five materials (Section 3.1.1) showed statistical differences in most of them, except oil and soil (p-value = 0.06), and soil and vegetation (p-value = 0.85) (Fig. 2A). However, this behavior by itself does not completely demonstrate the differences between the materials because the distributions, dispersion, range, and averages of each class are similar. For this reason, it is necessary to consider the spectral behavior of the data for each class in each band.
In the analysis per band (Fig. 2B), a specific behavior of each class can be observed, where the vegetation reflectance peak is in the NIR band, the absorption of water is in the infrared bands, and the relatively low reflectance of the soil is in all bands with increases in the infrared part. For plastic, a wide distribution is observed in all bands, which may be because the data do not belong to only one type of plastic; there are data for PET, light, dark and colored plastic, which, individually, present differences, generating a wide distribution in all bands when taken as a whole.
Likewise, the spectral response of oil in each band can be observed. In the visible part (RGB) it shows a higher reflectance than water, but it does not show specific absorption or reflection trends because thin oil layers present a huge variation in their spectral response (Fingas and Brown, 2018). However, it is mentioned that oil reflectance tends to increase with decreasing wavelength, i.e., it is higher in the blue-green region (Fingas and Brown, 2017), but this was not completely demonstrated in this study because in the red band the reflectance range was wider. The reason for this behavior may be that the data analyzed do not belong only to oil at 100 % concentration; there are data with different concentrations and emulsions that affect its spectral response. Even so, it is possible to identify the absorption capacity of this compound in the blue band (Leifer et al., 2012).
In other studies (Lu et al., 2019; Winkelmann, 2005), it has already been explained that the visible bands are not the most suitable for differentiating oil from other materials, and that the infrared bands (NIR and SWIR) would be more appropriate, since the spectral characteristics of the oil change in the NIR part (700–1100 nm). This change is generated by the oil thickness and the water-oil ratio, because the water content in the stain can generate absorption at this wavelength, modifying the reflectance values (Clark et al., 2010). Additionally, carbon-hydrogen bonds (C–H, C–H2, and C–H3), hydroxyl groups (OH), double and triple bonds of aliphatic and aromatic compounds, carboxyl groups (C=O), ester (C–O–C), and amine groups (N–H) give a specific spectral behavior for this part of the spectrum (720–1730 nm, and 1750–1760 nm, and at 1190–1210 nm) (Leifer et al., 2012; Lu et al., 2019; Winkelmann, 2005), which is crucial for its differentiation.
Thus, it can be observed that oil shows specific spectral properties in the visible and infrared part, showing absorption and reflection peaks, which can generate dark, gray, shiny, opaque, and even rainbow colorations due to the variation in the incidence of light (Bonn Agreement Accord de Bonn (BAOAC), 2007; Lu et al., 2019). This is the reason why all bands should be incorporated into a more complete analysis, to achieve greater separability.
When continuing to analyze the distribution in each band (Fig. 2), there are data with similar dispersion since the quartiles overlap, which is an indication that there are observations similar to those of other materials. This is clearly observed when analyzing individually the observations of each class (Section 3.1.2), where it is possible to identify those observations that correlate with data from other classes (Fig. 3), and that, therefore, are grouped together (Fig. 5A). Even using a non-linear method (Fig. 5B), this behavior is observed, although to a lesser extent. These behaviors are observed because each datum belongs to different samples, since no specific criterion was used in the selection of the data for each class; they were chosen only because they belonged to it. In the case of oil, it presents, in some cases, values like water, which can be due to the variable content of this material in the samples, since the data belong to observations with different concentration values.

4.2. Surface classification precision

In the first approach analyzed, which was to use the data corresponding to the six bands, it was observed that the model with the best approximations in the detection of oil was KNN, although it did not obtain the best scores in the metrics (Section 3.2.1.1). However, confusion was still observed in the model when classifying a portion of the stain as soil. This could be because when using all the bands, the model could not perform a good separation in the data, since the inclusion of data from the visible spectrum can generate confusion (Lu et al., 2019; Winkelmann, 2005).
After analyzing the importance of the variables, it was observed that the variables with the greatest influence on the separation of the data for each class were the infrared bands (NIR, SWIR1, and SWIR2), as well as the blue band, which coincides with Fingas and Brown (2015), Leifer et al. (2012), Lu et al. (2019), and Winkelmann (2005), who mention that this spectral range is the best for separating oil from other materials.
The use of only these bands in the training and validation of the models showed, in general, that the scores of the models decreased a little, and even so the Random Forest model was the one that obtained the best scores (Section 3.2.2.1). Also, with this alternative it was observed that the differences in the scores between the training and validation data decreased, with the KNN model being the one that showed differences close to 0 (Supplementary material Fig. S3A).
After testing the models, it was observed that the KNN model was the one that had the best approximation in the detection of oil in water. The near-zero differences between the training and validation sets could explain this result (Supplementary material Fig. S3A). Additionally, when performing the cross-validation analysis, this behavior is more evident since the model variance in the metrics is lower compared to the other ones (Supplementary material Fig. S3B). Also, we can observe that the Random Forest model, despite having the best scores, has a higher variance than KNN (Supplementary material Fig. S3B). These comparisons and the results from the surface detection sub-section (Fig. 11, qualitative ML model comparisons) indicate that the best model to detect oil spills, quantitatively and qualitatively, is the KNN model.
Furthermore, we tested the KNN model on 8 images with oil (Supplementary material Fig. S4) and 10 images without oil (Supplementary material Fig. S5), which correspond to the spill that occurred in 2010. In the 8 images with oil the model classified the spill stain in the oil class, and in the 10 images without oil the model did not identify oil in the water; in other words, it obtained 100 % accuracy in identifying the oil in the images (Table 2 and Table S4).

Table 2
Confusion matrix built from the classification results of the KNN model for the images with oil and without oil. The model obtains 100 % accuracy for both cases.

                                    Prediction
                                    Images of oil detected    Images with no oil detected
Observation   Images with oil       8 / 8                     0
              Images without oil    0                         10 / 10

In the images of the 2010 spill it can also be seen that some coastal areas were classified as oil; this could be because during the event, oil was present in the Louisiana coast and Mississippi deltas (Beyer et al.,


2016; Garcia-Pineda et al., 2017; Thyng, 2019).

4.3. Oil detection potential in optical imagery

In oil spills, environmental factors affect the thickness and concentration of the oil, making the detection of oil spills a very difficult task (Clark et al., 2010; Fingas and Brown, 2018; Leifer et al., 2012). These factors can provoke a constant weathering process, and because of this, the oil can be found in water-in-oil (WO) and oil-in-water (OW) emulsions (Leifer et al., 2012; Lu et al., 2020), which do not generate positive contrasts with water and therefore may not be detectable (Fig. 16A, B).
In the study by Lu et al. (2020), for the image of May 25, some areas were identified with WO and OW emulsions through a combination of false-color bands (SWIR1, NIR, RED), which have different colors (Fig. 16C). In our study, the KNN classification model of important features detected both emulsions and assigned them the oil class (Fig. 16D); however, in both cases the stain observed in the false-color image is not completely detected. Note that, for our case, no distinction was made between emulsions because, for the detection part, the main objective was to detect the oil and differentiate it from other materials.
In this sense, the partial identification made with our model may be because oil heterogeneity caused by weathering processes not only affects the visual detection of the stain but also causes degraded or emulsified oil slicks to be confused with water and even organic materials, as described in Fingas and Brown (2017), Leifer et al. (2012), Sun et al. (2015), and sometimes to present spectral characteristics similar to plastic, as mentioned by Hörig et al. (2001).
Also, in the detection of oil in optical images, it is important to consider that areas with high contrast can be mistaken for oil slicks due to the colorations present in it (Bonn Agreement Accord de Bonn (BAOAC), 2007; Lu et al., 2020, 2019). This is an aspect that must be considered when obtaining and processing the image, since if there is cloudiness or low wind speed in the scene, there will be areas with high contrast that can be identified as oil, as mentioned by Leifer et al. (2012).

4.4. Spectral data and optical imaging for concentration estimation

Linear models (Section 3.1.3), general (Fig. 6A) and by band (Fig. 6B), were developed to analyze the relationship between reflectance and concentration, and showed that the higher the concentration, the lower the reflectance. Previous studies have mentioned that, with data from oil samples with different concentrations, linear and nonlinear models can be developed to relate the reflectance to concentration, and thus predict the hydrocarbon degradation process (Daling et al., 2014, 1990; Daling and Strøm, 1999; Lu et al., 2019). In our case, using the ML model to estimate the concentration, instead of a conventional linear model, allowed us to increase the certainty in the estimation process (Section 3.2.3.1).
In the results obtained in the concentration estimation (Section 3.2.3.2), it was observed that the estimated concentration for the four images fluctuated between 40 % and 60 % (Fig. 14). When profiles are

Fig. 16. A) True color image of May 25, 2010, white boxes correspond to the areas where emulsions are present, as reported by Lu et al. (2020). B) True color images
of the approach to WO and OW emulsions. C) False color images (SWIR1, NIR, R) of the approach to the zones with WO and OW emulsions. D) Result of the
classifications with KNN for important features, both emulsions were classified as oil, however, they were not detected in their entirety.


made, this fluctuation can be observed in the estimated concentration along both lines drawn in the image (Fig. 15B and C). This fluctuation in concentration, observed in the profiles, is an expected behavior, since the higher the concentration, the lower the reflectance and vice versa; in other words, reflectance and concentration have an inverse relationship, which is consistent with that mentioned in Lu et al. (2019).
In the graphs (Fig. 17A) corresponding to transect 1 (Fig. 15B), low reflectance values are observed at the beginning because the profile starts where no oil was found, in an area classified as water by the model; therefore these values correspond to a behavior like water, which is clearer in the SWIR1 and SWIR2 bands with values close to 0, since water absorbs the greatest amount of electromagnetic radiation in these areas of the spectrum. When oil is detected, an abrupt increase in reflectance values is observed.
Therefore, the ML model, trained using oil data with different concentrations, generates a good approximation in the concentration estimation. However, this is a first approximation, which indicates that this model can be improved.

4.5. Considerations

Although this study demonstrates the possibility of detecting oil slicks in optical images using spectral response data and ML algorithms, and the implementation of the most important features to differentiate the materials, there are certain aspects that must be considered to implement a model capable of correctly differentiating oil from other materials. It should be considered that oil is constantly exposed to environmental factors that promote its degradation (Daling et al., 2014; Daling and Strøm, 1999). Also, the proximity to the coasts, and sediment or sludge flows near the source of the spill, mean that oil can be combined with other materials, generating in some cases mineral aggregates, or the sinking of heavy parts of oil (Daly et al., 2016). These degradation and combination phenomena generate a large heterogeneity in the spectral response values of the oil, making differentiating the oil slick from other materials a complicated task. This can be remedied by increasing the number of observations of oil and other materials, to achieve greater separability. It can also be improved by adding different types of oils and by having satellite images properly segmented by material class. And if possible, to have in situ measurements that allow

Fig. 17. Plots of the reflectance values of the profiles from Fig. 15A for each band used. Only the May 09 image is used. The green square represents the start, and the yellow ellipse represents the end of the profile in the image. A) Corresponds to transect 1 in Fig. 15A. B) Corresponds to transect 2 in Fig. 15A. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
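
The profiles plotted in Fig. 17 are reflectance values read along a line in each band. A minimal sketch of that kind of extraction follows, assuming nearest-pixel sampling along a straight line between two pixel coordinates. The function name `transect_profile`, the band values, and the coordinates are hypothetical and are not taken from the study.

```python
def transect_profile(band, start, end, n_samples):
    """Sample a 2-D band (list of rows) along the straight line start -> end.

    start and end are (row, col) pixel coordinates; each sample takes the
    value of the nearest pixel to the interpolated position.
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    (r0, c0), (r1, c1) = start, end
    profile = []
    for i in range(n_samples):
        t = i / (n_samples - 1)  # 0.0 at the start, 1.0 at the end
        row = round(r0 + t * (r1 - r0))
        col = round(c0 + t * (c1 - c0))
        profile.append(band[row][col])
    return profile


# Hypothetical 3 x 4 SWIR1 band: water (values near 0) on the left,
# oil (higher reflectance) on the right.
swir1 = [
    [0.01, 0.01, 0.15, 0.18],
    [0.00, 0.01, 0.14, 0.17],
    [0.01, 0.02, 0.16, 0.19],
]
profile = transect_profile(swir1, start=(1, 0), end=(1, 3), n_samples=4)
```

In this invented band the profile jumps from values near 0 to higher values where the line crosses from water to oil, the same kind of abrupt change the text describes for the SWIR bands.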


the verification of the results.
In the concentration estimation part, the constant weathering of the oil must be considered, since changes can occur overnight, generating variations at the sub-pixel level and making it complicated to track the evolution of the oil slick (Svejkovsky et al., 2016). In addition, satellites have the disadvantage of providing outdated information due to their low temporal resolution (Garcia-Pineda et al., 2013). However, it has been pointed out that if the oil is well characterized and the environmental conditions of wind speed, sea state, currents, salinity, temperature, and solar insolation are known, it should be possible to calculate the rates of many of these processes and thus establish how the condition of the oil changes with time (Daling et al., 1990). Additionally, it should be considered that, as in the classification methods, some bands may generate noise while others provide relevant information to improve the model estimates.
These issues make oil detection, as well as its monitoring over time, a complex task. Remote sensing of oil spills is a field that needs to be studied and expanded. However, this work provides a perspective on the application of ML algorithms trained with spectral response data to the detection of oil in optical images, covering the visible and infrared spectrum, which can be used to support the response and attention to future oil spills.

5. Conclusions

Spectral response data (from laboratory techniques, field spectroscopy and satellite data) are helpful for detecting surfaces in optical imaging. However, various weathering and mixing processes, as well as physical and chemical properties, cause some spectral data to resemble those of other surfaces. Using statistical analysis to study the spectral data of the surfaces provided a useful perspective to understand and explore their behavior. Due to their nature, the materials showed characteristic behavior in certain ranges of the electromagnetic spectrum, which helped to discriminate them. The ML models generated for all bands obtained good scores; however, they could not fully discriminate the surfaces in satellite images. By selecting the Blue, NIR, SWIR1, and SWIR2 bands, the differences between the training and validation values decreased, thus improving the models. In this study, the model with the best approximations for oil detection, quantitatively and qualitatively, was KNN for the Blue, NIR, SWIR1, and SWIR2 bands. This model showed good precision in oil detection, since it identified oil only in images where oil was present, i.e., true positives, and did not identify oil in images where it was absent. In the case of oil concentration estimation, using the spectral response data of oil with different concentrations makes it possible to generate oil concentration estimations in satellite images. There is a certain degree of uncertainty in the estimates due to the existence of many physical and chemical phenomena that alter the composition of the oil in the real environment. For both supervised learning models generated, detection of oil (classification of surfaces) and concentration estimation (regression), it is advisable to increase the number of example data to obtain better models. The use of spectral response data, the selection of important variables and ML techniques, and the results of our methodology increase the applicability of optical satellite imagery in oil spill detection.

CRediT authorship contribution statement

Trujillo-Acatitla Rubicel: Method, Research, Data analysis, Writing - original draft, Writing - revision and editing. Tuxpan-Vargas Jose: Research, Writing - revision and editing, Project management. Ovando-Vázquez Cesare: Research, Data analysis, Method, Writing - proofreading and editing, Project management.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgements

This work was supported by the Consejo Nacional de Ciencia y Tecnología (CONACYT, https://conacyt.mx/), through grant CVU: 699765. We thank the División de Geociencias Aplicadas (https://ipicyt.edu.mx/Geociencias_Aplicadas/areas_geociencias_aplicadas.php) and the Centro Nacional de Supercómputo (https://cns.ipicyt.edu.mx/) of the Instituto Potosino de Investigación Científica y Tecnológica A.C. (https://www.ipicyt.edu.mx/index.php) for the computational grant TKII-R2018-COV1, and the Alianza en Inteligencia Artificial for their support (https://www.consorcioia.mx/). Special thanks to Erandi Monterrubio-Martínez for all her support in revising the language of this document.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.marpolbul.2022.114132.

References

Al-Ruzouq, R., Gibril, M.B.A., Shanableh, A., Kais, A., Hamed, O., Al-Mansoori, S., Khalil, M.A., 2020. Sensors, features, and machine learning for oil spill detection and monitoring: a review. Remote Sens. https://doi.org/10.3390/rs12203338.
Arslan, N., 2018. Assessment of oil spills using sentinel 1 C-band SAR and Landsat 8 multispectral sensors. Environ. Monit. Assess. 190, 637. https://doi.org/10.1007/s10661-018-7017-4.
Baldridge, A.M., Hook, S.J., Grove, C.I., Rivera, G., 2009. The ASTER spectral library version 2.0. Remote Sens. Environ. 113, 711–715. https://doi.org/10.1016/j.rse.2008.11.007.
Balogun, A.-L., Yekeen, S.T., Pradhan, B., Althuwaynee, O.F., 2020. Spatio-temporal analysis of oil spill impact and recovery pattern of coastal vegetation and wetland using multispectral satellite Landsat 8-OLI imagery and machine learning models. Remote Sens. https://doi.org/10.3390/rs12071225.
Bar, S., Parida, B.R., Pandey, A.C., 2020. Landsat-8 and Sentinel-2 based Forest fire burn area mapping using machine learning algorithms on GEE cloud platform over Uttarakhand, Western himalaya. Remote Sens. Appl. Soc. Environ. 18, 100324 https://doi.org/10.1016/j.rsase.2020.100324.
Bessis, J.-L., 2003. International Charter “Space and Major Disasters” Evolution. https://doi.org/10.2514/6.iac-03-c.2.01.
Beyer, J., Trannum, H.C., Bakke, T., Hodson, P.V., Collier, T.K., 2016. Environmental effects of the Deepwater horizon oil spill: a review. Mar. Pollut. Bull. 110, 28–51. https://doi.org/10.1016/j.marpolbul.2016.06.027.
Bonn Agreement Accord de Bonn (BAOAC), 2007. Bonn Agreement Oil Appearance Code.
Burgherr, P., 2007. In-depth analysis of accidental oil spills from tankers in the context of global spill trends from all sources. J. Hazard. Mater. 140, 245–256. https://doi.org/10.1016/j.jhazmat.2006.07.030.
Cantorna, D., Dafonte, C., Iglesias, A., Arcay, B., 2019. Oil spill segmentation in SAR images using convolutional neural networks. A comparative analysis with clustering and logistic regression algorithms. Appl. Soft Comput. 84, 105716 https://doi.org/10.1016/j.asoc.2019.105716.
Cao, Z., Ma, R., Duan, H., Pahlevan, N., Melack, J., Shen, M., Xue, K., 2020. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 248, 111974 https://doi.org/10.1016/j.rse.2020.111974.
Chander, G., Markham, B.L., Helder, D.L., Ali, E., 2009. Remote sensing of environment summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sens. Environ. 113, 893–903. https://doi.org/10.1016/j.rse.2009.01.007.
Chen, B., Ye, X., Zhang, B., Jing, L., Lee, K., 2019. Chapter 22 - marine oil spills—preparedness and countermeasures. In: Sheppard, C.B.T.-W.S. an E.E. (Ed.). Academic Press, pp. 407–426. https://doi.org/10.1016/B978-0-12-805052-1.00025-5.
Chowdhury, S., Evans, C., Shipman, T.C., 2021. In: Singhroy, V. (Ed.), The Role of Landsat-8 Multispectral Data in Spill Response: Three Case Studies BT - Advances in Remote Sensing for Infrastructure Monitoring. Springer International Publishing, Cham, pp. 291–305. https://doi.org/10.1007/978-3-030-59109-0_13.
Clark, R.N., Swayze, G.A., Leifer, I., Livo, K.E., Kokaly, R.F., Hoefen, T., Lundeen, S., Eastwood, M., Green, R.O., Pearson, N., Sarture, C., McCubbin, I., Roberts, D.,


Bradley, E., Steele, D., Ryan, T., Dominguez, R., 2010. A method for quantitative mapping of thick oil spills using imaging spectroscopy. Open-File Rep. https://doi.org/10.3133/ofr20101167.
Cococcioni, M., Corucci, L., Masini, A., Nardelli, F., 2012. SVME: an ensemble of support vector machines for detecting oil spills from full resolution MODIS images. Ocean Dyn. 62, 449–467. https://doi.org/10.1007/s10236-011-0510-8.
CONAE (Comision Nacional de Actividades Espaciales), n.d. Biblioteca de Firmas Espectrales de CONAE.
Daling, P.S., Strøm, T., 1999. Weathering of oils at sea: model/field data comparisons. Spill Sci. Technol. Bull. 5, 63–74. https://doi.org/10.1016/S1353-2561(98)00051-6.
Daling, P.S., Brandvik, P.J., Mackay, D., Johansen, O., 1990. Characterization of crude oils for environmental purposes. Oil Chem. Pollut. 7, 199–224. https://doi.org/10.1016/S0269-8579(05)80027-9.
Daling, P.S., Leirvik, F., Almås, I.K., Brandvik, P.J., Hansen, B.H., Lewis, A., Reed, M., 2014. Surface weathering and dispersibility of MC252 crude oil. Mar. Pollut. Bull. 87, 300–310. https://doi.org/10.1016/j.marpolbul.2014.07.005.
Daly, K.L., Passow, U., Chanton, J., Hollander, D., 2016. Assessing the impacts of oil-associated marine snow formation and sedimentation during and after the Deepwater horizon oil spill. Anthropocene 13, 18–33. https://doi.org/10.1016/j.ancene.2016.01.006.
De Kerf, T., Gladines, J., Sels, S., Vanlanduit, S., 2020. Oil spill detection using machine learning and infrared images. Remote Sens. https://doi.org/10.3390/rs12244090.
de Mendiburu, F., 2020. agricolae: Statistical Procedures for Agricultural Research.
De Padova, D., Mossa, M., Adamo, M., De Carolis, G., Pasquariello, G., 2017. Synergistic use of an oil drift model and remote sensing observations for oil spill monitoring. Environ. Sci. Pollut. Res. 24, 5530–5543. https://doi.org/10.1007/s11356-016-8214-8.
Distribution, A.S., 2020. Anaconda Software Distribution. Anaconda Doc.
Everitt, B., 1992. Book reviews: Chambers JM, Hastie TJ eds 1992: Statistical models in S. California: Wadsworth and Brooks/Cole. ISBN 0 534 16765-9. Stat. Methods Med. Res. 1, 220–221. https://doi.org/10.1177/096228029200100208, 4th edn.
Fingas, M., Brown, C.E., 2017. Chapter 5 - oil spill remote sensing. In: Fingas, M.B.T.-O.S.S. and T. (Ed.), Second E. Gulf Professional Publishing, Boston, pp. 305–385. https://doi.org/10.1016/B978-0-12-809413-6.00005-9.
Fingas, M., Brown, C.E., 2018. A review of oil spill remote sensing. Sensors. https://doi.org/10.3390/s18010091.
Fingas, M., Brown, E.C., 2015. Oil spill remote sensing. In: Fingas, M. (Ed.), Handbook of Oil Spill Science and Technology. John Wiley & Sons, Inc.
Fisher, R.A., 1992. Statistical methods for research workers. In: Breakthroughs in Statistics. Springer, pp. 66–70.
Freedman, D., Pisani, R., Purves, R., 2007. In: Pisani, R. Purves (Ed.), Statistics, international student edition. WW Nort. Company, New York. 4th edn.
Garcia-Pineda, O., MacDonald, I., Hu, C., Svejkovsky, J., Hess, M., Dukhovskoy, D., Morey, S., 2013. Detection of floating oil anomalies from the Deepwater horizon oil spill with synthetic aperture radar. Oceanography 26. https://doi.org/10.5670/oceanog.2013.38.
Garcia-Pineda, O., Holmes, J., Rissing, M., Jones, R., Wobus, C., Svejkovsky, J., Hess, M., 2017. Detection of oil near shorelines during the Deepwater horizon oil spill using synthetic aperture radar (SAR). Remote Sens. https://doi.org/10.3390/rs9060567.
GDAL/OGR Contributors, 2021. GDAL/OGR Geospatial Data Abstraction Software Library. https://doi.org/10.5281/zenodo.5884351.
Ghorbanian, A., Kakooei, M., Amani, M., Mahdavi, S., Mohammadzadeh, A., Hasanlou, M., 2020. Improved land cover map of Iran using sentinel imagery within Google earth engine and a novel automatic workflow for land cover classification using migrated training samples. ISPRS J. Photogramm. Remote Sens. 167, 276–288. https://doi.org/10.1016/j.isprsjprs.2020.07.013.
Harris, C.R., Millman, K.J., van der Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M.H., Brett, M., Haldane, A., del Río, J.F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., Oliphant, T.E., 2020. Array programming with NumPy. Nature 585, 357–362. https://doi.org/10.1038/s41586-020-2649-2.
Hooper, C.H., 1982. The IXTOC I oil spill: the federal scientific response. In: NOAA Special Report.
Hörig, B., Kühn, F., Oschütz, F., Lehmann, F., 2001. HyMap hyperspectral remote sensing to detect hydrocarbons. Int. J. Remote Sens. 22, 1413–1422. https://doi.org/10.1080/01431160120909.
Hu, C., Feng, L., Holmes, J., Swayze, G.A., Leifer, I., Melton, C., Garcia, O., MacDonald, I., Hess, M., Muller-Karger, F., Graettinger, G., Green, R., 2018. Remote sensing estimation of surface oil volume during the 2010 Deepwater horizon oil blowout in the Gulf of Mexico: scaling up AVIRIS observations with MODIS measurements. J. Appl. Remote. Sens. 12, 1. https://doi.org/10.1117/1.JRS.12.026008.
Hunter, J.D., 2007. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95. https://doi.org/10.1109/MCSE.2007.55.
Ivshina, I.B., Kuyukina, M.S., Krivoruchko, A.V., Elkin, A.A., Makarov, S.O., Cunningham, C.J., Peshkur, T.A., Atlas, R.M., Philp, J.C., 2015. Oil spill problems and sustainable response strategies through new technologies. Environ. Sci. Process. Impacts 17, 1201–1219. https://doi.org/10.1039/C5EM00070J.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning, Springer Texts in Statistics. Springer, New York, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7.
Kokaly, R.F., Clark, R.N., Swayze, G.A., Livo, K.E., Hoefen, T.M., Pearson, N.C., Wise, R.A., Benzel, W.M., Lowers, H.A., Driscoll, R.L., Klein, A.J., 2017. USGS Spectral Library Version 7. Reston, VA, Data Series. https://doi.org/10.3133/ds1035.
Konik, M., Bradtke, K., 2016. Object-oriented approach to oil spill detection using ENVISAT ASAR images. ISPRS J. Photogramm. Remote Sens. 118, 37–52. https://doi.org/10.1016/j.isprsjprs.2016.04.006.
Krijthe, J.H., 2015. Rtsne: T-Distributed Stochastic Neighbor Embedding Using Barnes-Hut Implementation.
Kubat, M., Holte, R.C., Matwin, S., 1998. Machine learning for the detection of oil spills in satellite radar images. Mach. Learn. 30, 195–215. https://doi.org/10.1023/A:1007452223027.
Lary, D.J., Zewdie, G.K., Liu, X., Wu, D., Levetin, E., Allee, R.J., Malakar, N., Walker, A., Mussa, H., Mannino, A., Aurin, D., 2018. In: Mathieu, P.-P., Aubrecht, C. (Eds.), Machine Learning Applications for Earth Observation BT - Earth Observation Open Science and Innovation. Springer International Publishing, Cham, pp. 165–218. https://doi.org/10.1007/978-3-319-65633-5_8.
Leifer, I., Lehr, W.J., Simecek-Beatty, D., Bradley, E., Clark, R., Dennison, P., Hu, Y., Matheson, S., Jones, C.E., Holt, B., Reif, M., Roberts, D.A., Svejkovsky, J., Swayze, G., Wozencraft, J., 2012. State of the art satellite and airborne marine oil spill remote sensing: application to the BP Deepwater horizon oil spill. Remote Sens. Environ. 124, 185–209. https://doi.org/10.1016/j.rse.2012.03.024.
Li, P., Cai, Q., Lin, W., Chen, B., Zhang, B., 2016. Offshore oil spill response practices and emerging challenges. Mar. Pollut. Bull. 110, 6–27. https://doi.org/10.1016/j.marpolbul.2016.06.020.
Lin, Y., Li, L., Yu, J., Hu, Y., Zhang, T., Ye, Z., Syed, A., Li, J., 2021. An optimized machine learning approach to water pollution variation monitoring with time-series Landsat images. Int. J. Appl. Earth Obs. Geoinf. 102, 102370 https://doi.org/10.1016/j.jag.2021.102370.
Liu, B., Li, Y., Chen, P., Zhu, X., 2016. Extraction of oil spill information using decision tree based minimum noise fraction transform. J. Indian Soc. Remote Sens. 44, 421–426. https://doi.org/10.1007/s12524-015-0499-4.
Lu, Y., Shi, J., Wen, Y., Hu, C., Zhou, Y., Sun, S., Zhang, M., Mao, Z., Liu, Y., 2019. Optical interpretation of oil emulsions in the ocean – part I: laboratory measurements and proof-of-concept with AVIRIS observations. Remote Sens. Environ. 230, 111183 https://doi.org/10.1016/j.rse.2019.05.002.
Lu, Y., Shi, J., Hu, C., Zhang, M., Sun, S., Liu, Y., 2020. Optical interpretation of oil emulsions in the ocean – part II: applications to multi-band coarse-resolution imagery. Remote Sens. Environ. 242, 111778 https://doi.org/10.1016/j.rse.2020.111778.
Luciani, R., Laneve, G., 2018. Oil Spill Detection Using Optical Sensors: A Multi-Temporal Approach. In: Satell. Oceanogr. Meteorol. https://doi.org/10.18063/som.v0i0.816.
Maechler, M., Struyf, A., Hubert, M., Hornik, K., Studer, M., Roudier, P., 2015. Package ‘Cluster’: Cluster Analysis Basics and Extensions. R Top. Doc.
McKinney, W., 2010. Data structures for statistical computing in python. In: van der Walt, S., Millman, J. (Eds.), Proceedings of the 9th Python in Science Conference, pp. 56–61. https://doi.org/10.25080/Majora-92bf1922-00a.
Meerdink, S.K., Hook, S.J., Roberts, D.A., Abbott, E.A., 2019. The ECOSTRESS spectral library version 1.0. Remote Sens. Environ. 230, 111196 https://doi.org/10.1016/j.rse.2019.05.015.
Mitchell, T., 1997. Machine learning, machine learning. McGraw-Hill Science/Engineering/Math. https://doi.org/10.1007/BF00116892.
Mohammadi, M., Sharifi, A., Hosseingholizadeh, M., Tariq, A., 2021. Detection of oil pollution using SAR and optical remote sensing imagery: a case study of the Persian Gulf. J. Indian Soc. Remote Sens. https://doi.org/10.1007/s12524-021-01399-2.
Mohammadiun, S., Hu, G., Gharahbagh, A.A., Li, J., Hewage, K., Sadiq, R., 2021. Intelligent computational techniques in marine oil spill management: a critical review. J. Hazard. Mater. 419, 126425 https://doi.org/10.1016/j.jhazmat.2021.126425.
Ozigis, M.S., Kaduk, J.D., Jarvis, C.H., 2019. Mapping terrestrial oil spill impact using machine learning random forest and Landsat 8 OLI imagery: a case site within the Niger Delta region of Nigeria. Environ. Sci. Pollut. Res. 26, 3621–3635. https://doi.org/10.1007/s11356-018-3824-y.
Pearson, K., 1901. LIII. On lines and planes of closest fit to systems of points in space. London, Edinburgh, Dublin Philos. Mag. J. Sci. 2, 559–572.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830.
QGIS Development Team, O., 2018. QGIS Geographic Information System.
Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
Simon, H.A., 1996. In: Edit, Third (Ed.), The Sciences of the Artificial. MIT Press, London, England, Cambridge, MA. https://doi.org/10.1016/S0898-1221(97)82941-0.
Singh, R.K., Singh, P., Drews, M., Kumar, P., Singh, H., Gupta, A.K., Govil, H., Kaur, A., Kumar, M., 2021. A machine learning-based classification of LANDSAT images to map land use and land cover of India. Remote Sens. Appl. Soc. Environ. 24, 100624 https://doi.org/10.1016/j.rsase.2021.100624.
Singha, S., Bellerby, T.J., Trieschmann, O., 2013. Satellite oil spill detection using artificial neural networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 6, 2355–2363. https://doi.org/10.1109/JSTARS.2013.2251864.
Sun, S., Hu, C., Tunnell, J.W., 2015. Surface oil footprint and trajectory of the ixtoc-I oil spill determined from Landsat/MSS and CZCS observations. Mar. Pollut. Bull. 101, 632–641. https://doi.org/10.1016/j.marpolbul.2015.10.036.
Svejkovsky, J., Hess, M., Muskat, J., Nedwed, T.J., McCall, J., Garcia, O., 2016. Characterization of surface oil thickness distribution patterns observed during the Deepwater horizon (MC-252) oil spill with aerial and satellite remote sensing. Mar. Pollut. Bull. 110, 162–176. https://doi.org/10.1016/j.marpolbul.2016.06.066.


R Core Team, 2021. R: A Language and Environment for Statistical Computing.
Temitope Yekeen, S., Balogun, A.-L., 2020. Advances in remote sensing technology, machine learning and deep learning for marine oil spill detection, prediction and vulnerability assessment. Remote Sens. 12. https://doi.org/10.3390/rs12203416.
Thyng, K.M., 2019. Deepwater horizon oil could have naturally reached Texas beaches. Mar. Pollut. Bull. 149, 110527 https://doi.org/10.1016/j.marpolbul.2019.110527.
Topouzelis, K., Psyllos, A., 2012. Oil spill feature selection and classification using decision tree forest on SAR image data. ISPRS J. Photogramm. Remote Sens. 68, 135–143. https://doi.org/10.1016/j.isprsjprs.2012.01.005.
Topouzelis, K., Tarchi, D., Vespe, M., Posada, M., Muellenhoff, O., Ferraro, G., 2015. Detection, tracking, and remote sensing: satellites and image processing (Spaceborne oil spill detection). In: Fingas, M. (Ed.), Handbook of Oil Spill Science and Technology. John Wiley & Sons Inc.
Tukey, J.W., 1977. Exploratory data analysis. In: Reading. MA.
van der Maaten, L., Hinton, G., 2012. Visualizing non-metric similarities in multiple maps. Mach. Learn. 87, 33–55. https://doi.org/10.1007/s10994-011-5273-4.
Van Rossum, G., Drake, F.L., 2009. Python 3 Reference Manual. CreateSpace, Scotts Valley, CA.
Verrelst, J., Muñoz, J., Alonso, L., Delegido, J., Rivera, J.P., Camps-Valls, G., Moreno, J., 2012. Machine learning regression algorithms for biophysical parameter retrieval: opportunities for Sentinel-2 and -3. Remote Sens. Environ. 118, 127–139. https://doi.org/10.1016/j.rse.2011.11.002.
Warnes, G.R., Bolker, B., Bonebakker, L., Gentleman, R., Huber, W., Liaw, A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M., Venables, B., 2020. gplots: Various R Programming Tools for Plotting Data.
Winkelmann, K.H., 2005. On the Applicability of Imaging Spectrometry for the Detection and Investigation of Contaminated Sites With Particular Consideration Given to the Detection of Fuel Hydrocarbon Contaminants in Soil.
Wolanin, A., Camps-Valls, G., Gómez-Chova, L., Mateo-García, G., van der Tol, C., Zhang, Y., Guanter, L., 2019. Estimating crop primary productivity with Sentinel-2 and landsat 8 using machine learning methods trained with radiative transfer simulations. Remote Sens. Environ. 225, 441–457. https://doi.org/10.1016/j.rse.2019.03.002.
Yang, Z., Chen, Z., Lee, K., Owens, E., Boufadel, M.C., An, C., Taylor, E., 2021. Decision support tools for oil spill response (OSR-DSTs): approaches, challenges, and future research perspectives. Mar. Pollut. Bull. 167, 112313 https://doi.org/10.1016/j.marpolbul.2021.112313.
Zhou, T., Geng, Y., Ji, C., Xu, X., Wang, H., Pan, J., Bumberger, J., Haase, D., Lausch, A., 2021. Prediction of soil organic carbon and the C: N ratio on a national scale using machine learning and satellite data: a comparison between Sentinel-2, Sentinel-3 and Landsat-8 images. Sci. Total Environ. 755, 142661 https://doi.org/10.1016/j.scitotenv.2020.142661.
Zhu, Z., Woodcock, C.E., 2012. Object-based cloud and cloud shadow detection in landsat imagery. Remote Sens. Environ. 118, 83–94. https://doi.org/10.1016/j.rse.2011.10.028.
Zhu, Z., Wang, S., Woodcock, C.E., 2015. Improvement and expansion of the fmask algorithm: cloud, cloud shadow, and snow detection for landsats 4–7, 8, and sentinel 2 images. Remote Sens. Environ. 159, 269–277. https://doi.org/10.1016/j.rse.2014.12.014.
