
remote sensing

Communication
Model Specialization for the Use of ESRGAN on Satellite and
Airborne Imagery
Étienne Clabaut 1,*, Myriam Lemelin 1, Mickaël Germain 1, Yacine Bouroubi 1 and Tony St-Pierre 2

1 Département de Géomatique Appliquée, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada; myriam.lemelin@usherbrooke.ca (M.L.); mickael.germain@usherbrooke.ca (M.G.); yacine.bouroubi@usherbrooke.ca (Y.B.)
2 XEOS Imaging Inc., Québec, QC G1P 4P5, Canada; tony.stpierre@xeosimaging.com
* Correspondence: etienne.clabaut@usherbrooke.ca

Abstract: Training a deep learning model requires highly variable data to permit reasonable generalization. If the variability in the data about to be processed is low, the interest in obtaining this generalization seems limited. Yet, it could prove interesting to specialize the model with respect to a particular theme. The use of enhanced super-resolution generative adversarial networks (ESRGAN), a specific type of deep learning architecture, allows the spatial resolution of remote sensing images to be increased by “hallucinating” non-existent details. In this study, we show that ESRGAN create better quality images when trained on thematically classified images than when trained on a wide variety of examples. All things being equal, we further show that the algorithm performs better on some themes than it does on others. Texture analysis shows that these performances are correlated with the inverse difference moment and entropy of the images.

 Keywords: super-resolution; ESRGAN; generative adversarial networks; Haralick




Citation: Clabaut, É.; Lemelin, M.; Germain, M.; Bouroubi, Y.; St-Pierre, T. Model Specialization for the Use of ESRGAN on Satellite and Airborne Imagery. Remote Sens. 2021, 13, 4044. https://doi.org/10.3390/rs13204044

Academic Editor: Tania Stathaki

Received: 3 September 2021
Accepted: 5 October 2021
Published: 10 October 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Images of high (HR, ~1–5 m per pixel) and very high (VHR, <1 m per pixel) spatial resolution are of particular importance for several Earth observation (EO) applications, such as for both visual and automatic information extraction [1–3]. However, currently, most high-resolution and all very high-resolution images acquired by orbital sensors need to be purchased at a high price. On the other hand, there is abundant medium-resolution imagery currently available for free (e.g., the multispectral instrument onboard Sentinel-2 and the operational land imager onboard Landsat-8). Improving the spatial resolution of medium-resolution imagery to the spatial resolution of high- and very high-resolution imagery would thus be highly useful in a variety of applications.
Image resolution enhancement is called super-resolution (SR) and is currently a very active research topic in EO image analysis [4–6] and computer vision in general, as shown in [7]. However, SR is inherently an ill-posed problem [8]. Multi-frame super-resolution (MFSR) uses multiple low-resolution (LR) images to constrain the reconstruction of a high-resolution (HR) image. However, this approach cannot be used when a single image is available. Single image super-resolution (SISR) is a particular type of SR that involves increasing the resolution of a low-resolution (LR) image to create a high-resolution (HR) image. SISR can be achieved by (1) the “external example-based” approach, where the algorithm learns from dictionaries [9]; (2) convolutional neural networks (CNNs), where the algorithm “learns” the relevant features of the image that would be useful for improving its resolution [10,11]; or (3) generative adversarial neural networks (GANs) [12]. GANs oppose two networks (a generator and a discriminator), one against the other, in an adversarial way: the generator is trained to produce new images that trick the discriminator, which tries to distinguish real images from fake ones. In this type of network, the generator and the discriminator act as adversaries. GANs provide a powerful framework for generating real-looking, high-quality images, as is the case in [13], through enhanced super-resolution generative adversarial networks (ESRGAN). This architecture is now used in different applications, such as satellite imagery [14,15] or the improvement of the predictive resolution of some models [16,17].
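To make the adversarial setup concrete, the following minimal sketch shows the standard (non-saturating) GAN losses computed from discriminator logits; it is a generic PyTorch illustration, not the ESRGAN objective of [13], which uses a relativistic discriminator together with perceptual and content terms.

```python
import torch
import torch.nn.functional as F

def gan_losses(d_real, d_fake):
    """Standard (non-saturating) GAN losses from discriminator logits.

    d_real: discriminator logits on real images.
    d_fake: discriminator logits on generator outputs.
    Generic illustration only; the ESRGAN objective of [13] differs
    (relativistic discriminator plus perceptual and content terms).
    """
    real_labels = torch.ones_like(d_real)
    fake_labels = torch.zeros_like(d_fake)
    # The discriminator learns to score real images high and fakes low.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, real_labels)
              + F.binary_cross_entropy_with_logits(d_fake, fake_labels))
    # The generator is rewarded when its outputs are scored as real.
    g_loss = F.binary_cross_entropy_with_logits(d_fake, real_labels)
    return d_loss, g_loss
```

In practice, the two losses are minimized alternately, driving the generator toward outputs that the discriminator can no longer tell apart from real images.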
In general, training a neural network model requires separating the dataset into the following three parts: (1) a training dataset; (2) a validation dataset; and (3) a test dataset. The training dataset is used to adjust the model's parameters and biases. The validation dataset is used to estimate the model's skill while tuning its hyperparameters, including the number of epochs, among others. As only the number of epochs will be relevant for the reader, its definition is given here: one “epoch” is defined as one forward pass and one backward pass through the entire training dataset. The test dataset is then used to verify the capability of the model to generalize its predictions based upon new data. If the model yields good predictions, then its ability to generalize is good. Otherwise, the model exhibits overfitting (overlearning): no further improvement in performance can be achieved, and further adjustment may even degrade it. Indeed, neural network models can be over-parameterized, yet can still correctly predict labels, even if these were assigned randomly [7–9]. Overfitting is a known problem in neural network models and deep learning in general, but different methods can allow us to avoid the problem, such as data augmentation by flipping or rotating the images [18,19], the use of a dropout layer that forces the model to work with part of its parameters turned “off” [20], or early stopping of the training phase [21,22]. Hence, the variability of the samples in the training dataset is key to minimizing the possibility of overfitting.
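As a minimal illustration of such a separation, the sketch below splits one theme's 650 images into the 520/65/65 training/validation/test subsets used later in this study (Section 2.2); it is a generic example, not the authors' code.

```python
import random

def split_dataset(image_paths, seed=0):
    """Shuffle a theme's 650 image paths and split them into training,
    validation, and test sets (520/65/65, as in Section 2.2).

    Generic sketch; the authors' exact splitting procedure is not
    described in the paper.
    """
    rng = random.Random(seed)   # fixed seed for reproducibility
    paths = list(image_paths)   # avoid mutating the caller's list
    rng.shuffle(paths)
    return paths[:520], paths[520:585], paths[585:650]
```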
In the particular case of SISR, is it relevant to try to maximize the variety of examples
if the model is to be applied to a specific theme or topic? In this study, we use the ESRGAN
method to increase the spatial resolution of different types of imagery and address this
question. ESRGAN were chosen because they outperform, in terms of peak signal-to-
noise ratio (PSNR), other SISR methods, such as SRCNN, EDSR, RCAN, EnhanceNet or
SRGAN [13], and are publicly available. We used airborne and satellite images to construct
different datasets of different themes, as follows: (1) “daily life”; (2) agricultural; (3) forests;
(4) urban areas; (5) rocky outcrops; (6) the planet Mars; and (7) a mixture of different
themes. These groups of data have been used for training with different numbers of epochs
(150, 300, 600, 1200, 2400, 4800). We demonstrated that training a model for a specific task is more beneficial than maximizing the variability in the data during training. The results
indicate that the number of epochs is not strictly correlated with the PSNR value; rather,
it depends upon the topic being trained. Furthermore, this work highlights a correlation
between the quality of the results and the textural homogeneity of the image, i.e., the
inverse difference moment (IDM), together with entropy indices that are taken from the
Haralick co-occurrence matrix [23].

2. Materials and Methods


2.1. ESRGAN Architecture
The ESRGAN architecture we used is inspired by SRGAN, the super-resolution generative adversarial network of [24]. The architecture is described in [13], and we invite the reader to refer to it for more details. The models are trained to reconstruct images for which the resolution has been degraded by a factor of 4 using the MATLAB bicubic kernel function. The use of other convolution methods to degrade the resolution is not recommended, given that they could generate artefacts. The codes that are used here are those provided by the authors of [13] on their GitHub (https://github.com/xinntao/ESRGAN) (accessed on 14 July 2021).
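A minimal sketch of this degradation step is given below; note that OpenCV's bicubic resampling only approximates MATLAB's imresize, whose bicubic kernel applies antialiasing by default, so this is an illustration rather than an exact reproduction of the preprocessing.

```python
import cv2  # OpenCV

def degrade_x4(hr_image):
    """Downscale an HR image by a factor of 4 with bicubic interpolation.

    Approximates the degradation described above; OpenCV's INTER_CUBIC
    does not exactly reproduce MATLAB's antialiased bicubic imresize.
    """
    height, width = hr_image.shape[:2]
    return cv2.resize(hr_image, (width // 4, height // 4),
                      interpolation=cv2.INTER_CUBIC)
```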

2.2. Datasets
Images that were used in this study originated from different sources. Each theme contains 650 images. The choice of themes was guided by the following two main criteria: (1) constructing themes that presented a large variability in environments (forests, regolith, and urban areas, among others); and (2) data availability. Table 1 lists the datasets used in this study, and the following paragraphs describe the different themes.

Table 1. Datasets used in this study with their spatial resolution and location when relevant.

Dataset               Spatial Resolution    Location
DIV2K                 Not relevant          Unknown
Airborne imagery      20 cm                 Québec (Canada)
WorldView imagery     2 m                   Axel Heiberg Island (Canada)
HiRISE imagery        25–50 cm              Mars

“Daily life” (Figure 1). The DIVerse 2K (DIV2K) resolution high-quality images dataset is commonly used in the literature to train and then to measure the performance of super-resolution algorithms [25]. The images are common scenes from daily life. This dataset has no spatial resolution associated with the pixel size of its images.

Figure 1. Examples (a,b) of daily life images in the DIV2K dataset.

“Airborne imagery” (Figure 2). These images were acquired at 20 cm spatial resolution and have been kindly provided by XEOS Imaging Inc. (Quebec, QC, Canada). They cover different areas, randomly chosen, of the province of Quebec (Canada), between 70°29′02″W and 71°53′05″W, and between 40°12′31″N and 48°58′26″N. These images were visually classified into categories according to land use (agricultural, forest and urban).
urban).
“WorldView satellite imagery” (Figure 3). The images were acquired by the sensors
onboard the WorldView-2 (WV-2) and WorldView-3 (WV-3) satellites, which have a spatial
resolution of 2 m. The images cover two geographical areas of Axel Heiberg Island
(Nunavut) in the Canadian High Arctic. They include mostly regolith, rocks, and glaciers.
For comparison with the other themes, only the RGB channels were kept and then converted to 8-bit images.
“HiRISE satellite imagery” (Figure 4). Forty-nine images that were acquired by the
HiRISE (high-resolution imaging experiment) instrument onboard the Mars Reconnais-
sance Orbiter have been downloaded from the University of Arizona (Tucson, AZ, USA)
website (https://hirise.lpl.arizona.edu) (accessed on 15 April 2021). The images cover
a wide range of geomorphological variability that is found on Mars (dunes, craters and
canyons, among others). The spatial resolution of these images ranges between approxi-
mately 25 cm and 50 cm depending on the orbiter’s altitude.
The “mixed” theme is an equiproportional mixture of images that have been randomly
selected from each of the other six themes.
Figure 2. Map (a) and examples of airborne imagery. Agriculture (b), forest (c), urban (d). Service Layer Credits: Esri, Maxar, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community.

Figure 3. Map (a) and examples of satellite imagery from WV-2 (b) and WV-3 (c). Service Layer Credits: Esri, Maxar, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community.


Figure 4. Examples of HiRISE satellite imagery.

All of the images of the various themes have undergone bicubic convolution (by a factor of 4) to degrade their resolution. The data were separated into a training set (520 images), a validation set (65 images), and a test set (65 images). The different models were trained to reconstruct the images at their original resolution for 6 different epoch numbers (150, 300, 600, 1200, 2400, 4800), i.e., 42 models in total. Training took 19 days on an NVIDIA Quadro RTX 4000 graphics card. For each reconstruction, the peak signal-to-noise ratio (PSNR) was calculated to evaluate its quality. The model that was trained on the greatest variety of images was then used on each of the themes for 4800 epochs to test whether (a) a greater benefit was obtained by training a model on a wide array of examples or (b) specializing on a single theme was a better option. The workflow that is depicted in Figure 5 summarizes the entire methodological approach.

Figure 5. Methodological flowchart.
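For reference, the PSNR between a reconstruction and its original can be computed as in the minimal sketch below; this is a generic implementation of the metric assuming 8-bit images, not the authors' exact evaluation code.

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two images.

    Assumes both arrays share the same shape and an 8-bit value range;
    a generic sketch of the metric, not the authors' evaluation code.
    """
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```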

3. Results
The results are presented in three sections. The first section provides examples of image resolution improvements. The second section presents the average PSNR values that were obtained for each of the themes. The third section provides several textural indices that highlight correlations between the texture of the images and the quality of the reconstruction.
3.1. Examples of Upscaling Results


Two themes were selected to illustrate our work. All the images were visually displayed with the same “minimum–maximum” histogram stretch available in the ArcMap software. This procedure can reveal differences in coloration due to differences in the pixel values recovered by the model. Figure 6 displays the results that were obtained for 150 epochs and 4800 epochs on a Martian talweg, the line of lowest elevation in a valley.

Figure 6. Example of results that were obtained for 150 and 4800 epochs of the “Mars” theme.

Figure 7 is equivalent to Figure 6, but was obtained for a forest theme.

Figure 7. Example of results that were obtained for 150 and 4800 epochs on an image belonging to the “forest” theme.
3.2. PSNR Obtained for Each Model
The PSNR was calculated for the 65 images of the test set, for each theme and for each number of epochs (150, 300, 600, 1200, 2400, 4800). The standard deviation was also calculated to characterize the dispersion of the quality of the results that were obtained. Each theme was also reconstructed with the model that was trained with 4800 epochs on the “mixed” theme, which is an equiproportional mixture of images from the other six themes (Figure 8). This highlights the influence of variability in the examples on the final quality of the reconstructed images.

Figure 8. Peak signal-to-noise ratios (PSNRs), with their associated standard deviations, for the different themes. For the highest number of epochs, the “mixed” model also tested whether increasing the variability in the samples was relevant.

3.3. Texture Indices
Since two themes have significantly higher PSNRs than the other five, an image-texture study was undertaken to try to understand the underlying phenomenon. Four Haralick texture indices, from the gray level co-occurrence matrix (GLCM), were selected to determine whether there was a correlation between the ability of the model to reconstruct the HR images and the intrinsic characteristics of the image's textures.
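To make these indices concrete, the sketch below computes the inverse difference moment and entropy from a normalized GLCM with scikit-image; the distance and angle parameters are illustrative assumptions, as the paper does not state the exact GLCM settings used.

```python
import numpy as np
from skimage.feature import graycomatrix

def idm_and_entropy(gray_8bit):
    """Inverse difference moment (IDM) and entropy from a GLCM.

    gray_8bit: 2-D uint8 image. A single distance (1 pixel) and a
    horizontal offset are illustrative choices; the paper does not
    specify the GLCM parameters that were actually used.
    """
    glcm = graycomatrix(gray_8bit, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]                          # normalized matrix
    i, j = np.indices(p.shape)
    idm = np.sum(p / (1.0 + (i - j) ** 2))        # local homogeneity
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))  # degree of disorder
    return idm, entropy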
Figure 9 illustrates values that had been obtained for these four indices for 20 images that were randomly selected in each theme, except for the “mixed” scenario, which would not have added any new information. In each panel, the index values of the original images (y-axis) are plotted against values that were calculated for the degraded images (x-axis).

Figure 9. Values that were obtained for four indices that were derived from the Haralick co-occurrence matrix for downscaled images and the original images.

4. Discussion
This section is organized into three parts. First, we acknowledge that the use of ESRGAN for the reconstruction of images (downscaled by a factor of four) does not correspond to a real-world application. We then discuss the PSNR values that were obtained for the different themes and epoch numbers to understand how variability among examples affects the quality of the results. Finally, we show that the image texture indices are positively correlated with the ability of ESRGAN to improve their resolution.
4.1. Image Resolution Improvement with ESRGAN
As explained in the Methods, the ESRGAN model is trained by “teaching” it to reconstruct an image, the resolution of which has been degraded. This allows the analyst to quantitatively evaluate the quality of the results by comparing the reconstructed version of the image to the original version. However, this approach does not allow the analyst to judge the real capacity of the model to create a new image that has not been degraded by bicubic convolution beforehand. Recent research has tried to overcome this difficulty [8,26,27]. However, these new architectures are beyond the scope of our study and should be the subject of further work.

4.2. Interest in the Specialization of Examples in Learning
Peak signal-to-noise ratio (PSNR) is a measurement that is frequently used in super-resolution to express the quality of the image reconstruction. In the study that is presented here, PSNR provides a good idea of the quality of the results.
Figure 8 depicts the variability in the quality of reconstructions for high-resolution images; for example, the “Mars” theme attains a maximum PSNR of 39.10 dB for 4800 epochs, while the “forest” theme reaches 30.11 dB for an equivalent number of epochs. As a function of the number of epochs, the PSNR shows that the learning capacity of the model is not equivalent among the different themes. To illustrate the progression in learning as the number of epochs increases, we averaged the PSNRs that were obtained at 150 and 300 epochs. The same operation was performed for 2400 and 4800 epochs. This averaging mitigates the noise surrounding the measurement when moving from one group of epochs to the next. The improvement between the two values is expressed as a percentage of the first value. Table 2 summarizes these results.
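As a concrete check of this calculation, the “Mars” values reported in Table 2 reproduce the stated gain:

```python
# Worked example reproducing the "Mars" row of Table 2.
early = 32.95                 # mean PSNR (dB) over 150 and 300 epochs
late = 38.13                  # mean PSNR (dB) over 2400 and 4800 epochs
enhancement = 100.0 * (late - early) / early
print(f"{enhancement:.2f}%")  # prints 15.72%
```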
Table 2 shows that increasing the number of epochs, even by a factor of 32 (=4800/150), offers only marginal improvements in PSNR (<1%) for three of the seven themes that are treated here (i.e., the agricultural, urban and forest themes). In contrast, the “Mars” and “mixed” themes showed strong improvements, reaching 15.72% and 6.86%, respectively. The rock outcrop and “DIV2K” dataset themes remained below 3%.
The final PSNRs that were obtained for 2400 and 4800 epochs group the different themes in a similar manner. The “Mars”, “outcrop” and “mixed” themes had good PSNRs, while the “agriculture”, “urban” and “forest” themes did not reach a value of 31 dB. The “DIV2K” dataset has a PSNR that is intermediate between the two aforementioned groups.
Interestingly, the use of the widest variety of examples does not generally lead to better results. With the exception of the Axel Heiberg Island tests, the PSNRs that were obtained at 4800 epochs are similar to, or lower than, the values that were obtained on a dedicated training set. The exception of the rocky outcrops, however, should not be taken as significant, since the improvement is only 1.12% of the value obtained for 4800 epochs (0.41 dB, in terms of the absolute value). This is all the more negligible, since it is the theme that offers the greatest standard deviation, with 4.38 dB, or 11.8% of the mean value.

4.3. Texture Indices and Reconstruction of HR Images


All the texture indices that are presented in Figure 9 show that HiRISE and WorldView
data are comparable, as they plot in similar regions of the graphs. Indeed, these datasets
have systematically obtained close values; in the cases of entropy and the inverse difference
moment, they can be clearly distinguished from the other themes. Entropy and the inverse
difference moment, therefore, would appear to be suitable textural indices for explaining
the ability of ESRGAN to best reconstruct HR images. Figure 10 shows the PSNR values as
a function of these indices.

Figure 10. Mean PSNR values (error bars are SDs) plotted as a function of mean (±SD) texture index values for the inverse difference moment and entropy.

The inverse difference moment measures the local homogeneity of the image. The greater the value, the greater the homogeneity is. Entropy measures the degree of disorder in the image. The lower the value, the greater the order in the texture is. Thus, it is not surprising that the best performances of ESRGAN are obtained for images with high homogeneity and low entropy; the reconstruction of the image at its original resolution is more predictable. The model is less perturbed by statistical variations, the randomness of which would prevent prediction. This explains why the best PSNRs were obtained for two similar themes, i.e., Martian and Arctic regolith, which are indeed much more homogeneous than other themes, despite having unfavourable signal-to-noise ratios.


Table 2. Average PSNRs obtained for 150 and 300 epochs, and then for 2400 and 4800 epochs. The increase in these average PSNRs is expressed as a percentage of the values in the first column. The colours highlight groups of values of the same order of magnitude.

Theme              Averaged PSNR in dB for    Averaged PSNR in dB for    Enhancement %
                   150 and 300 Epochs         2400 and 4800 Epochs
Mars theme         32.95                      38.13                      15.72%
Outcrops theme     35.69                      36.68                      2.77%
Daily life theme   30.92                      31.63                      2.29%
Crops theme        30.47                      30.59                      0.39%
Urban theme        29.16                      30.09                      0.87%
Forest theme       29.83                      30.00                      0.57%
Mixed theme        30.91                      33.03                      6.86%

5. Conclusions
For this study, ESRGAN were used to increase the spatial resolution of the following
different themes: (1) “daily life”; (2) agricultural; (3) forests; (4) urban areas; (5) rocky
outcrops; (6) the planet Mars; and (7) a mixture of different themes. Our aim was to
verify whether it is advantageous to maximize the variability in the examples during the
training phase, or if it is preferable to provide a specialized model. Moreover, training
was performed for six different levels of epochs (150, 300, 600, 1200, 2400, and 4800) to
validate whether it is judicious to always maximize the learning time. Finally, texture
indices were used to explain the variability in the quality of the results that were obtained.
The conclusions of this work are as follows:
• It is more beneficial to create a specialized ESRGAN model for a specific task, rather
than trying to maximize the variability in examples.
• The ability to learn depends upon the subject matter. No recommendations can be
made a priori.
• ESRGAN perform better on images with a high inverse difference moment and low
entropy indices.

Author Contributions: Conceptualization, É.C., M.L., M.G. and Y.B.; methodology, É.C.; validation,
É.C., M.L., M.G., Y.B. and T.S.-P.; formal analysis, É.C.; investigation, É.C.; resources, É.C., M.L., M.G.,
Y.B. and T.S.-P.; data curation, É.C. and T.S.-P.; writing—original draft preparation, É.C.; writing—
review and editing, É.C., M.L., M.G., Y.B. and T.S.-P.; visualization, É.C.; supervision, M.L., M.G. and
Y.B.; project administration, M.L.; funding acquisition, M.L. All authors have read and agreed to the
published version of the manuscript.
Funding: This research was undertaken, in part, thanks to funding from the Canada Research Chair
in Northern and Planetary Geological Remote Sensing (950-232175), an NSERC Discovery grant held
by Myriam Lemelin and a Mitacs Accelerate grant (IT17517) held by Mickaël Germain.
Data Availability Statement: The codes used in this paper are available on GitHub at the following address: https://github.com/ettelephonne/ESRGAN-Haralick (accessed on 14 July 2021). Some of our data and results are available following this link: https://usherbrooke-my.sharepoint.com/:u:/g/personal/clae2101_usherbrooke_ca/ESpyvfn7rUlFilBMMTO9Q00Bhi6I-1M-yz1Z-PsSR8wi4Q?e=sOmml1 (accessed on 14 July 2021). The data provided by XEOS Imaging Inc. are not publicly available.
Acknowledgments: We would like to thank XEOS Imaging Inc. for letting us use some of their data.
We also thank the University of Arizona for allowing us to freely download images that were acquired
by the HiRISE instrument. We finally would like to thank W.F.J. Parsons for language revisions.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or
in the decision to publish the results.

References
1. Transon, J.; d’Andrimont, R.; Maugnard, A.; Defourny, P. Survey of Hyperspectral Earth Observation Applications from Space in
the Sentinel-2 Context. Remote Sens. 2018, 10, 157. [CrossRef]
2. Collin, A.M.; Andel, M.; James, D.; Claudet, J. The Superspectral/Hyperspatial Worldview-3 as the Link Between Spaceborne
Hyperspectral and Airborne Hyperspatial Sensors: The Case Study of the Complex Tropical Coast. Int. Arch. Photogramm. Remote
Sens. Spatial Inf. Sci. 2019, XLII-2/W13, 1849–1854. [CrossRef]
3. Pandey, P.C.; Koutsias, N.; Petropoulos, G.P.; Srivastava, P.K.; Ben Dor, E. Land Use/Land Cover in View of Earth Observation:
Data Sources, Input Dimensions, and Classifiers—a Review of the State of the Art. Geocarto Int. 2021, 36, 957–988. [CrossRef]
4. Haut, J.M.; Fernandez-Beltran, R.; Paoletti, M.E.; Plaza, J.; Plaza, A. Remote Sensing Image Superresolution Using Deep Residual
Channel Attention. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9277–9289. [CrossRef]
5. Märtens, M.; Izzo, D.; Krzic, A.; Cox, D. Super-Resolution of PROBA-V Images Using Convolutional Neural Networks. Astrodyn
2019, 3, 387–402. [CrossRef]
6. Deudon, M.; Kalaitzis, A.; Goytom, I.; Arefin, M.R.; Lin, Z.; Sankaran, K.; Michalski, V.; Kahou, S.E.; Cornebise, J.; Bengio, Y.
HighRes-Net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery. arXiv 2020, arXiv:2002.06460.
7. Lugmayr, A.; Danelljan, M.; Timofte, R. NTIRE 2021 Learning the Super-Resolution Space Challenge. In Proceedings of the 2021
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021;
pp. 596–612.
8. Lugmayr, A.; Danelljan, M.; Van Gool, L.; Timofte, R. SRFlow: Learning the Super-Resolution Space with Normalizing Flow.
In Proceedings of the Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International
Publishing: Cham, Switzerland, 2020; pp. 715–732.
9. Ayas, S. Single Image Super Resolution Using Dictionary Learning and Sparse Coding with Multi-Scale and Multi-Directional
Gabor Feature Representation. Inf. Sci. 2020, 512, 1264–1278. [CrossRef]
10. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a Deep Convolutional Network for Image Super-Resolution. In Proceedings of the
European Conference on Computer Vision, Zurich, Switzerland, 6–14 September 2014; p. 16.
11. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; IEEE:
Manhattan, NY, USA, 2016; pp. 1646–1654.
12. Yang, W.; Zhang, X.; Tian, Y.; Wang, W.; Xue, J.-H.; Liao, Q. Deep Learning for Single Image Super-Resolution: A Brief Review.
IEEE Trans. Multimedia 2019, 21, 3106–3121. [CrossRef]
13. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial
Networks. In Computer Vision—ECCV 2018 Workshops; Leal-Taixé, L., Roth, S., Eds.; Lecture Notes in Computer Science; Springer
International Publishing: Cham, Switzerland, 2019; Volume 11133, pp. 63–79, ISBN 978-3-030-11020-8.
14. Romero, L.S.; Marcello, J.; Vilaplana, V. Super-Resolution of Sentinel-2 Imagery Using Generative Adversarial Networks. Remote
Sens. 2020, 12, 2424. [CrossRef]
15. Clabaut, É.; Lemelin, M.; Germain, M. Generation of Simulated “Ultra-High Resolution” HiRISE Imagery Using Enhanced Super-Resolution Generative Adversarial Network Modeling. Online. 15 March 2021, p. 2. Available online: https://www.hou.usra.edu/meetings/lpsc2021/pdf/1935.pdf (accessed on 6 October 2021).
16. Watson, C.D.; Wang, C.; Lynar, T.; Weldemariam, K. Investigating Two Super-Resolution Methods for Downscaling Precipitation:
ESRGAN and CAR. arXiv 2020, arXiv:2012.01233.
17. Wu, Z.; Ma, P. Esrgan-Based Dem Super-Resolution for Enhanced Slope Deformation Monitoring In Lantau Island Of Hong Kong.
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2020, XLIII-B3-2020, 351–356. [CrossRef]
18. Perez, L.; Wang, J. The Effectiveness of Data Augmentation in Image Classification Using Deep Learning. arXiv 2017,
arXiv:1712.04621.
19. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [CrossRef]
20. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks
from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
21. Raskutti, G.; Wainwright, M.J.; Yu, B. Early Stopping and Non-Parametric Regression: An Optimal Data-Dependent Stopping
Rule. J. Mach. Learn. Res. 2014, 15, 335–366.
22. Li, M.; Soltanolkotabi, M.; Oymak, S. Gradient Descent with Early Stopping Is Provably Robust to Label Noise for Overparame-
terized Neural Networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Palermo, Sicily,
Italy, 3–5 June 2020.
23. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973,
SMC-3, 610–621.
24. Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al.
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv 2017, arXiv:1609.04802.
25. Agustsson, E.; Timofte, R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In Proceedings of the
2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017;
pp. 1122–1131.

26. Lugmayr, A.; Danelljan, M.; Timofte, R. Unsupervised Learning for Real-World Super-Resolution. arXiv 2019, arXiv:1909.09629.
27. Umer, R.M.; Foresti, G.L.; Micheloni, C. Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-
Resolution. arXiv 2020, arXiv:2005.00953.
