
Expert Systems With Applications 251 (2024) 123984


Few-shot satellite image classification for bringing deep learning on board OPS-SAT

Ramez Shendy, Jakub Nalepa ∗

Silesian University of Technology, Akademicka 16, Gliwice, 44-100, Poland

ARTICLE INFO ABSTRACT

Keywords: Bringing artificial intelligence on board Earth observation satellites unlocks unprecedented possibilities to
Satellite image classification extract actionable items from various image modalities at the global scale in real time. This is of paramount
Data augmentation importance nowadays, as downlinking large amounts of imagery is not only prohibitively expensive but also
Synthetic data generation
time-consuming. However, building deep learning solutions that could be deployed on board an edge device
Ensemble learning
is challenging due to the limited manually-annotated satellite datasets and hardware constraints of an edge
Deep learning
Few-shot classification
device. This paper addresses these challenges through harnessing a blend of data-centric and model-centric
approaches to build a well-generalizing yet efficient and resource-frugal deep learning model for multi-class
satellite image classification in the few-shot learning settings. This integrated strategy is formulated to enhance
classification performance while accommodating the unique demands of an image analysis chain on board
OPS-SAT, a nanosatellite operated by the European Space Agency. The experiments performed over a real-
world dataset of OPS-SAT images delves into the interactions between data- and model-centric techniques,
underscores the significance of synthesizing artificial training data and emphasizes the value of ensemble
learning. However, they also caution against negative transfer in domain adaptation. This study sheds light
on effective model training strategies and highlights the multifaceted challenges inherent in deep learning
for practical Earth observation, contributing insights to the field of satellite image classification within the
constraints of nanosatellite operations. To ensure reproducibility of our study, the implementation is available
at https://github.com/ShendoxParadox/Few-shot-satellite-image-classification-OPS-SAT.

1. Introduction

The scarcity of training examples has remained a persistent challenge in deep learning (Bansal, Sharma, & Kathuria, 2022). This issue becomes particularly pronounced in tasks such as satellite image classification, where obtaining large labeled datasets is often prohibitively expensive and time-consuming due to the need for domain expertise and manual annotation (Derksen et al., 2021; Nalepa et al., 2021). Limited training examples can lead to overfitting, hindering model generalization to unseen instances (Domingos, 2012). Therefore, limited and imbalanced datasets are prevalent challenges in satellite image classification (Mellor, Boukir, Haywood, & Jones, 2015), impeding the development of accurate and generalizable models. Also, satellite image datasets often exhibit class imbalances due to the varying occurrences of land cover categories within a given geographic region (Bischke, Helber, Borth, & Dengel, 2018).

Image classification, which is an important application in computer vision, consists in classifying distinct types of pictures based on their content. Image classification using deep learning has revolutionized the field of computer vision, achieving remarkable results in various domains (LeCun, Bengio, & Hinton, 2015). Deep learning models, particularly convolutional neural networks (CNNs) (LeCun, Bottou, Bengio, & Haffner, 1998), have demonstrated an exceptional ability to learn complex representations directly from raw image data, eliminating the need for building handcrafted feature extractors. These models are capable of automatically extracting hierarchical features (Li & Yu, 2015), enabling them to capture intricate patterns and variations in images. Notable advancements, such as AlexNet (Krizhevsky, Sutskever, & Hinton, 2012), the Visual Geometry Group network (VGGNet) (Simonyan & Zisserman, 2015), Residual Networks (ResNets) (He, Zhang, Ren, & Sun, 2016), and many others, including recent foundation models, have propelled the accuracy and robustness of image classification models to unprecedented levels. With large-scale labeled datasets, like ImageNet (Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009), these models have been trained on millions of images, allowing them to recognize a wide range of objects with high precision. The success of deep learning in image classification has paved the way for numerous applications,

∗ Corresponding author.
E-mail addresses: ra304822@student.polsl.pl (R. Shendy), Jakub.Nalepa@polsl.pl (J. Nalepa).

https://doi.org/10.1016/j.eswa.2024.123984
Received 25 January 2024; Received in revised form 6 April 2024; Accepted 12 April 2024
Available online 24 April 2024
0957-4174/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

including autonomous driving (Grigorescu, Trasnea, Cocias, & Macesanu, 2019), medical imaging (Lee et al., 2017), and satellite image classification (Pritt & Chern, 2017), making significant contributions to various fields.

Implementing deep models on board Earth observation (EO) satellites offers enormous spatial scalability of such solutions but also introduces additional technological hurdles, particularly regarding their memory usage and meeting energy consumption demands (Nalepa et al., 2021). In the case of on-board processing, there is an extreme scenario where hardware constraints severely limit energy consumption, memory capacity, and communication capabilities with the ground (Furano et al., 2020). Such hardware constraints were faced by recent space missions, such as Φ-Sat-1 (Mateo-Garcia, Oprea, Smith, Veitch-Michaelis, Baydin, & Backes, 2019) and OPS-SAT (Operations nanoSatellite) (Evans & Merri, 2014), which have showcased the capability of conducting on-board classification. Additionally, it was observed that many artificial intelligence (AI) models often exhibit reduced accuracy when confronted with data that significantly differs from the training data (Krueger et al., 2021). The integration of AI into spaceborne platforms in the EO field could unlock a wealth of advantages. Such models could generate real-time notifications concerning catastrophic events, for instance, floods (Mateo-Garcia, Veitch-Michaelis, Smith, Oprea, Schumann, Gal, Baydin, & Backes, 2021), wildfires (Del Rosso, Sebastianelli, Spiller, Mathieu, & Ullo, 2021), oil spills (Diana, Xu, & Fanucci, 2021), and more (Solari et al., 2020; Wijata et al., 2023). Moreover, there is a potential to observe unlawful activities such as deforestation (Irvin et al., 2020; Slough, Kopas, & Urpelainen, 2021), illegal fishing (Kurekin et al., 2019), and marine debris dumping (Biermann, Clewley, Martinez-Vicente, & Topouzelis, 2020; Topouzelis, Papageorgiou, Suaria, & Aliani, 2021), thus augmenting the EO technology to tackle significant environmental and security issues. In EO, captured images may contain undesirable elements including clouds or sensor-level noise, and identifying and discarding such images is crucial to accelerate on-board processing (Furano et al., 2020; Giuffrida et al., 2020). Overall, the demand for on-board AI has grown significantly in recent years. However, deploying AI in real-world applications still poses several challenges, including data availability and the constraints of low energy and memory budgets. These issues are addressed in this work.

1.1. Contribution

We address the challenges of deploying deep machine learning on board OPS-SAT, an experimental satellite operated by the European Space Agency (ESA). We tackle an important issue of building an efficient and resource-frugal multi-class deep learning classifier from extremely limited training sets, and focus on constructing effective solutions by leveraging a combination of data-centric and model-based approaches for few-shot image classification. Our contributions lie in proposing methodologies that enhance the classification performance through data-centric techniques, such as data augmentation and synthetic data generation, and combining them with model-level approaches, such as transfer learning. These contributions thus revolve around the following points:

• We combine data-centric (Section 3.3) and model-based (Section 4) approaches for few-shot classification of satellite images, and establish a comprehensive framework that improves few-shot satellite image classification.
• We build upon a convolutional architecture (EfficientNetLite (Tan & Le, 2019)) suggested by the organizers of ‘‘The OPS-SAT Case’’ challenge (Derksen et al., 2021)—this architecture fits the hardware constraints imposed by OPS-SAT (Section 3.1).
• We thoroughly investigate the proposed methods over a real-world dataset of OPS-SAT satellite images (Section 4).
• We make our approaches fully-reproducible, and make the implementation available at https://github.com/ShendoxParadox/Few-shot-satellite-image-classification-OPS-SAT.

We believe that our efforts may be an important step toward advancing the capabilities of remote sensing and satellite-based applications, where capturing large ground-truth datasets is infeasible. Also, since our techniques are model-agnostic, we encourage the community to build upon our findings, and develop well-generalizing and compact deep learning models that could be deployed on board edge devices. These algorithms will be directly comparable with our methods over the same training-test dataset split available through the OPS-SAT Case challenge. This is of paramount importance, as we are facing the ‘‘leakage and reproducibility crisis’’ in machine learning research (Kapoor & Narayanan, 2023).

1.2. Structure of the article

This article is structured as follows. Section 2 reviews the state of the art in few-shot image classification. In Section 3, we present the OPS-SAT Case challenge in detail, together with the datasets used in this study, as well as our data- and model-driven techniques used for few-shot satellite image classification. The experiments are discussed in Section 4. In Section 5, we contextualize our techniques within the OPS-SAT Case challenge, and confront them with the best-performing solutions. Section 6 concludes the paper and discusses some interesting research avenues which may be explored based on our results.

2. Related literature

Supervised machine learning (ML) algorithms comprise various methods which elaborate predictions based on a model trained over a finite training sample (Mohri, Rostamizadeh, & Talwalkar, 2018; Razzaghi, Abbasi, & Ghasemi, 2023). A relatively new ML approach, which resembles the human learning process and allows rapid generalization from very limited sample data, is few-shot learning (FSL) (Wang, Yao, Kwok, & Ni, 2020). Wang et al. conducted an extensive literature review on FSL, organizing the research into a unified framework that encompasses the data, model, and algorithm perspectives (Wang et al., 2020). Similarly, Zhang, Wang, Tang, Zhou, Lu and Xuxiang (2022) provided an overview of the advancements in FSL, focusing on approaches involving model fine-tuning, data augmentation, and transfer learning. However, there is a scarcity of literature that reviews the topic of few-shot image classification (Lu, Gong, Ye, Zhang, & Zhang, 2023).

Training large-capacity deep ML models (Palhamkhani, Alipour, Dehnad, Abbasi, Razzaghi, & Ghasemi, 2023) commonly requires a huge amount of representative data. However, such large data samples are difficult and costly to obtain in practical domains such as health (Bansal et al., 2022) and satellite image analysis (Derksen et al., 2021; Yeung, Zhou, Chung, Moule, Thompson, Ouyang, Cai, & Bennamoun, 2022). Also, sample labeling is laborious and time-consuming, and may be human-dependent, hence non-reproducible. Therefore, in real-world situations, there are frequently insufficient training sets, which can result in over-fitting deep neural networks, effectively making them memorize the training set (thus, they are unable to generalize). This problem may be addressed using FSL (Fei-Fei, Fergus, & Perona, 2006)—by leveraging a small amount of data, FSL enables the application of these models, expanding their usability. Here, the essential challenge of FSL is how to utilize previously acquired information when learning a new category (Fe-Fei, Fergus, & Perona, 2003). This approach has become commonly used in a variety of image processing applications, including image recognition and classification (Liu et al., 2019; Qiao, Liu, Shen, & Yuille, 2018).

Few-shot image classification (FSIC) algorithms can be classified into four main categories: data augmentation-, transfer learning-, meta-learning-, and multimodal-based methods (Liu et al., 2022) (Fig. 1).


Fig. 1. Taxonomy of few-shot image classification algorithms with the methods adopted in this work annotated in bold.
Source: This figure is inspired by Liu et al. (2022).

Considering practical constraints, such as the fixation of the model architecture for specific hardware demands, or tackling the challenge of a limited number of images per class in various scenarios, the emphasis is put on data augmentation and transfer learning in the context of FSIC algorithms. These two categories have been prioritized to address the limitations and constraints imposed by OPS-SAT, allowing for the effective utilization of available data and leveraging pre-existing knowledge.

Data augmentation methods tackle the challenge of limited training data by expanding the training set through various transformations of existing data. Data generation methods involve the creation of additional training examples (Cubuk, Zoph, Mane, Vasudevan, & Le, 2018; He, Zhang, Ren & Sun, 2016; Redmon & Farhadi, 2018), effectively increasing the diversity of the training set. Feature enhancement methods focus on altering image attributes, such as color, contrast, and brightness (Aleem, Kumar, Little, Bendechache, Brennan, & McGuinness, 2022; Zhang, Kinoshita, & Kiya, 2020). Collectively, data augmentation strategies contribute to ensuring that the models can handle the variability of real-world imagery (Perez & Wang, 2017; Shorten & Khoshgoftaar, 2019). Unlike traditional training-time augmentation, it is also possible to execute augmentation during inference to benefit from ensemble-like techniques (Nalepa, Myller, & Kawulok, 2020). Zhang, Che, Ghahramani, Bengio, and Song (2018) introduced MetaGAN, which incorporates generative adversarial networks (GANs) (Goodfellow et al., 2014) with a segment of the classification network during training. It enhances the performance of FSL by enabling the few-shot classifier to acquire a more distinct decision boundary. As the field of FSL has advanced, more sophisticated data augmentation techniques have been introduced to further enhance the performance of models trained from extremely limited training sets (Liu et al., 2022).

Contrastive Language-Image Pre-Training (CLIP) neural networks (Radford et al., 2021) are an effective representation learning method for images (Shen et al., 2021). CLIP embeddings exhibit robustness in the face of variations in image distribution and possess impressive zero-shot capabilities. Simultaneously, diffusion models (Ho, Jain, & Abbeel, 2020; Sohl-Dickstein, Weiss, Maheswaranathan, & Ganguli, 2015; Song & Ermon, 2020) push the boundaries of image and video generation (Dhariwal & Nichol, 2021; Ho et al., 2022; Ho & Salimans, 2021). DALL-E 2 (Ramesh, Dhariwal, Nichol, Chu, & Chen, 2022), a breakthrough method in text-conditional image generation, combines the aforementioned approaches, and can be used for synthesizing new data. It integrates a diffusion decoder with the CLIP image encoder for training. The diffusion decoder serves as a non-deterministic inverter that can generate multiple images corresponding to a given image embedding. Similar to GAN inversion techniques (Xia et al., 2023; Zhu, Krähenbühl, Shechtman, & Efros, 2016), encoding and decoding an input image results in semantically similar output images. Moreover, DALL-E 2 has the ability to interpolate between input images by inverting interpolations of their image embeddings.

Transfer learning methods leverage knowledge gained from a specific field or task and apply it to different, yet related problems. Depending on the underlying mechanisms, transfer learning methods can be further divided into instance-based, feature-based, and fine-tuning-based approaches. Transfer learning involves utilizing a pre-trained model from a related source task to tackle a target task with limited training data. Deep transfer learning methods take advantage of the capability of deep models to learn shared features across tasks (Oquab, Bottou, Laptev, & Sivic, 2014; Yosinski, Clune, Bengio, & Lipson, 2014), particularly in the shallower layers. The selection of the most suitable pre-trained models is crucial for achieving high-quality performance in the target task (Kadam & Vaidya, 2020). Of note, in some cases, a direct transfer of knowledge may fail if the source and target domains are unrelated, as negative transfer can occur (Yosinski et al., 2014).

Meta-learning (Finn, Abbeel, & Levine, 2017) uses prior knowledge and experience to guide the learning process for new tasks, enabling the model to acquire the ability to learn how to learn. Based on the underlying mechanisms, meta-learning methods can be divided into model-, optimization-, and metric-based techniques (Vilalta & Drissi, 2002). Meta-learning methods have emerged as a promising approach to address the challenges of FSL—a comprehensive review by Vanschoren (2018) provides insights into the evolving landscape of meta-learning-based FSL. The multimodal-based methods improve the quality of feature representations by removing unnecessary overlap between different modalities and leveraging the unique strengths of each modality (Liu et al., 2022). These algorithms integrate different sources of information, such as text, images, and audio, to improve the model’s ability to learn from limited examples. The comprehensive overview by Baltrušaitis, Ahuja, and Morency (2019) discusses the potential of multimodal learning in FSL.

3. Materials and methods

In Section 3.1, we present the OPS-SAT Case—a competition for on-board FSIC. The datasets used in our transfer learning are discussed in Section 3.2. We exploit two distinct yet complementary approaches to tackle the task of FSIC: data-driven (Section 3.3) and model-centric methodologies (Section 3.4).

3.1. The OPS-SAT Case: Challenge and dataset

The OPS-SAT Case, a data-centric competition, was proposed to accelerate the deployment of on-board classification for EO satellites using AI on the edge, with a focus on low-memory and low-energy environments and on models that could be trained from extremely limited ground-truth datasets. The focus is on using a fixed lightweight deep-learning model (EfficientNetLite (Tan & Le, 2019)) that is well-suited for the limited computational and memory resources available on OPS-SAT, and on overcoming the constraints associated with the scarcity of labeled data and the satellite’s restricted computing capabilities. Of note, this deep learning architecture could not be modified or replaced by the participants. The primary objectives of the competition are to mitigate the requirement for substantial volumes of in-situ data for training AI algorithms, and to minimize the number of pre-processing tasks executed on board OPS-SAT for performing image classification. Thus, the competition presented a challenge that prioritized data-centric approaches to tackle the significant problems of limited training data availability and developing compact deep models for on-board deployment.

Fig. 2. Timeline of deploying on-board AI methods, representing stages performed on board a satellite (annotated in blue: data collection and mission) as well as those performed on the ground (in green: data simulation and annotation).
Source: This figure is inspired by Derksen et al. (2021).
Fig. 2 depicts the data simulation, collection and annotation, which can be resource-intensive and time-consuming, particularly when substantial amounts of labeled data are needed to attain adequate accuracy. Furthermore, satellite images typically undergo pre-processing that involves sensor calibration and atmospheric corrections to enhance the consistency between training and real-world images. However, executing these steps on board introduces a computational burden, thereby restricting the mission time available for other tasks.

The OPS-SAT Case dataset (Agency, 2023) comprises 26 raw and unprocessed Earth RGB images captured by the OPS-SAT CubeSat’s on-board camera. These images have a resolution of approximately 2048 × 1944 pixels. The objective of the competition is to optimize the network parameters for an on-board classification task using the EfficientNetLite-B0 model (Tan & Le, 2019), having only 10 labeled images for each of the 8 target classes. The labeled patches provided for training are 200 × 200 pixels in size, and the classes correspond to various types of landcover/content: Agriculture, Cloud, Mountain, Natural, River, Sea Ice, Snow, and Water (Derksen, Meoni, Lecuyer, Mergy, Märtens, & Izzo, 2022). Each image is composed of ca. 100 tiles, hence multiple classes may be present within one image (every individual tile is associated with exactly one class label). The public data could be used by participants in any way (Agency, 2023; Derksen et al., 2022). Fig. 3 shows the image tiles used as the training ones in this study.
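As an illustration of this tiling scheme, a raw frame can be split into non-overlapping 200 × 200 patches as follows (a minimal NumPy sketch; the function name is ours and not part of the challenge code):

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 200) -> list:
    """Split an H x W x C image into non-overlapping tile x tile patches,
    discarding the partial border left when H or W is not divisible."""
    h, w = image.shape[:2]
    return [
        image[r:r + tile, c:c + tile]
        for r in range(0, h - tile + 1, tile)
        for c in range(0, w - tile + 1, tile)
    ]

# A 2048 x 1944 RGB frame yields 10 x 9 = 90 complete tiles,
# consistent with the "ca. 100 tiles" per image reported for the dataset.
frame = np.zeros((2048, 1944, 3), dtype=np.uint8)
tiles = tile_image(frame)
```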

3.2. Datasets used in transfer learning

We exploited three datasets in this study: ImageNet (Deng et al., 2009), Landuse (Yang & Newsam, 2010), and Opensurfaces (Bell, Upchurch, Snavely, & Bala, 2014). Transfer learning with ImageNet has become pivotal in the field of deep learning for computer vision tasks (Kornblith, Shlens, & Le, 2019; Krizhevsky et al., 2012). In Bell et al. (2014), the Materials in Context Database (Opensurfaces) dataset was introduced, and it encompasses 23 categories of materials and objects. The Landuse dataset comprises images of 21 classes extracted from aerial orthoimagery, sourced from various regions in the United States. For each of the 21 classes (including agricultural and urban objects), 100 images (256 × 256 pixels) were manually selected. The classes exhibit a variety of spatial patterns, some with homogeneity in texture or color, while others show more complex and diverse features (Yang & Newsam, 2010). Example images from all the datasets are shown in the supplementary materials.
Fig. 3. Sample of the eight category training images of the OPS-SAT dataset.
3.3. Data-driven approaches for FSIC

We introduce several data-level methodologies to learn from limited training samples. We focus on data augmentation, synthetic image generation, and the color correction employed to rectify synthetic images. It is important to emphasize that such data-level methodologies meet the goal of lightweight models and fast on-board computation, as they are training-time techniques.

3.3.1. Data augmentation

Various data augmentation techniques were employed in our study to enhance the dataset’s diversity and improve the model’s abilities. They included:


Fig. 4. Differences between the (a) original images of selected classes, and their (b) modifications after applying all of the used data augmentation techniques in a sequential
order: rotation, flipping, random brightness/contrast adjustment and additive Gaussian noise.
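The augmentation sequence listed in the caption of Fig. 4 can be sketched as follows (a minimal NumPy sketch with illustrative parameter ranges; only the N(1, 3) noise setting is taken from this study):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    """Sequentially apply rotation, flipping, brightness/contrast jitter,
    and additive Gaussian noise to an 8-bit RGB image."""
    # Rotation by a random multiple of 90 degrees (90, 180 or 270).
    img = np.rot90(img, k=int(rng.integers(1, 4)))
    # Vertical flip.
    img = np.flipud(img)
    # Random brightness/contrast: out = contrast * img + brightness.
    contrast = rng.uniform(0.8, 1.2)    # illustrative range
    brightness = rng.uniform(-20, 20)   # illustrative range
    img = contrast * img.astype(np.float32) + brightness
    # Additive Gaussian noise with mean 1 and standard deviation 3,
    # i.e. N(1, 3) as used in the study.
    img = img + rng.normal(1.0, 3.0, size=img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)

patch = rng.integers(0, 256, size=(200, 200, 3), dtype=np.uint8)
aug = augment(patch)
```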

• Rotation by angles of 90, 180, and 270 degrees to make the model more robust to variations in object orientation and viewpoint.
• Vertical flipping. It is particularly beneficial when dealing with symmetrical objects or scenes that can be viewed from multiple angles (He, Zhang, Ren & Sun, 2016; Simonyan & Zisserman, 2015).
• Random adjustments of brightness and contrast. By randomly adjusting the brightness and contrast of training images, the model may become more resilient to variations in lighting and pixel intensities. It is valuable if the images may be captured under different lighting conditions (Nalepa et al., 2021), ensuring the model’s adaptability to real-world situations (Aleem et al., 2022; Zhang et al., 2020).
• Contaminating the training set with additive Gaussian noise. This noise introduces subtle fluctuations in image data, which can mimic real-world imperfections and environmental factors that affect the acquisition (Arslan, Guzel, Demirci, & Ozdemir, 2019; Dodge & Karam, 2017). The Gaussian distribution is widely accepted here due to its prevalence in natural phenomena (Rasmussen, Williams, et al., 2006). The probability density function of the Gaussian distribution is:

\( N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \)  (1)

where \( \mu \) represents the mean of the Gaussian distribution, and \( \sigma \) is the standard deviation; a random variable \( x \) sampled from this distribution is injected into the contaminated pixel. To account for real-world noise variations, additive Gaussian noise with a mean of 1 and standard deviation of 3 – thus, \( N(1, 3) \) – was exploited in this study.

Fig. 4 shows examples of original images, together with their augmented versions after applying all the augmentation techniques.

3.3.2. Synthetic image generation

Synthetic images are computer-generated visuals intended to mimic real-world objects (Man & Chahl, 2022; Schott, Brown, Raqueño, Gross, & Robinson, 1999). In recent studies, GANs have been investigated for their application in the agricultural domain, with a specific focus on data augmentation (Liu, Tan, Li, He, & Wang, 2020; Lu, Chen, Olaniyi, & Huang, 2022). We exploit DALL-E 2 to produce synthetic images using a reference image (DALL-E 2 has been shown effective in agricultural applications (Sapkota, 2023)). Notably, the utilization of synthetically generated images for data augmentation, especially with the use of DALL-E 2, remains a relatively scarce topic in the existing literature. Fig. 5 displays the original images used to create other synthetic ones. The color variation between the initial and generated image can be mitigated with the use of color correction algorithms.

3.3.3. Color correction

The goal of color correction is to align the color appearance of the target image with that of the reference one (Reinhard, Adhikhmin, Gooch, & Shirley, 2001), ensuring visual consistency. Numerous methods have been proposed to achieve color correction, ranging from traditional histogram matching techniques (Reinhard et al., 2001) to (deep) machine learning approaches (Xiao & Ma, 2009; Zhang, Isola, & Efros, 2016; Zhang, Liao & Yu, 2022). These techniques play a crucial role in enhancing image quality, facilitating image analysis, and enabling reliable image-based applications across various domains. The color correction algorithm adopted in this study (which may be considered a color transfer) aims to match the color statistics of the input image to those of the reference image. This is done by aligning the mean and standard deviation of the LAB color space (Murali & Govindan, 2013) of the two images through executing the following steps:

1. Convert the images to the LAB color space, separating the luminance (L) channel from the chrominance (A and B) channels, therefore making it more convenient to manipulate colors.
2. Calculate the mean and standard deviation of each LAB channel in the reference image.
3. Calculate the mean and standard deviation of each LAB channel in the input image that undergoes color correction.
4. Apply color correction on the LAB channels of the input image. For each channel (L, A, B), the pixel values of the input image are modified to match the color statistics of the reference image by using the formula:

\( y = (p - \mu_i) \cdot \frac{\sigma_j}{\sigma_i} + \mu_j, \)  (2)

where:

Fig. 5. The (a) original and (b) synthetically generated images using DALL-E 2.

Fig. 6. The examples (a–c) of original, synthetically generated, and color-corrected images.

• \( i \) and \( j \) are the identifiers of the input and reference image,
• \( p \) is the original pixel value in the input image,
• \( \mu_i \) and \( \sigma_i \) are the mean and standard deviation of the input channel,
• \( \mu_j \) and \( \sigma_j \) are the mean and standard deviation of the reference channel,
• \( y \) is the corrected pixel value.

Thus, the input image’s pixel values are adjusted proportionally to the difference in color statistics between the input and reference images.

5. Convert the corrected image back to the original color space.

The examples of color-corrected images are displayed in Fig. 6.

3.4. Model-related components for FSIC

In this study, EfficientNetLite (Tan & Le, 2019), a compact deep learning architecture, was used due to the constraints imposed by the OPS-SAT Case challenge, as it was fixed by the organizers—no other architecture was allowed. Apart from the discussed approaches, we benefit from hyperparameter tuning and cross-validation techniques (Stone, 1974) that play a vital role in optimizing model performance, as well as from ensemble learning (Dasarathy & Sheela, 1979), which combines multiple base models.
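The five color-correction steps of Section 3.3.3 reduce to a per-channel statistics match; a minimal sketch operating on images already converted to the LAB color space (the conversion itself is omitted here, and the function name is ours):

```python
import numpy as np

def color_transfer(input_lab: np.ndarray, ref_lab: np.ndarray) -> np.ndarray:
    """Per-channel statistics matching following Eq. (2):
    y = (p - mu_i) * (sigma_j / sigma_i) + mu_j for every LAB channel."""
    out = np.empty_like(input_lab, dtype=np.float64)
    for ch in range(input_lab.shape[2]):          # L, A, B channels
        p = input_lab[..., ch].astype(np.float64)
        mu_i, sigma_i = p.mean(), p.std()
        mu_j, sigma_j = ref_lab[..., ch].mean(), ref_lab[..., ch].std()
        out[..., ch] = (p - mu_i) * (sigma_j / sigma_i) + mu_j
    return out

# After the transfer, each channel of the output matches the reference
# channel's mean and standard deviation exactly.
rng = np.random.default_rng(0)
inp = rng.uniform(0, 100, size=(64, 64, 3))
ref = rng.uniform(40, 160, size=(64, 64, 3))
corrected = color_transfer(inp, ref)
```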

6
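The ensemble component mentioned in Section 3.4 can take many forms; as one common instance, soft voting over several base classifiers can be sketched as follows (illustrative only; the paper does not fix the exact combination rule at this point):

```python
import numpy as np

def soft_vote(probabilities: np.ndarray) -> np.ndarray:
    """Average the class-probability predictions of several base models.

    probabilities: array of shape (n_models, n_samples, n_classes).
    Returns one predicted class index per sample."""
    return probabilities.mean(axis=0).argmax(axis=1)

# Two hypothetical base models scoring two samples over two classes:
# averaging their class probabilities resolves their disagreement.
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # model 1
    [[0.6, 0.4], [0.6, 0.4]],   # model 2
])
labels = soft_vote(probs)       # class 0 for sample 1, class 1 for sample 2
```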
R. Shendy and J. Nalepa Expert Systems With Applications 251 (2024) 123984

Fig. 7. EfficientNet-B0 building blocks.

3.4.1. The EfficientNetLite architecture

EfficientNetLite has gained popularity for its remarkable performance in various tasks (Tan & Le, 2019). It was introduced as a lightweight variant of EfficientNet, and it focuses on resource-constrained environments, such as edge devices, where low-memory and low-energy consumption are essential. This architecture leverages compound scaling, which balances depth, width, and resolution to optimize the model performance without significantly increasing its computational complexity. Its versatility and excellent trade-off between performance and efficiency make it a compelling choice for real-world applications with limited computational resources (Tan & Le, 2021). Fig. 7 shows the main components of EfficientNet-B0 used in this study. The architecture consists of 237 layers and can be fully described from the building blocks—for details, see the supplementary materials.

3.4.2. Loss function

To enhance the model performance, various loss functions (Dehghan, Abbasi, Razzaghi, Banadkuki, & Gharaghani, 2024), such as sparse categorical cross-entropy and Focal Loss (Lin, Goyal, Girshick, He & Dollár, 2017), are commonly employed in a variety of fields. The former loss function is widely used for multi-class classification tasks—it measures the dissimilarity between the predicted probability distribution and the true target label. This loss is defined as (for \( N \) examples in the training set):

\( J(\mathbf{w}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + \left(1 - y_i\right) \log\left(1 - \hat{y}_i\right) \right], \)  (3)

where:

• \( \mathbf{w} \) refers to the model parameters, and
• \( y_i \) and \( \hat{y}_i \) are the true and predicted labels.

Focal Loss (Lin, Goyal, Girshick et al., 2017) is a specialized loss function designed to address the issue of class imbalance in classification tasks. In scenarios where certain classes have significantly fewer samples than others, a standard cross-entropy loss may lead the model to focus excessively on the majority classes, potentially neglecting the minority ones. Focal Loss modifies the cross-entropy loss by down-weighting the contribution of easy-to-classify examples and emphasizing the importance of hard-to-classify ones. This is achieved by introducing a modulating factor that increases the contribution of misclassified examples during training:

\( \mathrm{FocalLoss} = -\sum_{i=1}^{N} \left(1 - p_i\right)^{\gamma} \log\left(p_i\right), \)  (4)

where:

Table 1
Symbols used in the experimental study.
Symbol Meaning
𝛼 Focal Loss 𝛼 parameter
𝛾 Focal Loss 𝛾 parameter
BS Batch size
CE Categorical cross-entropy loss function
ENMA Ensemble model accuracy evaluated on the test set
ENMCK Ensemble model Cohen’s kappa score evaluated on the test set
FA Average validation accuracy of the 𝑘 models
FL Focal Loss
FSTD Standard deviation of the 𝑘 models accuracies
LF Loss function
MA Test set accuracy of the best model according to its validation accuracy
MCK Test set Cohen’s kappa score of the best model according to its validation accuracy
P Early stopping patience parameter
F Number of frozen layers during the transfer learning process
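To make the two loss functions concrete, here is a minimal plain-Python sketch of the cross-entropy of Eq. (3) and the Focal Loss of Eq. (4). It is an illustration only (the experiments rely on TensorFlow implementations); the α class-balancing weight, which appears as a Focal Loss hyperparameter in Table 1 but not in Eq. (4), is included as an optional factor:

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-7):
    # Eq. (3): loss averaged over N training examples;
    # y_true holds 0/1 labels, y_pred the predicted probabilities
    n = len(y_true)
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for y, p in zip(y_true, y_pred)) / n

def focal_loss(probs, gamma=2.0, alpha=1.0, eps=1e-7):
    # Eq. (4): the modulating factor (1 - p_i)^gamma down-weights easy
    # (high-probability) examples; alpha is the optional class-balancing
    # weight (an addition not shown in Eq. (4) itself)
    return -sum(alpha * (1 - p) ** gamma * math.log(max(p, eps)) for p in probs)
```

With gamma = 0 and alpha = 1, the focal term reduces to an (unnormalized) cross-entropy, while larger gamma values shrink the contribution of confidently classified examples.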

• p_i ∈ [0, 1] is the model's predicted probability of the i-th class,
• γ ≥ 0 is the tunable focusing parameter of the modulating factor (1 − p_i)^γ,
• N is the total number of classes.

The parameter γ allows for a gradual adjustment of the rate at which easy examples are down-weighted. When γ = 0, Focal Loss becomes equivalent to the cross-entropy loss. As γ is increased, the influence of the modulating factor becomes more pronounced, amplifying the impact of misclassified examples (Lin, Goyal, Girshick et al., 2017).

4. Experimental validation

The objectives of our experimental study are multi-fold. We investigate the impact of data augmentation and synthetic data generation on the capabilities of a deep model that could be deployed on board OPS-SAT. We also delve into the details of hyperparameter tuning, where we optimize specific model hyperparameters to attain high performance in FSIC. To address these objectives, we designed five experiments centered around distinct variations of the dataset:

i. The original dataset.
ii. An augmented version of the original dataset.
iii. A dataset enriched with synthetically generated images.
iv. A color-corrected version of the synthetically generated dataset variation.
v. A combination of (i), (ii) and (iv).

Throughout these diverse configurations, a comprehensive process of hyperparameter tuning was undertaken, coupled with the exploration of two loss functions. To ensure the robustness of the results against different data distributions, the number of cross-validation folds varied across experiments. Notably, variant (iv) experienced a further layer of complexity through the inclusion of transfer learning and pre-training on two other datasets. It is noteworthy that the initial four experiments only involved pre-training using ImageNet and did not employ the transfer learning methodology. Transfer learning was introduced in the final (fifth) experiment of this series.

Across these five primary setups, a total of 107 experimental runs were executed, consuming a cumulative runtime exceeding 38 h on a local machine. A unified test set, comprising 40 distinct images distributed evenly at 5 images per class, was consistently employed. These test images were withheld from the model throughout the entire span of the experiments, ensuring that they remained unseen by the model. This test set was used in the final evaluation step to assess the model's performance using the evaluation metrics, including multi-class accuracy and Cohen's kappa scores (Carletta, 1996; Cohen, 1960) (for more details on the metrics, see the supplementary materials). The metrics encompass the average validation accuracy of the k models trained (all models were initially k-fold cross-validated), complemented by its standard deviation. The model with the highest validation accuracy undergoes a rigorous evaluation on the test set.

4.1. Experimental setup

The experiments ran on a machine equipped with an Intel Core i7-8750H CPU running at 2.20 GHz with 12 processors, an NVIDIA GP107M GeForce GTX 1050 Ti Mobile GPU, and 32 GB of RAM (Ubuntu 22.04.3). The project was developed using Python (version 3.9.16) with the TensorFlow 2.7.0 backend, and the implementation is available at https://github.com/ShendoxParadox/Few-shot-satellite-image-classification-OPS-SAT, in order to ensure full reproducibility of our study. The implementation is split into several modules: Image_Augmentation, Color_Correction and OPS_SAT_Dev—all modules, together with an architectural diagram of the experiments, are comprehensively discussed in the supplementary materials.

4.2. Symbols used in experiments

Table 1 gathers the symbols used in the experimental study.

4.3. Experimental results

This section presents the outcomes of five main experiments based on the five dataset variations discussed in earlier sections. Of particular significance within the experiments are the hyperparameters explored in detail throughout the different runs, including the early stopping patience parameter (the number of epochs without improvements in the loss value), the choice of the loss function alongside the respective Focal Loss hyperparameter values (α and γ), and the batch size. The batch size was strategically reduced when working with the dataset variations characterized by a larger sample size, allowing for efficient memory utilization and accommodating the computational demands of processing numerous samples per iteration. Conversely, in scenarios involving a dataset variation with a lower sample size, the batch size was adjusted more flexibly to optimize training dynamics and resource utilization.

4.3.1. Experiment 1: Original dataset

The first experiment centers around the original training set, consisting of 40 images (five images per class). In this baseline, a total of 14 distinct runs were undertaken, encompassing an exploration of the hyperparameter search space. The outcomes of this experiment lay the groundwork for subsequent analyses, providing insights into the model's performance across various configurations.

Table 2 gathers the results obtained on the test set. In this instance, the ensemble model's Cohen's kappa is 0.54, accompanied by an accuracy of 0.60. In the supplementary materials, we include the parallel plot depicting all configurations and hyperparameters, together with the primary quality metrics. It illustrates that the highest Cohen's kappa of 0.63 and accuracy of 0.68 were attained by an individual model characterized by the batch size of 8, the sparse categorical cross-entropy loss function and 40 epochs in early stopping. This observation


Table 2
Experiment 1: Results on the test set for all configurations (sorted by ENMCK).
P LF α γ BS FA FSTD MA MCK ENMA ENMCK
40 CE – – 4 0.50 0.11 0.40 0.31 0.60 0.54
40 FL 0.2 3 4 0.60 0.15 0.55 0.49 0.58 0.51
35 CE – – 8 0.58 0.10 0.48 0.40 0.55 0.49
40 FL 0.2 4 4 0.50 0.08 0.45 0.37 0.55 0.49
45 CE – – 8 0.55 0.06 0.48 0.40 0.53 0.46
45 CE – – 4 0.65 0.17 0.43 0.34 0.53 0.46
25 CE – – 8 0.43 0.13 0.48 0.40 0.50 0.43
40 CE – – 8 0.63 0.25 0.68 0.63 0.50 0.43
40 FL 0.5 2 4 0.50 0.18 0.50 0.43 0.48 0.40
40 FL 0.2 2 4 0.50 0.14 0.58 0.51 0.48 0.40
15 CE – – 8 0.40 0.15 0.35 0.26 0.43 0.34
40 CE – – 16 0.40 0.18 0.50 0.43 0.38 0.29
40 FL 0.7 2 4 0.58 0.06 0.35 0.26 0.38 0.29
40 CE – – 32 0.23 0.17 0.13 0.00 0.15 0.03

Table 5
Experiment 3: Results on the test set (sorted by ENMCK).
P LF α γ BS FA FSTD MA MCK ENMA ENMCK
20 FL 0.2 2 8 0.74 0.04 0.63 0.57 0.70 0.66
25 CE – – 8 0.74 0.06 0.60 0.54 0.68 0.63
30 FL 0.2 2 32 0.78 0.04 0.63 0.57 0.68 0.63
30 FL 0.2 2 8 0.81 0.06 0.53 0.46 0.68 0.63
30 CE – – 32 0.77 0.09 0.55 0.49 0.65 0.60
30 FL 0.2 2 8 0.77 0.05 0.60 0.54 0.65 0.60
30 CE – – 16 0.79 0.06 0.45 0.37 0.63 0.57
20 FL 0.2 2 16 0.77 0.05 0.40 0.31 0.63 0.57
20 FL 0.2 2 32 0.74 0.13 0.53 0.46 0.63 0.57
25 CE – – 4 0.66 0.03 0.43 0.34 0.60 0.54
20 FL 0.2 2 8 0.78 0.07 0.53 0.46 0.60 0.54
30 FL 0.2 2 4 0.65 0.05 0.53 0.46 0.58 0.51
40 CE – – 4 0.74 0.04 0.45 0.37 0.55 0.49
25 CE – – 16 0.79 0.04 0.53 0.46 0.53 0.46
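The MCK and ENMCK columns in these tables report Cohen's kappa (Cohen, 1960), which corrects raw agreement for the agreement expected by chance. A minimal sketch of the multi-class computation (an illustration, not the exact evaluation code of this study):

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    # kappa = (p_o - p_e) / (1 - p_e): observed agreement p_o, corrected by
    # the chance agreement p_e implied by the two marginal label distributions
    # (assumes p_e < 1, i.e., the labelings are not both constant)
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    p_e = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A classifier that is always correct yields kappa = 1, while one whose hits match the class marginals only by chance yields kappa close to 0.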

Table 3
Effect of augmentation techniques on the training set size.
Augmentation technique Size of the dataset
None 40
+ Image Rotations 160
+ Flips (Horizontal and Vertical) 480
+ Random Contrast & Brightness 960
+ Gaussian Noise 1920

Table 4
Experiment 2: Results on the test set (sorted by ENMCK).
P LF α γ BS FA FSTD MA MCK ENMA ENMCK
2 FL 0.4 2 6 0.97 0.02 0.60 0.54 0.73 0.69
2 CE – – 8 0.97 0.02 0.68 0.63 0.68 0.63
2 FL 0.2 2 8 0.97 0.02 0.55 0.49 0.68 0.63
2 FL 0.3 2 8 0.97 0.06 0.73 0.69 0.68 0.63
2 FL 0.3 2 6 0.95 0.04 0.65 0.60 0.68 0.63
1 FL 0.4 2 6 0.91 0.06 0.65 0.60 0.68 0.63
2 FL 0.5 2 6 0.95 0.04 0.58 0.51 0.65 0.60
2 FL 0.4 3 6 0.97 0.01 0.65 0.60 0.65 0.60
2 FL 0.4 2 6 0.97 0.03 0.70 0.66 0.65 0.60
2 CE – – 8 0.92 0.06 0.55 0.49 0.63 0.57
2 CE – – 8 0.97 0.01 0.65 0.60 0.63 0.57
2 FL 0.4 2 6 0.96 0.04 0.68 0.63 0.63 0.57
2 FL 0.4 3 6 0.97 0.01 0.58 0.51 0.63 0.57
9 CE – – 4 1.00 0.00 0.48 0.40 0.58 0.51
2 CE – – 4 0.94 0.03 0.55 0.49 0.58 0.51

underscores the conclusion that, within the context of this experiment, the superior performance was achieved by a single model, rather than the ensemble. The plot shows a negative relation between the batch size and the Cohen's kappa—larger batch sizes tend to yield poorer model performance.

4.3.2. Experiment 2: Data augmentation

The second experiment investigates the impact of the augmentation techniques used to expand the original dataset (Table 3) on the resulting model performance. Notably, this variation presents other challenges (when compared to training from limited training samples), primarily stemming from its vast training dataset and the resultant memory limitations, leading to the reduction of the batch size. In this experiment, a total of 15 configurations were analyzed.

A notable improvement over the previous experiment in the evaluation metrics is observed. Additionally, the experiment showcases a remarkable degree of consistency in performance across all runs. This phenomenon can be attributed to the robust data availability introduced by the employed augmentation strategies, contributing to more stable and elevated performance levels. Table 4 encapsulates the outcomes of this experiment, where both the ensemble model and an individual model yield a Cohen's kappa score of 0.69, both employing the FL loss function. Furthermore, the parallel plot included in the supplementary materials underscores the ensemble results, revealing a narrow range between the poorest- and the best-performing runs. This emphasizes the consistency regardless of the diverse hyperparameter combinations explored. We can appreciate an intriguing observation related to the early stopping patience parameter, which was frequently set at a low value of 2 across most runs. This expedited stopping criterion aligns with the dynamics of this augmented dataset variation. The augmented data renders the evaluation fold closely resembling the training folds, causing the model to achieve high performance swiftly and necessitating early stopping to prevent overfitting.

4.3.3. Experiment 3: Synthetic data generation

The third experiment revolves around a dataset variant that comprises synthetically generated images alongside the original images. It encompasses 200 training images (25 images per class, with 5 original and 20 synthesized images). This experiment employs a total of 14 runs. The exploration of this dataset variant aims at shedding light on the interaction between real-world OPS-SAT imagery and artificially generated data in the context of training.

Similar to the previous experiment, a remarkable degree of consistency in performance across all runs is evident—we can observe elevated metrics when compared to the original dataset. Despite the dataset size being smaller than the augmented variant, the outcomes underscore the model's proficiency in navigating the distinctive attributes of synthetic imagery.

Table 5 encapsulates the results, revealing that the ensemble achieved a higher Cohen's kappa score of 0.66, while the best-performing individual model secured a score of 0.571. The parallel plot shows a relatively tight range between the least- and best-performing runs, reaffirming the consistent nature of the results (see the supplement). The early stopping patience had to be extended due to the limited number of training samples. Given the smaller dataset size, the model necessitated additional time to learn meaningful patterns from the synthetic imagery, prompting a more tolerant stopping criterion to prevent underfitting and to ensure appropriate training dynamics.

4.3.4. Experiment 4: Color-corrected synthetically generated images

The fourth experiment is an extension of the previous one, sharing the same training set composition. This variant, however, incorporates color-corrected synthetically generated images alongside the original data. Here, each class consists of 5 original images augmented by 20 color-corrected synthetic ones. A total of 13 runs were conducted. This iteration aims to assess how the model's adaptability is influenced by the addition of color-corrected synthetic images, shedding light on the potential enhancement brought by aligning the color attributes of the generated data with the original OPS-SAT imagery.

Table 6 underscores an improvement in performance, with the ensemble model's Cohen's kappa score attaining 0.69 across multiple hyperparameters. This trend shows the remarkable coherence of the


Table 6
Experiment 4: Results on the test set (sorted by ENMCK).
P LF α γ BS FA FSTD MA MCK ENMA ENMCK
30 CE 0.2 – – 0.75 0.03 0.55 0.49 0.73 0.69
20 FL 0.2 2 16 0.84 0.09 0.63 0.57 0.73 0.69
30 FL 0.2 2 16 0.84 0.05 0.60 0.54 0.73 0.69
30 CE 0.2 – – 0.85 0.03 0.43 0.34 0.70 0.66
30 CE 0.2 – – 0.84 0.07 0.50 0.43 0.68 0.63
30 FL 0.2 2 4 0.72 0.04 0.38 0.29 0.68 0.63
35 CE 0.2 – – 0.71 0.06 0.45 0.37 0.65 0.60
30 CE 0.2 – – 0.83 0.10 0.60 0.54 0.65 0.60
30 FL 0.2 2 32 0.82 0.06 0.58 0.51 0.65 0.60
20 CE 0.2 – – 0.67 0.08 0.48 0.40 0.63 0.57
30 FL 0.2 2 8 0.84 0.06 0.65 0.60 0.63 0.57
20 FL 0.2 2 8 0.80 0.05 0.65 0.60 0.60 0.54
20 FL 0.2 2 4 0.73 0.03 0.38 0.29 0.58 0.51

Table 7
Experiment 5: Results on the test set (sorted by ENMCK).
P LF α γ BS FA FSTD MA MCK ENMA ENMCK
3 FL 0.2 2 4 0.90 0.05 0.55 0.49 0.78 0.74
5 CE – – 8 0.98 0.01 0.65 0.60 0.75 0.71
4 FL 0.1 1 8 0.97 0.01 0.68 0.63 0.73 0.69
3 FL 0.1 1 8 0.96 0.02 0.63 0.57 0.73 0.69
4 CE – – 8 0.97 0.02 0.55 0.49 0.70 0.66
5 CE – – 8 0.99 0.01 0.58 0.51 0.70 0.66
2 FL 0.1 1 6 0.91 0.04 0.63 0.57 0.70 0.66
2 FL 0.1 1 8 0.95 0.02 0.53 0.46 0.70 0.66
3 FL 0.2 1 8 0.96 0.02 0.68 0.63 0.70 0.66
3 FL 0.2 2 10 0.95 0.03 0.60 0.54 0.70 0.66
3 FL 0.2 2 6 0.92 0.04 0.60 0.54 0.70 0.66
4 CE – – 10 0.96 0.03 0.50 0.43 0.68 0.63
3 FL 0.1 1 4 0.90 0.06 0.55 0.49 0.68 0.63
3 FL 0.2 2 8 0.93 0.04 0.70 0.66 0.68 0.63
3 FL 0.3 2 8 0.94 0.02 0.58 0.51 0.68 0.63
3 FL 0.4 4 4 0.90 0.07 0.70 0.66 0.68 0.63
4 CE – – 4 0.94 0.03 0.55 0.49 0.65 0.60
3 CE – – 8 0.96 0.03 0.48 0.40 0.65 0.60
5 CE – – 8 0.98 0.02 0.65 0.60 0.65 0.60
4 FL 0.1 1 6 0.93 0.00 0.63 0.57 0.65 0.60
3 FL 0.3 3 8 0.95 0.04 0.63 0.57 0.65 0.60
3 FL 0.2 2 4 0.90 0.04 0.50 0.43 0.65 0.60
3 FL 0.2 2 4 0.93 0.02 0.65 0.60 0.65 0.60
3 CE – – 4 0.85 0.11 0.55 0.49 0.63 0.57
2 FL 0.1 1 2 0.61 0.13 0.45 0.37 0.63 0.57
3 FL 0.1 1 2 0.81 0.10 0.55 0.49 0.63 0.57
4 FL 0.1 1 2 0.85 0.06 0.48 0.40 0.63 0.57
2 CE – – 4 0.83 0.06 0.65 0.60 0.60 0.54
2 FL 0.1 1 4 0.86 0.03 0.53 0.46 0.60 0.54
4 FL 0.1 1 4 0.96 0.01 0.58 0.51 0.60 0.54
3 FL 0.2 2 2 0.63 0.10 0.50 0.43 0.60 0.54

results, and the parallel plot (in the supplementary materials) indicates the consistent performance of ensembles, with Cohen's kappa scores ranging from 0.51 to 0.69. This notably narrower range in comparison to the previous experiments substantiates the added value of color correction in not only enhancing the performance but also in improving its consistency.

4.3.5. Experiment 5: Data augmentation and color-corrected synthetically generated images

The final experiment introduces a dataset that combines augmented data and color-corrected synthesized data alongside the original images. It is composed of a total of 1600 images, evenly distributed at 200 images per class. However, due to GPU memory constraints, a reduced number of augmentation techniques is employed in this setting. Both original and color-corrected synthetically generated images underwent 90-degree rotation, controlled brightness and contrast adjustments, and the application of Gaussian noise. The experiment is structured into four distinct segments. The first one mirrors the initial experiments, encompassing pre-training on ImageNet. The next three segments employ transfer learning methodologies, capitalizing on the ImageNet, Opensurfaces, and Landuse datasets. They involve the evaluation of different configurations for freezing layers in the network architecture. This multifaceted experiment culminates in a comprehensive assessment of the model's performance across the diverse combinations of training samples, revealing the dynamics that arise when combining original, color-corrected generated images, and augmented imagery within the training process with pre-training and transfer learning.

Table 7 encapsulates the 31 runs conducted with diverse hyperparameters. Ensembles outperform individual models, with the best ensemble attaining a Cohen's kappa score of 0.74 and the corresponding accuracy of 0.78. This ensemble model's utilization of the FL loss function, coupled with a batch size of 4 and an early stopping patience¹ parameter of 3, emerges as a standout configuration over the challenging test set. It emphasizes the power of the interaction of ensemble learning jointly with the combination of augmented and synthetic data. The parallel plot reinforces the uniformity of the results, underscoring the consistent outcomes obtained for the diverse hyperparameter landscape (see the supplementary materials).

Table 8 and the parallel plots (supplementary materials) show the outcomes of the transfer learning fine-tuning experiments. These experiments yielded performance results that lagged behind those achieved in the previous experiments that employed pre-training on ImageNet. The comparative analysis underscores that, despite rigorous hyperparameter exploration, the pre-training approach on ImageNet consistently yielded superior results. The phenomenon of transfer learning yielding worse results than pre-training can be attributed to several factors. The source datasets utilized for transfer learning inherently differ in terms of their content, distribution, and complexity compared to the target set. This disparity may lead to a lack of alignment between the learned features and the specifics of the current problem domain, resulting in suboptimal adaptation during transfer learning. Furthermore, the chosen hyperparameters, such as the number of frozen layers and learning rates, could have been poorly suited to the specific nature of the target dataset. This phenomenon, where transfer learning yields low-quality results, is referred to as the ''negative transfer'' (Yosinski et al., 2014). It occurs when the knowledge acquired from the source domain fails to adapt to the target domain due to differences in dataset characteristics, problem complexity, or suboptimal hyperparameter settings, leading to diminished performance.

¹ The number of processed epochs by each base model varies in the best ensemble—see training-validation loss and training-validation accuracy profiles for five models in the supplement (each model is trained over 4 folds, and validated over 1 fold from the training set).

5. The OPS-SAT case: Competition overview and comparisons

In this section, we explore the outcomes of the OPS-SAT case challenge, and contextualize our methods as if we were to participate in that competition.

5.1. Our approach

Our strategy integrated both data-centric and model-centric methods, with an emphasis on leveraging only the limited training dataset provided. We consciously opted against using the raw (unlabeled) images supplied by the organizers, which were available for use in any way beneficial for the competitors. A sample of these images is shown in Fig. 8. We decided not to incorporate any other external satellite datasets similar to the one provided. Our decision was to rely solely on the few labeled images provided, serving as the foundational data for our methodology. This decision was in line with our commitment to avoid manual labeling, pseudo-labeling, or self-training techniques. We believed that employing these methods would contradict the core aims


Fig. 8. Four samples of the 26 original, raw (unlabeled) OPS-SAT images. A single raw image could typically produce 100 (200 × 200 pixels) patches that could be used in either train or test sets.
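As the caption notes, one raw frame can be tiled into roughly a hundred non-overlapping 200 × 200 patches. A minimal sketch of such tiling (the 2000 × 2000 frame size used in the usage note is a hypothetical example, not the actual OPS-SAT frame dimensions):

```python
def tile_corners(height, width, patch=200):
    # top-left corners of all non-overlapping patch x patch tiles
    # that fit fully inside a height x width raw frame
    return [(row, col)
            for row in range(0, height - patch + 1, patch)
            for col in range(0, width - patch + 1, patch)]
```

A hypothetical 2000 × 2000 frame yields exactly 10 × 10 = 100 tiles, in line with the "typically 100 patches" noted in the caption.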

of the competition, which are primarily to reduce the labor-intensive nature of ''manual'' training. However, we acknowledge that integrating our methods with semi-supervised techniques could potentially enhance our results significantly.

Table 8
Experiment 5 with transfer learning using the ImageNet, Opensurfaces and Landuse datasets: Results on the test set (sorted by ENMCK for each dataset).
P F α γ BS FA FSTD MA MCK ENMA ENMCK
ImageNet
3 10 0.2 2 16 0.96 0.03 0.58 0.51 0.70 0.66
3 10 0.2 2 4 0.92 0.02 0.50 0.43 0.65 0.60
3 50 0.2 2 4 0.94 0.03 0.60 0.54 0.63 0.57
3 10 0.2 2 8 0.94 0.03 0.63 0.57 0.60 0.54
3 20 0.2 2 4 0.93 0.04 0.53 0.46 0.58 0.51
3 100 0.2 2 4 0.92 0.03 0.58 0.51 0.53 0.46
Opensurfaces
3 50 0.2 2 4 0.89 0.02 0.53 0.46 0.58 0.51
3 20 0.2 2 4 0.91 0.03 0.43 0.34 0.53 0.46
3 10 0.2 2 4 0.94 0.02 0.43 0.34 0.50 0.43
3 0 0.2 2 4 0.89 0.05 0.43 0.34 0.50 0.43
3 100 0.2 2 4 0.84 0.02 0.43 0.34 0.48 0.40
3 5 0.2 2 4 0.95 0.03 0.40 0.31 0.43 0.34
Landuse
6 10 0.2 2 4 0.86 0.06 0.38 0.29 0.50 0.43
3 30 0.2 2 4 0.57 0.18 0.45 0.37 0.45 0.37
5 20 0.2 2 4 0.86 0.06 0.33 0.23 0.45 0.37
6 5 0.2 2 4 0.94 0.03 0.25 0.14 0.45 0.37
3 10 0.2 2 4 0.53 0.11 0.30 0.20 0.43 0.34
3 50 0.2 2 4 0.71 0.11 0.33 0.23 0.40 0.31
3 100 0.2 2 4 0.57 0.13 0.40 0.31 0.38 0.29
3 0 0.2 2 4 0.39 0.17 0.35 0.26 0.38 0.29

To build the model for inferring over the competition's test set, we used only the initial training set provided, which contained 80 labeled images (10 images for each class), as the foundation for our model training. To increase the amount of labeled data, we opted for the generation of synthetic data. By applying synthetic image generation techniques, we managed to expand our labeled image count from 80 to 471. Although our previous experiments demonstrated the effectiveness of the color correction of the synthetically generated data, this technique did not yield beneficial results when applied to the competition's test set (Derksen et al., 2023). We then proceeded to implement selective image augmentation methods, including rotation, vertical flip, random adjustments to brightness and contrast, as well as the addition of Gaussian noise. This tailored augmentation process enabled us to further increase the number of labeled images, reaching the final dataset of 2,654 images.

EfficientNet-Lite0 was the selected model, as specified by the competition's rules. We exploited Focal Loss with α = 0.5 and γ = 2; this value of α does not prioritize one class over another, whereas this value of γ indicates a strong emphasis on hard-to-classify examples. Transfer learning was not employed, but the model underwent pre-training with ImageNet weights, as initiating training without any predefined weights led to poor results. The early stopping strategy was implemented to avoid overfitting, with a patience parameter set to 3. Training employed 2-fold cross-validation, where half of the training data served as the validation set—the best model emerged from one of these two training sessions. A dropout rate of 0.5 was applied to the model to further mitigate overfitting. Lastly, the model was trained using the Adam optimizer (Kingma & Ba, 2017) with a batch size of 8. To verify the GPU memory allocation and the number of CPU threads used during the training process, see the supplementary materials. Of note, our results were obtained after the


Table 9
The results obtained by the top-performing teams, the baseline and our model. We report the official challenge metric used by the organizers and obtained over the competition's test set (ℒ), this metric obtained using the quantized version of the model (ℒ_quant), and accuracy (Acc.). The Manual and Pseudo. columns refer to the usage of either manual labeling or pseudo-labeling on either the provided raw images or on an external dataset, whereas the Augment. column refers to the usage of image augmentation techniques. The † rank refers to the hypothetical participation—this would have been our rank if we had participated in the competition (as we did not, we do not have the ℒ_quant value calculated by the evaluation server). For the perico and agsa teams, the internals of their techniques are unknown (they were not presented in Meoni et al. (2024)), and for the latter team its accuracy was not reported. The number of significant digits is kept as reported in Meoni et al. (2024).
Rank Team ℒ (↓) ℒ_quant (↓) Acc. (↑) Manual Pseudo. Augment.
1 inovor 0.38109 0.38109 0.67857 Yes No Yes
2 ND2I 0.40868 0.40870 0.65646 Yes Yes Yes
3 sim-team 0.47429 0.47429 0.59694 Yes Yes Yes
...
10 perico 0.52893 0.52880 0.55272 N/A N/A N/A
...
11† Our model 0.53862 N/A 0.53911 No No Yes
12 Baseline — 0.53969 0.46220 No Yes No
...
48 agsa 1.00196 0.99998 N/A N/A N/A N/A

competition had ended and were based on the test dataset that was released approximately 13 months post-competition.

5.2. Other approaches

5.2.1. The competition baseline

The organizers used a training pipeline derived from a version of FixMatch (Sohn et al., 2020), known as MSMatch (Gómez & Meoni, 2021), specifically designed for remote sensing tasks. This semi-supervised training method employs pseudo-labeling and consistency regularization techniques to add an extra loss component to the categorical cross-entropy (Meoni et al., 2024). The organizers then used ten examples from each class that form the training dataset to serve as the labeled set. They cropped the full images from the raw image dataset (given to the competitors) into patches of 200 × 200 pixels, which were then used as unlabeled data. After training the model, they achieved a score ℒ_quant = 0.54 and an accuracy rate of 0.46.

5.2.2. The top-3 approaches

The first-ranked team (Inovor Technologies) crafted 200 × 200 pixel patches from the raw images, enhancing their data pool using manual labeling techniques (Meoni et al., 2024). This was in addition to using the labeled images already present in the training set. Their approach extended to incorporating an additional dataset from the OPS-SAT Flickr album (European Space Agency, 2020), which they manually labeled after conducting color adjustments. Their data augmentation strategy included a variety of techniques, such as cropping, zooming, and rotating. Also, this team adjusted the brightness and contrast levels and applied hue and saturation jittering. Perspective shifts and gamma corrections were also implemented to further enrich the dataset. The final dataset's size remains confidential. In terms of their model-related selections, they opted for the Adam optimizer and used Focal Loss for the EfficientNet-Lite0 model.

The team that secured the second place, Capgemini SE, also adopted a hands-on approach with the original dataset of raw images by manually annotating the cropped patches. In addition to labeling the raw images, they expanded their dataset with manual annotations of NWPU-RESISC45 (Cheng, Han, & Lu, 2017), an open dataset intended to facilitate research in remote sensing. This team further augmented their labeled data through the use of pseudo-labeling techniques. Regarding their data augmentation strategies, they implemented vertical and horizontal flipping, random adjustments in brightness and contrast, solarization, RGB color shifting, coarse dropout, shift-scale rotation, and resizing. These methods contributed to the creation of a final dataset that comprised 12,583 images. The authors used the RMSprop optimizer (Hinton) to train EfficientNet-Lite0, and exploited the Sigmoid Focal Cross Entropy loss function (Lin, Goyal, Girshick, He & Dollár, 2017), indicating a focus on addressing class imbalance within their dataset.

The team from Ubotica, which took the third place, also relied on manually- and pseudo-labeled raw images, but they did not incorporate any additional external datasets into their methodology. For data augmentation, they chose a more streamlined set of techniques, limiting themselves to flips and rotations. Their final dataset consisted of 3,000–7,000 images. The authors employed a combination of Sparse Categorical Cross Entropy and Cosine Similarity (Salton, Wong, & Yang, 1975) as the loss functions, and trained the models using the Adam optimizer. Regarding the model architecture, they used a hybrid approach, combining EfficientNet-Lite0 with the Xception model (Chollet, 2017). It is important to emphasize, however, that such hybrid approaches were not allowed—the organizers explicitly stated in the submission rules: ''The submission for this challenge consists of a single .h5 file containing the parameters of a EfficientNetLite0 neural network''.

5.3. Results and comparisons

According to the latest work by the organizers of the competition (Meoni et al., 2024), there were 56 teams that signed up for the challenge, 41 of them successfully making it onto the leaderboard by submitting at least one valid entry. The final outcomes for the 48 participating teams yielded a range of ℒ_quant scores that spanned from 1.002 (marking the least successful result) to 0.381, achieved by the winning team. Here, ℒ = 1 − κ, where κ is Cohen's kappa metric, and ℒ_quant is the score corresponding to the quantized version of the model.

Table 9 displays the performance of our top-performing model alongside the results of the selected competitors and the baseline solution proposed within the OPS-SAT Case challenge. A direct comparison with the top three performers of the competition is impractical because their strategies were fundamentally based on manual labeling. However, a comparison with the competition's baseline is feasible since it employed pseudo-labeling on the raw images from the original dataset, as opposed to manual labeling. Unlike the baseline approach, we adopted neither pseudo- nor manual labeling; instead, we leveraged the few existing labeled images to generate additional synthetic data. Our performance aligns closely with the organizers' baseline model, and notably, our model demonstrated a significantly higher accuracy. Of note, our proposed pipeline offers real-time operation—our average inference time for an individual image prediction is 37.4 ms.

Fig. 9 presents example patches correctly classified by our model, demonstrating its efficacy in recognizing diverse areas. In Fig. 10,


Fig. 9. Examples of correct predictions obtained by our model.

Fig. 10. Examples of misclassified patches elaborated by our model.

Table 10
The classification performance quantified as per-class accuracy.
Class Accuracy
River 0.19
Mountain 0.21
Agricultural 0.36
Natural 0.43
Snow 0.43
Cloud 0.75
Sea ice 0.84
Water 0.84
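
The per-class accuracies reported in Table 10 are simply the diagonal of a confusion matrix normalized by each class's support. A minimal NumPy sketch (the toy labels below are illustrative only, not drawn from our experiments):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    """Fraction of correctly classified samples within each true class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true class, columns: predicted class
    support = cm.sum(axis=1)
    # Guard against division by zero for classes absent from y_true.
    return np.divide(np.diag(cm), support,
                     out=np.zeros(n_classes), where=support > 0)

# Illustrative class indices: 0 = river, 1 = cloud, 2 = water.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 2, 2, 1, 1, 2, 2, 0]
print(per_class_accuracy(y_true, y_pred, 3))
```

Averaging such per-class scores (rather than overall accuracy) keeps the metric honest under the class imbalance discussed above, and pinpoints the underperforming categories directly.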

on the other hand, we gather misclassified patches, highlighting the challenges faced in accurately classifying such images. These examples underline the complexities in FSIC, especially in scenarios with subtle differences or overlapping characteristics.
Table 10 presents the per-class accuracy obtained over the competition’s test set using our model. It performs exceptionally well in identifying water and sea ice with accuracy reaching 0.84, suggesting a robust capability to recognize features with distinct textures and colors. Here, the cloud identification is also relatively high. Conversely, the model struggles with rivers and mountains, indicated by accuracies of 0.19 and 0.21, respectively. The agricultural, natural, and snow categories show moderate accuracy, reflecting potential challenges in discerning subtler differences in land cover. These insights can direct efforts to improve the model, especially in the underperforming categories.
In our research and analysis, we focused exclusively on EfficientNetLite-B0, as it was the only model approved for use in the competition. Nonetheless, we additionally conducted a comparative study to explore other potential deep learning architectures that could also be classified as lightweight, even though they were not


Table 11
Comparison of different deep learning architectures.
Architecture Pre-training Score (↓)
EfficientNetLite-B0 ImageNet 0.5386
MobileNetV3 ImageNet 0.5693
Xception ImageNet 0.5931
MobileNetV2 ImageNet 0.8374
EfficientNetLite-B0 No pre-training 0.864

eligible for the competition. Table 11 presents the results of some experiments conducted on different deep learning architectures, tested on the competition test set, and scored using the competition’s scoring metric (↓, the lower the better, with zero indicating the perfect score). The pre-trained EfficientNetLite-B0, with a score of 0.54, is indeed the best-performing model. On the other end, EfficientNetLite-B0 with no pre-training obtains a score of 0.86, making it the least desirable model based on this metric. The remaining architectures, while pre-trained on ImageNet, exhibit moderate performance, with scores ranging from 0.57 for MobileNetV3 to 0.59 for Xception, and 0.84 for MobileNetV2. However, MobileNetV3 delivered the second-best outcomes, trailing only the top-performing model, suggesting that with a more dedicated effort toward hyperparameter optimization, it might yield improved results.

5.4. The final remarks

To avoid the data leakage issues between the training and test datasets – which could lead to an underestimation of the true classification error – the organizers of the OPS-SAT Case challenge implemented a training-test split at the image level (Meoni et al., 2024). This means that patches derived from the same image were exclusively assigned to either the training or the test set; a single raw image (2048 × 1944 pixels) captured by the satellite could be divided into numerous smaller (200 × 200 pixels) images. The image-level dataset split led to significant variations in the distribution of patches within the same class across the training and test sets. This discrepancy became apparent while analyzing the published test set and comparing it with the provided training set. Essentially, we observed that the limited set of labeled training images that were made available did not accurately represent the entire image data distribution. Therefore, we conclude that it would be nearly impossible for the provided training dataset to serve as a representative sample of the entire dataset by itself. Consequently, making use of the raw images, which the organizers encouraged, became essential to achieve high results. Despite not utilizing the original raw image dataset, we still managed to achieve results surpassing those of the competition’s baseline. This suggests that our approach has a strong generalization capability, performing well on the challenging test set.

The results of the competition underscore the importance of advancing semi-supervised learning and fully automated labeling methods (Meoni et al., 2024). Our research advocates for innovative techniques like synthetic image generation as an alternative to manual annotation. This strategy, though still in its development phase, could be transformative for image classification in general, and satellite image classification in particular. In this work, we have demonstrated the potential of this approach by achieving high-quality classification results in a challenging few-shot satellite image classification scenario.

6. Conclusion and future work

The availability of training data for EO applications is exceedingly limited; hence, developing approaches that can learn from limited training samples is of paramount practical importance. In this article, we addressed this challenge from a data-centric perspective rather than solely relying on model architecture. This approach was facilitated by employing an immutable deep learning model as the foundational framework. By maintaining the model’s architecture unchanged, the spotlight was directed toward the interaction of data variations, augmentation techniques, synthetic data generation, pre-training and transfer learning methods. Notably, the choice of employing a mobile network architecture was strategic, considering the constraints of limited hardware resources not only on-board OPS-SAT, but also commonly associated with on-board computers on small-scale satellite systems. Our experiments thus sought to bridge the gap between scarce data availability and hardware limitations, carving a path to more effective utilization of remote sensing data for countless applications in the realm of EO and space exploration. The extensive experimental study collectively illuminated the detailed interaction between data variations, augmentation techniques, and transfer learning approaches deployed for real-world multi-class image classification of OPS-SAT images. The augmentation strategies coupled with synthetic data generation consistently enhanced model performance, with the incorporation of color correction on the synthetically generated images further elevating it. The ensembles almost always outperformed individual models, highlighting the value of ensemble learning. The cautionary lesson drawn from instances of negative transfer underscores the complex nature of domain adaptation, emphasizing the careful orchestration of pre-training and fine-tuning strategies. This study not only offers valuable insights into effective model training but also serves as a reminder of the multifaceted challenges that accompany deep learning experimentation.

There is a promising trajectory for future research and development. The application of other FSL techniques, such as meta-learning, active learning and, most importantly, pseudo-labeling or self-training using semi-supervised techniques, could significantly enhance the model’s ability to generalize from limited labeled data. Moreover, the investigation of transfer learning on more extensive and diverse satellite image datasets offers substantial potential for improved model performance and adaptability to various satellite missions. These avenues of future work can advance the capabilities of on-board AI systems, making them even more skilled at processing and extracting valuable insights from satellite imagery, ultimately contributing to enhanced satellite mission success.

CRediT authorship contribution statement

Ramez Shendy: Conceptualization, Investigation, Data curation, Methodology, Software, Validation, Visualization, Formal analysis, Writing – original draft, Writing – review & editing. Jakub Nalepa: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Validation, Writing – original draft, Writing – review & editing, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data used in this study is publicly available, and the reference to this data is included in the manuscript.

Acknowledgments

JN was supported by the Silesian University of Technology, Poland grant for maintaining and developing research potential, and by the Silesian University of Technology grant (02/080/RGJ24/0043).

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.eswa.2024.123984.


References

Agency, E. S. (2023). AI on the edge: ‘‘the OPS-SAT case’’. Available at: https://kelvins.esa.int/opssat/home/, Accessed on: 2023-07-15.
Aleem, S., Kumar, T., Little, S., Bendechache, M., Brennan, R., & McGuinness, K. (2022). Random data augmentation based enhancement: A generalized enhancement approach for medical datasets. http://dx.doi.org/10.48550/arXiv.2210.00824, arXiv:2210.00824.
Arslan, M., Guzel, M., Demirci, M., & Ozdemir, S. (2019). SMOTE and Gaussian noise based sensor data augmentation. In 2019 4th international conference on computer science and engineering (pp. 1–5). http://dx.doi.org/10.1109/UBMK.2019.8907003.
Baltrušaitis, T., Ahuja, C., & Morency, L.-P. (2019). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443. http://dx.doi.org/10.1109/TPAMI.2018.2798607.
Bansal, M. A., Sharma, D. R., & Kathuria, D. M. (2022). A systematic review on data scarcity problem in deep learning: solution and applications. ACM Computing Surveys, 54(10s), 1–29. http://dx.doi.org/10.1145/3502287.
Bell, S., Upchurch, P., Snavely, N., & Bala, K. (2014). Material recognition in the wild with the materials in context database. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3479–3487). URL https://api.semanticscholar.org/CorpusID:206593002.
Biermann, L., Clewley, D., Martinez-Vicente, V., & Topouzelis, K. (2020). Finding plastic patches in coastal waters using optical satellite data. Scientific Reports, 10(1), 5364. http://dx.doi.org/10.1038/s41598-020-62298-z.
Bischke, B., Helber, P., Borth, D., & Dengel, A. (2018). Segmentation of imbalanced classes in satellite imagery using adaptive uncertainty weighted class loss. In IGARSS 2018 - 2018 IEEE international geoscience and remote sensing symposium (pp. 6191–6194). http://dx.doi.org/10.1109/IGARSS.2018.8517836.
Carletta, J. (1996). Assessing agreement on classification tasks: the kappa statistic. http://dx.doi.org/10.48550/arXiv.cmp-lg/9602004, arXiv preprint cmp-lg/9602004.
Cheng, G., Han, J., & Lu, X. (2017). Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 105(10), 1865–1883. http://dx.doi.org/10.1109/JPROC.2017.2675998.
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. http://dx.doi.org/10.1177/001316446002000104.
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2018). AutoAugment: Learning augmentation policies from data. http://dx.doi.org/10.48550/arXiv.1805.09501, arXiv preprint arXiv:1805.09501.
Dasarathy, B., & Sheela, B. (1979). A composite classifier system design: Concepts and methodology. Proceedings of the IEEE, 67(5), 708–713. http://dx.doi.org/10.1109/PROC.1979.11321.
Dehghan, A., Abbasi, K., Razzaghi, P., Banadkuki, H., & Gharaghani, S. (2024). CCL-DTI: contributing the contrastive loss in drug–target interaction prediction. BMC Bioinformatics, 25(1), 48. http://dx.doi.org/10.1186/s12859-024-05671-3.
Del Rosso, M. P., Sebastianelli, A., Spiller, D., Mathieu, P. P., & Ullo, S. L. (2021). On-board volcanic eruption detection through CNNs and satellite multispectral imagery. Remote Sensing, 13(17), http://dx.doi.org/10.3390/rs13173479, URL https://www.mdpi.com/2072-4292/13/17/3479.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255). http://dx.doi.org/10.1109/CVPR.2009.5206848.
Derksen, D., Meoni, G., Lecuyer, G., Mergy, A., Märtens, M., & Izzo, D. (2021). Few-shot image classification challenge on-board. In Workshop on data-centric AI, NeurIPS. URL https://api.semanticscholar.org/CorpusID:259325617.
Derksen, D., Meoni, G., Lecuyer, G., Mergy, A., Märtens, M., & Izzo, D. (2022). The OPS-SAT case dataset. http://dx.doi.org/10.5281/zenodo.6524750.
Derksen, D., Meoni, G., Lecuyer, G., Mergy, A., Märtens, M., & Izzo, D. (2023). The OPS-SAT case: test dataset. http://dx.doi.org/10.5281/zenodo.10301862.
Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, & J. W. Vaughan (Eds.), Vol. 34, Advances in neural information processing systems (pp. 8780–8794). Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf.
Diana, L., Xu, J., & Fanucci, L. (2021). Oil spill identification from SAR images for low power embedded systems using CNN. Remote Sensing, 13(18), http://dx.doi.org/10.3390/rs13183606, URL https://www.mdpi.com/2072-4292/13/18/3606.
Dodge, S., & Karam, L. (2017). A study and comparison of human and deep learning recognition performance under visual distortions. In 2017 26th international conference on computer communication and networks (pp. 1–7). http://dx.doi.org/10.1109/ICCCN.2017.8038465.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87.
European Space Agency (2020). ESA events photo album. https://www.flickr.com/photos/esa_events/albums/72157716491073681/.
Evans, D., & Merri, M. (2014). OPS-SAT: An ESA nanosatellite for accelerating innovation in satellite control. ISBN: 978-1-62410-221-9, http://dx.doi.org/10.2514/6.2014-1702.
Fe-Fei, L., Fergus, & Perona (2003). A Bayesian approach to unsupervised one-shot learning of object categories. Vol. 2, In Proceedings ninth IEEE international conference on computer vision (pp. 1134–1141). http://dx.doi.org/10.1109/ICCV.2003.1238476.
Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4), 594–611. http://dx.doi.org/10.1109/TPAMI.2006.79.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In D. Precup, & Y. W. Teh (Eds.), Proceedings of machine learning research: Vol. 70, Proceedings of the 34th international conference on machine learning (pp. 1126–1135). PMLR, URL https://proceedings.mlr.press/v70/finn17a.html.
Furano, G., Meoni, G., Dunne, A., Moloney, D., Ferlet-Cavrois, V., Tavoularis, A., et al. (2020). Towards the use of artificial intelligence on the edge in space systems: Challenges and opportunities. IEEE Aerospace and Electronic Systems Magazine, 35(12), 44–56. http://dx.doi.org/10.1109/MAES.2020.3008468.
Giuffrida, G., Diana, L., de Gioia, F., Benelli, G., Meoni, G., Donati, M., et al. (2020). CloudScout: A deep neural network for on-board cloud detection on hyperspectral images. Remote Sensing, 12(14), http://dx.doi.org/10.3390/rs12142205, URL https://www.mdpi.com/2072-4292/12/14/2205.
Gómez, P., & Meoni, G. (2021). MSMatch: Semisupervised multispectral scene classification with few labels. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 11643–11654. http://dx.doi.org/10.1109/JSTARS.2021.3126082.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
Grigorescu, S., Trasnea, B., Cocias, T., & Macesanu, G. (2019). A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 37, http://dx.doi.org/10.1002/rob.21918.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE conference on computer vision and pattern recognition (pp. 770–778). http://dx.doi.org/10.1109/CVPR.2016.90.
Hinton, G. Lecture 6b: A bag of tricks for mini-batch gradient descent. In Lecture slides for CSC321: Neural networks and machine learning, URL https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.), Vol. 33, Advances in neural information processing systems (pp. 6840–6851). Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf.
Ho, J., Saharia, C., Chan, W., Fleet, D. J., Norouzi, M., & Salimans, T. (2022). Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(1).
Ho, J., & Salimans, T. (2021). Classifier-free diffusion guidance. In NeurIPS 2021 workshop on deep generative models and downstream applications. http://dx.doi.org/10.48550/arXiv.2207.12598, URL https://openreview.net/forum?id=qw8AKxfYbI.
Irvin, J., Sheng, H., Ramachandran, N., Johnson-Yu, S., Zhou, S., Story, K., et al. (2020). ForestNet: Classifying drivers of deforestation in Indonesia using deep learning on satellite imagery. http://dx.doi.org/10.48550/arXiv.2011.05479, arXiv preprint arXiv:2011.05479.
Kadam, S., & Vaidya, V. (2020). Review and analysis of zero, one and few shot learning approaches. In A. Abraham, A. K. Cherukuri, P. Melin, & N. Gandhi (Eds.), Intelligent systems design and applications (pp. 100–112). Cham: Springer International Publishing.
Kapoor, S., & Narayanan, A. (2023). Leakage and the reproducibility crisis in machine-learning-based science. Patterns, http://dx.doi.org/10.1016/j.patter.2023.100804, URL https://www.cell.com/patterns/abstract/S2666-3899(23)00159-9.
Kingma, D. P., & Ba, J. (2017). Adam: A method for stochastic optimization. arXiv:1412.6980.
Kornblith, S., Shlens, J., & Le, Q. V. (2019). Do better ImageNet models transfer better? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2661–2671).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, & K. Weinberger (Eds.), Vol. 25, Advances in neural information processing systems. Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
Krueger, D., Caballero, E., Jacobsen, J.-H., Zhang, A., Binas, J., Zhang, D., et al. (2021). Out-of-distribution generalization via risk extrapolation (REx). In M. Meila, & T. Zhang (Eds.), Proceedings of machine learning research: Vol. 139, Proceedings of the 38th international conference on machine learning (pp. 5815–5826). PMLR, URL https://proceedings.mlr.press/v139/krueger21a.html.
Kurekin, A. A., Loveday, B. R., Clements, O., Quartly, G. D., Miller, P. I., Wiafe, G., et al. (2019). Operational monitoring of illegal fishing in Ghana through exploitation of satellite earth observation and AIS data. Remote Sensing, 11(3), http://dx.doi.org/10.3390/rs11030293, URL https://www.mdpi.com/2072-4292/11/3/293.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. http://dx.doi.org/10.1038/nature14539.


Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. http://dx.doi.org/10.1109/5.726791.
Lee, J.-G., Jun, S., Cho, Y.-W., Lee, H., Kim, G. B., Seo, J. B., et al. (2017). Deep learning in medical imaging: general overview. Korean Journal of Radiology, 18(4), 570–584. http://dx.doi.org/10.3348/kjr.2017.18.4.570, URL https://api.semanticscholar.org/CorpusID:4345827.
Li, G., & Yu, Y. (2015). Visual saliency based on multiscale deep features. In 2015 IEEE conference on computer vision and pattern recognition (pp. 5455–5463). Los Alamitos, CA, USA: IEEE Computer Society, http://dx.doi.org/10.1109/CVPR.2015.7299184, URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2015.7299184.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988).
Liu, B., Tan, C., Li, S., He, J., & Wang, H. (2020). A data augmentation method based on generative adversarial networks for grape leaf disease identification. IEEE Access, 8, 102188–102198. http://dx.doi.org/10.1109/ACCESS.2020.2998839.
Liu, B., Yu, X., Yu, A., Zhang, P., Wan, G., & Wang, R. (2019). Deep few-shot learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 57(4), 2290–2304. http://dx.doi.org/10.1109/TGRS.2018.2872830.
Liu, Y., Zhang, H., Zhang, W., Lu, G., Tian, Q., & Ling, N. (2022). Few-shot image classification: Current status and research trends. Electronics, 11(11), http://dx.doi.org/10.3390/electronics11111752, URL https://www.mdpi.com/2079-9292/11/11/1752.
Lu, Y., Chen, D., Olaniyi, E., & Huang, Y. (2022). Generative adversarial networks (GANs) for image augmentation in agriculture: A systematic review. Computers and Electronics in Agriculture, 200, Article 107208. http://dx.doi.org/10.1016/j.compag.2022.107208, URL https://www.sciencedirect.com/science/article/pii/S0168169922005233.
Lu, J., Gong, P., Ye, J., Zhang, J., & Zhang, C. (2023). A survey on machine learning from few samples. Pattern Recognition, 139, Article 109480. http://dx.doi.org/10.1016/j.patcog.2023.109480, URL https://www.sciencedirect.com/science/article/pii/S0031320323001802.
Man, K., & Chahl, J. (2022). A review of synthetic image data and its use in computer vision. Journal of Imaging, 8(11), http://dx.doi.org/10.3390/jimaging8110310, URL https://www.mdpi.com/2313-433X/8/11/310.
Mateo-Garcia, G., Oprea, S., Smith, L., Veitch-Michaelis, J., Baydin, A. G., & Backes, D. (2019). Flood detection on low cost orbital hardware. In Artificial intelligence for humanitarian assistance and disaster response workshop, 33rd conference on neural information processing systems (NeurIPS 2019), Vancouver, Canada.
Mateo-Garcia, G., Veitch-Michaelis, J., Smith, L., Oprea, S. V., Schumann, G., Gal, Y., et al. (2021). Towards global flood mapping onboard low cost satellites with machine learning. Scientific Reports, 11(1), 7249. http://dx.doi.org/10.1038/s41598-021-86650-z.
Mellor, A., Boukir, S., Haywood, A., & Jones, S. (2015). Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin. ISPRS Journal of Photogrammetry and Remote Sensing, 105, 155–168. http://dx.doi.org/10.1016/j.isprsjprs.2015.03.014, URL https://www.sciencedirect.com/science/article/pii/S0924271615000945.
Meoni, G., Märtens, M., Derksen, D., See, K., Lightheart, T., Sécher, A., et al. (2024). The OPS-SAT case: A data-centric competition for onboard satellite image classification. Astrodynamics, http://dx.doi.org/10.1007/s42064-023-0196-y.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning [online]. Adaptive computation and machine learning.
Murali, S., & Govindan, V. (2013). Shadow detection and removal from a single image using LAB color space. Cybernetics and Information Technologies, 13(1), 95–103. http://dx.doi.org/10.2478/cait-2013-0009.
Nalepa, J., Myller, M., Cwiek, M., Zak, L., Lakota, T., Tulczyjew, L., et al. (2021). Towards on-board hyperspectral satellite image segmentation: Understanding robustness of deep learning through simulating acquisition conditions. Remote Sensing, 13(8), http://dx.doi.org/10.3390/rs13081532, URL https://www.mdpi.com/2072-4292/13/8/1532.
Nalepa, J., Myller, M., & Kawulok, M. (2020). Training- and test-time data augmentation for hyperspectral image segmentation. IEEE Geoscience and Remote Sensing Letters, 17(2), 292–296. http://dx.doi.org/10.1109/LGRS.2019.2921011.
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1717–1724).
Palhamkhani, F., Alipour, M., Dehnad, A., Abbasi, K., Razzaghi, P., & Ghasemi, J. B. (2023). DeepCompoundNet: enhancing compound–protein interaction prediction with multimodal convolutional neural networks. Journal of Biomolecular Structure and Dynamics, 1–10. http://dx.doi.org/10.1080/07391102.2023.2291829, PMID: 38084744.
Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. http://dx.doi.org/10.48550/arXiv.1712.04621, arXiv:1712.04621.
Pritt, M., & Chern, G. (2017). Satellite image classification with deep learning. In 2017 IEEE applied imagery pattern recognition workshop (pp. 1–7). http://dx.doi.org/10.1109/AIPR.2017.8457969.
Qiao, S., Liu, C., Shen, W., & Yuille, A. L. (2018). Few-shot image recognition by predicting parameters from activations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7229–7238).
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. In M. Meila, & T. Zhang (Eds.), Proceedings of machine learning research: Vol. 139, Proceedings of the 38th international conference on machine learning (pp. 8748–8763). PMLR, URL https://proceedings.mlr.press/v139/radford21a.html.
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. http://dx.doi.org/10.48550/arXiv.2204.06125, arXiv preprint arXiv:2204.06125.
Rasmussen, C. E., Williams, C. K., et al. (2006). Vol. 1, Gaussian processes for machine learning. Springer.
Razzaghi, P., Abbasi, K., & Ghasemi, J. B. (2023). Chapter 3 - Multivariate pattern recognition by machine learning methods. In J. B. Ghasemi (Ed.), Machine learning and pattern recognition methods in chemistry from multivariate and data driven modeling (pp. 47–72). Elsevier, http://dx.doi.org/10.1016/B978-0-323-90408-7.00002-2, URL https://www.sciencedirect.com/science/article/pii/B9780323904087000022.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. http://dx.doi.org/10.48550/arXiv.1804.02767, arXiv preprint arXiv:1804.02767.
Reinhard, E., Adhikhmin, M., Gooch, B., & Shirley, P. (2001). Color transfer between images. IEEE Computer Graphics and Applications, 21(5), 34–41. http://dx.doi.org/10.1109/38.946629.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620. http://dx.doi.org/10.1145/361219.361220.
Sapkota, R. (2023). Harnessing the power of AI based image generation model DALLE 2 in agricultural settings. http://dx.doi.org/10.48550/arXiv.2307.08789, arXiv:2307.08789.
Schott, J., Brown, S., Raqueño, R., Gross, H., & Robinson, G. (1999). An advanced synthetic image generation model and its application to multi/hyperspectral algorithm development. Canadian Journal of Remote Sensing, 25(2), 99–111. http://dx.doi.org/10.1080/07038992.1999.10874709.
Shen, S., Li, L. H., Tan, H., Bansal, M., Rohrbach, A., Chang, K.-W., et al. (2021). How much can CLIP benefit vision-and-language tasks? http://dx.doi.org/10.48550/arXiv.2107.06383, arXiv preprint arXiv:2107.06383.
Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 1–48. http://dx.doi.org/10.1186/s40537-019-0197-0.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In 3rd international conference on learning representations (ICLR 2015) (pp. 1–14). Computational and Biological Learning Society.
Slough, T., Kopas, J., & Urpelainen, J. (2021). Satellite-based deforestation alerts with training and incentives for patrolling facilitate community monitoring in the Peruvian Amazon. Proceedings of the National Academy of Sciences, 118(29), Article e2015171118. http://dx.doi.org/10.1073/pnas.2015171118, URL https://www.pnas.org/doi/abs/10.1073/pnas.2015171118.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In F. Bach, & D. Blei (Eds.), Proceedings of machine learning research: Vol. 37, Proceedings of the 32nd international conference on machine learning (pp. 2256–2265). Lille, France: PMLR, URL https://proceedings.mlr.press/v37/sohl-dickstein15.html.
Sohn, K., Berthelot, D., Carlini, N., Zhang, Z., Zhang, H., Raffel, C. A., et al. (2020). FixMatch: Simplifying semi-supervised learning with consistency and confidence. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.), Vol. 33, Advances in neural information processing systems (pp. 596–608). Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2020/file/06964dce9addb1c5cb5d6e3d9838f733-Paper.pdf.
Solari, L., Del Soldato, M., Raspini, F., Barra, A., Bianchini, S., Confuorto, P., et al. (2020). Review of satellite interferometry for landslide detection in Italy. Remote Sensing, 12(8), http://dx.doi.org/10.3390/rs12081351, URL https://www.mdpi.com/2072-4292/12/8/1351.
Song, Y., & Ermon, S. (2020). Improved techniques for training score-based generative models. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.), Vol. 33, Advances in neural information processing systems (pp. 12438–12448). Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2020/file/92c3b916311a5517d9290576e3ea37ad-Paper.pdf.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 36(2), 111–133. http://dx.doi.org/10.1111/j.2517-6161.1974.tb00994.x.
Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In K. Chaudhuri, & R. Salakhutdinov (Eds.), Proceedings of machine learning research: Vol. 97, Proceedings of the 36th international conference on machine learning (pp. 6105–6114). PMLR, URL https://proceedings.mlr.press/v97/tan19a.html.


Tan, M., & Le, Q. (2021). EfficientNetV2: Smaller models and faster training. In M. Meila, & T. Zhang (Eds.), Proceedings of machine learning research: Vol. 139, Proceedings of the 38th international conference on machine learning (pp. 10096–10106). PMLR, URL https://proceedings.mlr.press/v139/tan21a.html.
Topouzelis, K., Papageorgiou, D., Suaria, G., & Aliani, S. (2021). Floating marine litter detection algorithms and techniques using optical remote sensing data: A review. Marine Pollution Bulletin, 170, Article 112675. http://dx.doi.org/10.1016/j.marpolbul.2021.112675, URL https://www.sciencedirect.com/science/article/pii/S0025326X21007098.
Vanschoren, J. (2018). Meta-learning: A survey. http://dx.doi.org/10.48550/arXiv.1810.03548, arXiv preprint arXiv:1810.03548.
Vilalta, R., & Drissi, Y. (2002). A perspective view and survey of meta-learning. Artificial Intelligence Review, 18, 77–95. http://dx.doi.org/10.1023/A:1019956318069.
Wang, Y., Yao, Q., Kwok, J. T., & Ni, L. M. (2020). Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys, 53(3), http://dx.doi.org/10.1145/3386252.
Wijata, A. M., Foulon, M.-F., Bobichon, Y., Vitulli, R., Celesti, M., Camarero, R., et al. (2023). Taking artificial intelligence into space through objective selection of hyperspectral earth observation applications: To bring the "brain" close to the "eyes" of satellite missions. IEEE Geoscience and Remote Sensing Magazine, 11(2), 10–39. http://dx.doi.org/10.1109/MGRS.2023.3269979.
Xia, W., Zhang, Y., Yang, Y., Xue, J.-H., Zhou, B., & Yang, M.-H. (2023). GAN inversion: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 3121–3138. http://dx.doi.org/10.1109/TPAMI.2022.3181070.
Xiao, X., & Ma, L. (2009). Gradient-preserving color transfer. Computer Graphics Forum, 28(7), 1879–1886. http://dx.doi.org/10.1111/j.1467-8659.2009.01566.x, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8659.2009.01566.x, URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8659.2009.01566.x.
Yang, Y., & Newsam, S. (2010). Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems (pp. 270–279). New York, NY, USA: Association for Computing Machinery, http://dx.doi.org/10.1145/1869790.1869829.
Yeung, H. W. F., Zhou, M., Chung, Y. Y., Moule, G., Thompson, W., Ouyang, W., et al. (2022). Deep-learning-based solution for data deficient satellite image segmentation. Expert Systems with Applications, 191, Article 116210. http://dx.doi.org/10.1016/j.eswa.2021.116210, URL https://www.sciencedirect.com/science/article/pii/S0957417421015244.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (Eds.), Vol. 27, Advances in neural information processing systems. Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2014/file/375c71349b295fbe2dcdca9206f20a06-Paper.pdf.
Zhang, R., Che, T., Ghahramani, Z., Bengio, Y., & Song, Y. (2018). MetaGAN: An adversarial approach to few-shot learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Vol. 31, Advances in neural information processing systems. Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2018/file/4e4e53aa080247bc31d0eb4e7aeb07a0-Paper.pdf.
Zhang, R., Isola, P., & Efros, A. A. (2016). Colorful image colorization. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer vision – ECCV 2016 (pp. 649–666). Cham: Springer International Publishing.
Zhang, W., Kinoshita, Y., & Kiya, H. (2020). Image-enhancement-based data augmentation for improving deep learning in image classification problem. In 2020 IEEE international conference on consumer electronics - Taiwan (ICCE-Taiwan) (pp. 1–2). http://dx.doi.org/10.1109/ICCE-Taiwan49838.2020.9258292.
Zhang, M., Liao, J., & Yu, J. (2022). Deep exemplar-based color transfer for 3D model. IEEE Transactions on Visualization & Computer Graphics, 28(08), 2926–2937. http://dx.doi.org/10.1109/TVCG.2020.3041487.
Zhang, X., Wang, C., Tang, Y., Zhou, Z., & Lu, X. (2022). A survey of few-shot learning and its application in industrial object detection tasks. In Y. Wang, K. Martinsen, T. Yu, & K. Wang (Eds.), Advanced manufacturing and automation XI (pp. 637–647). Singapore: Springer Singapore.
Zhu, J.-Y., Krähenbühl, P., Shechtman, E., & Efros, A. A. (2016). Generative visual manipulation on the natural image manifold. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer vision – ECCV 2016 (pp. 597–613). Cham: Springer International Publishing.