
Power Line Recognition From Aerial Images With Deep Learning

ÖMER EMRE YETGIN
BURAK BENLIGIRAY
ÖMER NEZIH GEREK, Senior Member, IEEE
Eskişehir Technical University, Eskişehir, Turkey

Avoidance of power lines is an important issue of flight safety. Assistance systems that automatically detect power lines can prevent accidents caused by pilot unawareness. In this study, we propose using convolutional neural networks (CNNs) to recognize the presence of power lines in aerial images. Deep CNN architectures such as VGG and ResNet were originally designed to recognize objects in the ImageNet dataset. We show that they are also successful at extracting features that indicate the presence of power lines, which appear as simple, yet subtle structures. Another interesting finding is that pretraining the CNN with the ImageNet dataset improves the power line recognition rate significantly. This indicates that the usage of ImageNet pretraining should not be limited to high-level visual tasks, as it also develops general-purpose visual skills that apply to more primitive tasks. To test the proposed methods' performance, we collected an aerial dataset and made it publicly available. We experimented with training CNNs in an end-to-end fashion, along with extracting features from the intermediate stages of CNNs and feeding them to various classifiers. These experiments were repeated with different architectures and preprocessing methods, resulting in an expansive account of best practices for the usage of CNNs for power line recognition.

Manuscript received November 7, 2017; revised July 28, 2018; released for publication November 16, 2018. Date of publication November 28, 2018; date of current version October 10, 2019.

DOI No. 10.1109/TAES.2018.2883879

Refereeing of this contribution was handled by J. Nichols.

This work was supported by the Eskişehir Technical University Scientific Research Project Commission under Grant 1508F598 and Grant 1608F606.

Authors' address: Ö. E. Yetgin, B. Benligiray, and Ö. N. Gerek are with the Department of Electrical and Electronics Engineering, Eskişehir Technical University, Eskişehir 26470, Turkey. E-mail: (omeremreyetkin@gmail.com; bbenligiray@gmail.com; gerek@eskisehir.edu.tr). Corresponding author: Ömer Emre Yetgin.

0018-9251 © 2018 IEEE

I. INTRODUCTION

Aircraft accidents are usually fatal and costly. Fatigue, distraction, negligence, and low visibility conditions are factors that prevent pilots from noticing hazards, which results in accidents. A common example of these hazards is power lines, especially unauthorized ones in rural areas. Brightly colored aviation marker balls are attached to the cables for visibility, yet they are not installed ubiquitously.

Our goal is to develop a real-time computer vision method to recognize the presence of power lines in aerial images and warn the pilot. From this perspective, the problem can be defined as binary classification of images. On the other hand, the academic literature focuses on localizing the power lines and highlighting them on a screen for the pilot to see. This problem is more difficult; thus, the solutions will be more prone to error. A robust system with an auditory warning as described in [1] will be preferable over an error-prone system that engages the pilot's visual attention.

Recognizing power lines is difficult for machines for the same reason it is for pilots: they are too thin to be seen from long distances. Moreover, visibility may be limited due to weather conditions, which causes the contrast between the power lines and the background to be low. To tackle the problem of recognizing the presence of power lines, we propose to use convolutional neural networks (CNNs).

Recently, deep CNN architectures have been used to recognize objects in images with great success [2]. A large part of this success is due to the sheer size of the ImageNet dataset [3]. Models pretrained with the ImageNet dataset were used to achieve the best results in other high-level recognition tasks, e.g., PASCAL VOC and Caltech 101–256 image classification [4], [5]. This is because the representations learned from the ImageNet dataset are applicable to tasks of similar domains. However, the effect of ImageNet pretraining for completely different visual tasks is unpredictable.

In this study, we investigate the use of CNN architectures designed and trained for ImageNet image classification to recognize the presence of power lines in aerial images. The considered architectures are composed of many layers to build abstract representations as combinations of simpler ones [6]. On the other hand, while being difficult to perceive, power lines appear as very simple structures. Extracting features from earlier stages of the CNNs may yield less abstract features that would be more suitable for our target task. To test this, we compare two methods. In the first method, end-to-end classification, we partially fine-tune these models for the target task. In the second method, CNN feature classification, we extract features from the intermediate stages of the net, and feed them to various classifiers. Our contributions and observations are as follows.

1) A publicly available aerial image dataset consisting of visible light (VL) and infrared (IR) images, to be used for power line recognition [7].
2) CNNs are shown to be viable for recognizing power lines, even though they are noncompact objects.
3) Despite the extreme domain shift, ImageNet pretraining is shown to be beneficial for visual tasks with aerial and IR images.
4) Saliency maps are used to extend the proposed method to provide visual feedback.

II. RELATED WORK

In this section, we discuss the existing methods for recognizing and locating power lines, and the usage of CNNs for recognition.

A. Power Line Recognition

Laser-based sensors have been used for obstacle avoidance for unmanned aerial vehicles [8]. However, they are severely affected by weather conditions. Millimeter-wave radars are a more robust alternative in this aspect. They are used to detect the unique scattering pattern caused by the braided structure of the cables [9]. In a related study, Ma et al. applied the Hough transform to detect linear features in millimeter-wave radar images [10]. Since not all detections are power lines, they eliminate erroneous ones with a support vector machine that classifies their Bragg pattern features. Persistent detections along successive frames are decided to be power lines.

Power lines appear as line segments on images, which has directed the majority of the vision-based studies toward a similar approach: edge detection and line fitting, followed by temporal filtering to remove false positives. Some methods utilize data-dependent modules, but an end-to-end machine learning approach has yet to be proposed.

Yetgin et al. demonstrated in [11] and [12] that line detection methods such as the Hough transform [13], LSD [14], and EDLines [15] can be used to detect power lines with high speed and reasonable accuracy. Yetgin and Gerek used the lengths and angles of the detected line segments as features, and clustered the line segments with a k-means approach to localize the power lines [16]. The same authors also compared the performances of corner and saliency detection methods for power line detection [17]. In a more closely related work, they demonstrated that discrete cosine transform (DCT) samples from similar aerial images provide a fair feature for detecting power lines; however, the choice of DCT samples was observed to be a challenging problem [18].

Yan et al. detected linear features with an elongated kernel [19]. Then, they detect lines using the Radon transform, and group the ones that have similar parameters. Finally, a Kalman filter is used to trace the power lines and fill the gaps. Candamo et al. first generate a difference image between consecutive frames [20]. Then, they detect linear features using the Canny edge detector, followed by the Hough transform. The detections are validated if they persist through time. Instead of manually designing the edge detector kernels, Li et al. train a pulse-coupled CNN [21]. The Hough transform is applied to the features detected by this CNN to detect power lines. Liu et al. used a steerable filter to detect the ridges in the image [22]. Region growing, connected component analysis, and line fitting methods are used to obtain power lines from these ridges. Pan et al. detected edges in small patches of the image, and fed the edge maps to a CNN to decide if they contain power lines [23]. Then, the Hough transform is applied to the patches that are determined to contain power lines. Song and Li apply a morphological filter to first-order derivative of Gaussian images to obtain an edge map [24]. Lines are detected from the edge maps, and validated with a graph-cut model. Zhang et al. utilized the spatial correlation between power lines and pylons to validate detections [25]. In all of these studies, the authors attempt to find, locate, and verify the power lines. In this paper, we also use CNNs, but not to find and locate power lines. Instead, we utilize CNNs to tag the image as "containing" or "not containing" power lines, which was not attempted before.

B. Convolutional Neural Networks

The CNN architecture is a combination of a parameter-sharing, sparse net for feature extraction, and a fully connected net for classification [26]. Unlike traditional methods for recognition [27], the feature extraction part of a CNN is data-dependent. Moreover, the feature extraction and classification parts are trained jointly. The large image datasets emerging in recent years, such as ImageNet [3], have resulted in the discovery that CNN models scale very efficiently with data [2]. Consequently, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [28] is dominated by deep CNN models every year [5], [29], [30], proving that they constitute the best method for high-level recognition tasks today.

Features extracted from CNNs trained for ILSVRC image classification have been proposed for general-purpose visual recognition [31], [32]. They yield state-of-the-art results in tasks from similar domains such as object detection [33] and image captioning [34]. However, when the data distribution of the target task is significantly different, ImageNet pretraining is avoided. For example, Liu et al. used an ImageNet pretrained net to locate the face region in the images, then use a separate net that is only trained with face images to predict face attributes [35]. Moreover, ImageNet pretraining is completely omitted if the target task domain is vastly different [36], [37].

In this paper, we are using aerial images, which are unlike images from general-purpose recognition datasets that contain compact objects. Therefore, it is not clear whether ImageNet pretraining would be beneficial. Penatti et al. showed that features extracted from an ImageNet pretrained CNN outperform other feature descriptors for an aerial image classification application [38]. Similarly, Hu et al. used ImageNet pretrained CNN features to successfully classify remote sensing imagery [39]. To observe the effect of ImageNet pretraining on our application, we experiment with various CNN architectures that are pretrained or randomly initialized.

III. AERIAL IMAGE DATASET

It is difficult and expensive to generate aerial image datasets. The parties that generate aerial datasets may feel that it is in their commercial or military interest to keep them confidential. For these reasons, power line recognition methods in the literature tend not to be tested thoroughly.

In this paper, we present an aerial image dataset that we have generated in cooperation with the Turkish Electricity Transmission Corporation (TEIAS). A helicopter-mounted imaging system was used to capture VL and IR videos from the air. The video resolutions were 576 × 325 for IR and 1920 × 1080 for VL. We inspected the videos, manually selected examples that represent the presence or absence of power lines, and resized them to 128 × 128 (see Fig. 1). The dataset is composed of 2000 positive examples and 4000 negative examples for each domain. The videos were captured from 21 different geographical locations in Turkey. The examples are chosen to provide a variety of difficulties, due to different backgrounds, lighting, and weather conditions. Using the method in [20], example difficulties are graded into two tiers (see Fig. 1). Finally, the dataset is hosted in a public web repository [7] with the corresponding localization ground truth [40].

Fig. 1. Examples from the aerial image dataset. The first two columns are IR images with and without power lines, the following two columns are VL images with and without power lines. The first two rows are easier examples, based on the metric proposed in [20].

IV. PROPOSED METHOD

We propose two alternative methods for the usage of CNNs for power line recognition (see Fig. 2). The first method is end-to-end classification. In this method, we start with a CNN that was designed to be used for ILSVRC image classification. We replace the final layer of this model with a randomly initialized softmax layer with two outputs for binary classification. Then, we train only this final layer until convergence. Following this, we jointly fine-tune the feature extraction and classification parts (a minimal sketch is given at the end of this section).

The second method utilizes the same CNN as a feature extractor. We use only the parts up to a certain CNN stage, and remove the further layers. The output of the partial CNN is flattened, dimension-reduced with principal component analysis (PCA), and fed into a classifier. We train this classifier separately from the CNN.

We have used ResNet-50 [30] and VGG-19 [5] as the preferred architectures. These two architectures were chosen because of their relative performances at the ILSVRC image classification task, where ResNet-50 outperforms VGG-19. We will be investigating if the performance at this general visual task is indicative of the performance at the specific task of power line recognition.

The main contribution of VGG models was showing that a fixed kernel size of 3 × 3 works as well as larger kernel sizes (because earlier architectures were manually tuned for each layer [2]), and greatly simplifies the architecture design process. A single VGG-19 net is reported to achieve 7.0% top-5 test error in the ILSVRC image classification task [5]. For a simpler analysis, we considered the VGG-19 architecture as a composition of five convolutional stages, as it was implemented for Keras [41] [see Fig. 3(a)].

Szegedy et al. simplified the architecture design further, by first designing inception blocks, and using them to build GoogLeNet [29]. Similarly, ResNet is designed as a combination of blocks. The main difference of ResNet is its being much deeper than earlier architectures. This was achieved by utilizing batch normalization [42] and residual connections [43], both of which help with the vanishing or exploding gradient problem seen in excessively deep models. A single ResNet-50 net is reported to achieve 5.25% top-5 validation error in the ILSVRC image classification task [30]. Similar to VGG-19, we analyzed this architecture as a composition of five convolutional stages, based on the implementation we have used [41] [see Fig. 3(b)].

We used three different methods to classify CNN features. Support vector machines are popular classifiers that aim to obtain the hyperplane that separates the support vectors with the largest margin possible [44]. Naive Bayes classifiers assume that the classes can be represented as stationary Gaussian distributions, where two mean vectors and two covariance matrices suffice to distinguish two classes using quadratic hyperplanes. Assuming statistically independent samples, they maximize the a posteriori probability of an observation to reach a decision [45]. Conventional decision trees apply classification by splitting the feature space into smaller subcategories as long as the classification accuracy target is not met. Random forests are ensembles of shallow decision trees for improved robustness against overfitting [46].
Fig. 2. Two alternative methods for using CNNs for power line recognition: end-to-end classification and CNN feature classification. In end-to-end
classification, the network is trained jointly. In CNN feature classification, the feature extractor and the classifier are trained sequentially. The
disconnect is represented by dashed arrows.

V. EXPERIMENTAL RESULTS

In this section, we present experimental results for the two proposed methods: end-to-end classification and CNN feature classification. Additionally, we investigate the usage of different net architectures and image preprocessing methods.

The abbreviations used in the tables are as follows: IR: infrared; VL: visible light; Neg: negative; Pos: positive; SVM: support vector machine; NB: naive Bayes; RF: random forests.

A. Implementation Details
For end-to-end classification, the final layer of the CNN designed for ImageNet classification is replaced with a binary softmax layer. As is the usual practice, only this layer is trained first. Then, Stage 5 and the following layers (see Fig. 3) are fine-tuned jointly. In CNN feature classification, flattened outputs of convolutional stages are used as CNN features. Since these features are excessively large, they are dimension-reduced to size 1024 using PCA. A sketch of this feature pipeline is given below.
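This is a hedged sketch of the CNN feature classification pipeline, with scikit-learn standing in for the Weka classifiers we actually used; the stage output layer name and the placeholder arrays (train_images, train_labels, test_images) are assumptions for illustration.

```python
# Sketch of CNN feature classification (assumptions: tf.keras, an
# illustrative stage-3 output layer name, scikit-learn classifiers).
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.svm import SVC

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(128, 128, 3))
# Truncate the net at the end of a convolutional stage; discard the rest.
stage_out = base.get_layer("conv3_block4_out").output  # assumed stage-3 name
extractor = tf.keras.Model(base.input, stage_out)

def extract(images):
    # Flatten the stage output into one feature vector per image.
    feats = extractor.predict(images)
    return feats.reshape(len(images), -1)

# Dimension-reduce the flattened features to 1024 with PCA, then train
# a classifier on them, separately from the CNN.
train_feats = extract(train_images)            # placeholder arrays
pca = PCA(n_components=1024).fit(train_feats)  # needs >= 1024 samples
clf = SVC().fit(pca.transform(train_feats), train_labels)
pred = clf.predict(pca.transform(extract(test_images)))
```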
Keras with TensorFlow backend was used for the CNN [41], and Weka was used for the classifiers [47]. We ran all experiments with 10-fold cross-validation. For each fold, the dataset was segmented as 70% training data, 20% validation data, and 10% test data. The learning rate for the CNN final layer was started at 0.1, and halved five times when the validation loss stopped decreasing. The fine-tuning learning rate was annealed the same way, but it was initialized at 0.01. Weight decay was set to 0.001 for all layers. The classifier parameters were kept as the default values in Weka 3.8.
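The halving schedule described above maps naturally onto a standard Keras callback; the sketch below is one possible realization, where the patience value is our assumption.

```python
# Sketch of the annealing schedule: halve the learning rate whenever the
# validation loss stops decreasing, at most five times in total
# (min_lr = 0.1 / 2**5); the patience value is an assumption.
import tensorflow as tf

halve_on_plateau = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=2, min_lr=0.1 / 2**5)
# model.fit(..., validation_data=..., callbacks=[halve_on_plateau])
```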
Fig. 3. CNN architectures used in the study, VGG-19 [5] and ResNet-50 [30]. The stages of the architectures are illustrated in different colors. Note that the convolutional and identity blocks of ResNet-50 are composed of multiple layers. (a) VGG-19. (b) ResNet-50.

Two popular alternatives for image preprocessing were tested: 1) forcing zero mean by subtracting the dataset average (referred to as mean subtraction), and 2) linear normalization to the 0–1 scale by dividing by 255.
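A minimal sketch of the two preprocessing alternatives is given below; the per-channel mean computation is an illustrative assumption.

```python
# Sketch of the two preprocessing alternatives.
import numpy as np

def mean_subtraction(images, dataset_mean):
    # Force zero mean by subtracting the dataset average.
    return images.astype(np.float32) - dataset_mean

def zero_one_normalization(images):
    # Linearly rescale 8-bit intensities to the 0-1 range.
    return images.astype(np.float32) / 255.0

# dataset_mean = train_images.mean(axis=(0, 1, 2))  # assumed per-channel mean
```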
B. End-to-End Classification
We start our experiments with CNNs pretrained for
ILSVRC image classification (from now on to be referred
V. EXPERIMENTAL RESULTS as ImageNet pretrained nets). These nets include filters such
In this section, we present experimental results for the as edge and blob detectors in the earlier layers, which are
two proposed methods: end-to-end classification and CNN likely to be useful in a majority of visual tasks. However,
feature classification. Additionally, we investigate the us- they also come with a redundant specialization in recog-

TABLE I
Classification Errors in Percentages for the End-to-End Classification Method. Boldface font indicates the best results.

TABLE II
Confusion Matrices for the End-to-End Classification Method (Trained Final Layer and Fine-Tuned Stage 5, ImageNet Pretraining, Mean Subtraction Preprocessing). Rows are ground truths, and columns are predictions.

Table I(a) and (b) show the results with ImageNet pretrained models, and Table I(c) shows the results with a randomly initialized model.

As discussed earlier, we train the randomly initialized last layer individually, then fine-tune the net. The entire net has millions of parameters that can be fine-tuned. The number of free parameters increases the expressive power of the model. However, if there is not much training data, it also causes the model to overfit. Since our training set is relatively small, we must limit the number of free parameters in the model. For this reason, we limited the fine-tuning to only the final stage (Stage 5 in Fig. 3) and the following layers. In Table I(a)–(c), the results are given both for training the last layer, and for fine-tuning the last stage.

See Table I(a) for the results where the ImageNet pretrained nets are fed mean-subtracted images. We can see that just by training the last layer, we achieve considerable performance with features learned from ImageNet pretraining. ResNet-50 performs significantly better than VGG-19. Following this, we fine-tune the final stage of the nets. Here, we see that the ResNet-50 performance is improved even further, resulting in the best performance that will be reported in this study. On the other hand, the IR performance of VGG-19 is not improved. See Table II for the confusion matrices where Stage 5 is fine-tuned.

We repeat the previous experiment by using 0–1 normalization instead of mean subtraction as the preprocessing method. See the results in Table I(b) and compare with Table I(a). ImageNet pretraining is done with mean subtraction, and this dynamic range alteration results in inferior performance. What is interesting is that once we fine-tune Stage 5, the performance improves dramatically, and becomes comparable to using mean subtraction.

CNN architectures are inherently frequency selective. For this reason, they are sensitive to Gabor wavelet-like structures, even when they are composed of random weights [48]. It is therefore reasonable to expect untrained nets to perform reasonably well in power line recognition, as they do in other tasks [49]. In other words, ImageNet pretraining may not have a meaningful effect in achieving the performance in Table I(a) and (b). To test the effect of ImageNet pretraining, we generated nets with random weights using Xavier initialization [50], and repeated the experiments [see Table I(c) for the results, and the sketch below]. It is clear that ImageNet pretraining was significantly beneficial for power line recognition, even when the images were from the IR spectrum. However, untrained nets were also able to deliver better than random predictions. See Fig. 4 for the receiver operating characteristic curves of the experiments in this section.
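The pretraining comparison reduces to how the net is instantiated; the following hedged sketch illustrates this in Keras, relying on the fact that Keras falls back to Glorot (Xavier) uniform initialization when no weights are loaded.

```python
# Sketch of the pretraining comparison: the same architecture is built
# either with ImageNet weights or randomly initialized (Keras defaults
# to Glorot/Xavier uniform initialization when weights=None).
import tensorflow as tf

pretrained = tf.keras.applications.ResNet50(weights="imagenet",
                                            include_top=False,
                                            input_shape=(128, 128, 3))
random_init = tf.keras.applications.ResNet50(weights=None,
                                             include_top=False,
                                             input_shape=(128, 128, 3))
# Both variants then receive the binary softmax head and are trained
# with the same schedule, isolating the effect of ImageNet pretraining.
```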
C. Classifying CNN Features

In this section, we investigate using the CNN as a feature extractor, together with a variety of classifiers. The CNNs learn fundamental representations in the earlier layers, such as directed edges, blobs, and patterns. As we move on to higher layers, the representations grow abstract and data-specific. Considering that the data distributions of the ImageNet dataset and our aerial image dataset are vastly different, the more abstract representations from the higher layers may not be ideal for us. To test this hypothesis, we extracted features from different stages of the CNNs, and fed them into classifiers.

See Table III(a) and (b) for the CNN feature classification results with mean subtraction preprocessing. It is clear that, similar to the end-to-end classification results, ResNet-50 outperformed VGG-19. Again, in accordance with the earlier results, IR images are classified more easily. If we compare the classifiers, SVM yields the best classification performance across all conditions. See Table IV for the confusion matrices, and Fig. 5 for the receiver operating characteristic curves with the best configurations in this section.

Fig. 4. Receiver operating characteristic curves of the end-to-end classification method. The legend is in the order of decreasing area under the curve. Note that the curves overlap and occlude each other in some figures. (a) IR, ResNet-50. (b) IR, VGG-19. (c) VL, ResNet-50. (d) VL, VGG-19.

TABLE III
Classification Errors in Percentages for the CNN Feature Classification Method With Mean Subtraction Preprocessing. Boldface font indicates the best results.

TABLE IV
Confusion Matrices for the Best CNN Feature Classification Method (SVM Classifier, ResNet-50 Architecture, Stage 4 Features for IR, and Stage 3 Features for VL Images, Mean Subtraction Preprocessing). Rows are ground truths, and columns are predictions.

TABLE V
Cumulative Running Times of the CNN Models for a Single Image in Milliseconds

Fig. 5. Receiver operating characteristic curves of the CNN feature classification method. The ImageNet pretrained ResNet-50 model is used with mean subtraction preprocessing. Stage 4 for IR and Stage 3 for VL are shown because they delivered the best results [see Table III(b)]. (a) IR, Features from Stage 4. (b) VL, Features from Stage 3.

Let us compare the performances of features extracted from different stages. In nearly all cases where naive Bayes or random forest classifiers are used, Stage 3 features yielded the best results. In contrast, the SVM performance improves with higher-level features. The best performances in Table III(a) and (b) are comparable to the best performances in Table I(a) without fine-tuning, which was equivalent to extracting features from Stage 5 and classifying them with a single-layer net.

D. Running Time

An effective power line warning system should warn pilots at a distance that would be adequate to take the necessary risk-avoidance maneuvers. In this respect, the running time of the algorithm is an important factor, as the helicopter may be approaching the hazard during the running time.

The running times given in Table V are obtained with an Nvidia GTX 1080 GPU; a sketch of how such per-image timings can be measured is given below. The best performing configuration, end-to-end classification with ResNet-50, runs in 21.7 ms, which is reasonable for a real-time application. Running times for PCA projection and classification with SVM, NB, and RF are not significant (< 0.5 ms) and, thus, are omitted. Note that it is possible to optimize the method for a lower running time by using lightweight architectures such as MobileNets [51], pruning existing architectures [52], or using specialized hardware [53].
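As an illustration, per-image running time can be estimated with simple wall-clock timing; the sketch below is an assumption-laden harness (random input, fixed number of runs), not the exact measurement protocol used for Table V.

```python
# Sketch of per-image inference timing (wall-clock, with a warm-up pass;
# includes Python call overhead, so it is an upper bound on model time).
import time
import numpy as np

def time_single_image(model, runs=100):
    image = np.random.rand(1, 128, 128, 3).astype(np.float32)
    model.predict(image)  # warm-up, excludes one-time setup cost
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(image)
    return 1000 * (time.perf_counter() - start) / runs  # milliseconds
```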
E. Line Detection Based Methods

There is a large body of work that is focused on power line detection, as discussed in Section II. These methods largely depend on detecting linear features in the scene, and forming collinear groups. We have argued that for a power line avoidance warning system, recognizing whether a scene contains power lines is adequate. In this section, we demonstrate the performance of line segment detection methods in recognizing the presence of power lines.

Fig. 6. First row shows negative examples from both spectra, which were classified correctly by the end-to-end method. The following three rows
show line segments detected by EDLines [15], LSD [14] and a Hough transform based method [13], respectively.

First, we discuss the case in which there are no power lines in the scene. Fig. 6 shows negative IR and VL examples, which the proposed end-to-end method has classified correctly. We can see that there are certain linear structures in the scene. Both EDLines [15] and LSD [14] capture these structures as a single long line segment or as short, yet collinear line segments. In addition, a number of line segments emerge from noisy areas, of which some happen to be collinear. Since such collinear features occur spontaneously, methods that rely solely on linear features to recognize the presence of power lines in an image are going to be prone to error. Hough transform based methods [13] are even more problematic, because they have to assume beforehand that a certain number of lines are present in the image (chosen to be 1 in Fig. 6). Therefore, they are completely inapplicable to our problem.

Next, we discuss how line segment detection methods fare in the examples where the proposed method has failed. Fig. 7 shows some of the examples that the proposed end-to-end method has failed to classify correctly. In the four examples on the left side, power lines are present, yet not recognized by the proposed method. Since the power lines appear very weakly, they were not consistently captured by the line segment detection methods, either. The background has also produced collinear features, which suppress the features from the power lines. In the four examples on the right side, power lines are not present, yet the proposed method falsely recognizes them. These examples contain a natural linear structure that was likely mistaken for a power line, and the line segment detection methods were also susceptible to these structures.

F. Comparison of Methods

In this section, we present a quantitative comparison of the best performances of various methods on the respective datasets (see Table VI). We can see that the results are consistent for both spectra. End-to-end classification with fine-tuning gives the best results, and using the CNN as a feature extractor gives comparable results. We compared these results with a recent method that uses DCT features for classification [18]. This method is applied with various parameters and classifier types (SVM, RF, or NB), and the best results are given in Table VI. It is clear that the proposed application of deep learning to power line recognition provides a significant improvement.

VI. VISUALIZATION

We have presented the performance of the proposed method with quantitative experiments. In this section, we analyze the proposed method [specifically, the model from Table I(a)] further, using various visualization techniques. Let us start by investigating the saliency maps for positive examples. A saliency map for an image is obtained by backpropagating the gradient to the respective image. For this purpose, we used guided backpropagation [54], where only positive ReLU activations propagate gradients. This method is particularly suitable for the task at hand, because it emphasizes object contours, rather than their bodies. See Fig. 8 for input images and corresponding saliency maps.

Fig. 7. First row shows four negative and four positive examples, which were classified incorrectly by the end-to-end method. Ground truth is
presented in the second row [40]. The following three rows show line segments detected by EDLines [15], LSD [14] and a Hough transform based
method [13], respectively.

TABLE VI
Best Performing Configurations With Each Proposed Method on IR and VL Images. Boldface font indicates the best results.

The proposed method is sensitive to power lines that appear as curves, which implies superiority over methods that assume collinearity. Another interesting point is that the model has learned to recognize pylons, and to use them as evidence of power lines. Again, a hand-crafted method based on line detection would be unable to do this. The linear structures of the buildings can be seen to be ignored, which is desirable. We can also evaluate the dataset based on these saliency maps. Specifically, the fact that the model focuses on the power lines rather than the background implies that the dataset is not biased in the way that positive examples share a similar background.

We can also use the saliency map to provide visual feedback to the pilot, as proposed in [55]. See Fig. 8 for examples, and the sketch below. To achieve this, we thresholded the saliency map using Otsu's method [56], applied Gaussian blur, and superposed it over the original image with a colormap. The saliency map is obtained with a single backward pass, meaning that the extra computation is comparable to what is required for the proposed method. Therefore, our proposed method can be extended to provide visual feedback and still work in real time.
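A minimal sketch of this overlay pipeline follows, assuming the guided backpropagation saliency map has already been computed as a single-channel array; the blur kernel size, colormap, and blending weights are illustrative assumptions.

```python
# Sketch of the visual feedback overlay (guided backpropagation itself
# is omitted; `saliency` is an assumed precomputed single-channel map).
import cv2
import numpy as np

def overlay_saliency(image_bgr, saliency):
    # Scale the saliency map to 8-bit and threshold it with Otsu's method.
    sal = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(sal, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Smooth the binary mask and render it with a colormap.
    mask = cv2.GaussianBlur(mask, (15, 15), 0)
    heat = cv2.applyColorMap(mask, cv2.COLORMAP_JET)
    # Superpose the colored mask over the original image.
    return cv2.addWeighted(image_bgr, 0.7, heat, 0.3, 0)
```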
To further analyze how the proposed method functions, let us visualize the activations. This is done by choosing a single neuron output from a stage, and optimizing the input for the highest activation; a sketch is given below. See the results in Fig. 9. Since this model was initialized by ImageNet pretraining, we can observe that the representations grow higher in level through Stages 1–4. However, due to the fine-tuning that we apply, the representations regress to a much lower level in Stage 5. Then, a combination of these representations is used to recognize power lines in the final stage, as evident from the parallel linearities and pylon-like structures. We can also say that the final stage has learned to handle power line appearances of any orientation, based on the variety in the figure.
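A hedged sketch of this activation maximization procedure follows; the layer name, step count, and step size are illustrative assumptions.

```python
# Sketch of activation maximization: start from noise and apply gradient
# ascent on the input to maximally excite one channel of a chosen stage.
import tensorflow as tf

def visualize_activation(model, layer_name, channel, steps=100, lr=1.0):
    probe = tf.keras.Model(model.input, model.get_layer(layer_name).output)
    image = tf.Variable(tf.random.uniform((1, 128, 128, 3)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Mean response of the selected channel is the objective.
            activation = tf.reduce_mean(probe(image)[..., channel])
        grads = tape.gradient(activation, image)
        # Normalized gradient ascent step on the input image.
        image.assign_add(lr * grads / (tf.norm(grads) + 1e-8))
    return image.numpy()[0]
```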

Fig. 8. First row shows input images, second row shows saliency maps obtained by guided backpropagation [54] (darker color indicates higher
saliency), and third row shows a visualization technique that can provide visual feedback for the pilot.

Fig. 9. Activation visualizations for the model in Table I(a). The images are zoomed ×2 for better viewing. The model was initialized by ImageNet pretraining, then Stage 5 and the Final Stage were fine-tuned for power line recognition.

VII. CONCLUSION

In this study, we proposed two CNN-based power line recognition methods to be used in a real-time warning system. Unlike previous methods, where the cable lines were localized, we consider the problem as a binary classification of whether the scene contains a power line or not. Both of the proposed methods use CNNs designed for ImageNet object recognition. In the first method, end-to-end classification, the CNN is modified for the target task, and trained jointly. In the second method, CNN feature classification, features are extracted from the intermediate stages of the CNN, and fed into a classifier.

The best results were obtained with end-to-end classification, where a ResNet-50 model was pretrained with the ImageNet dataset, and its last stage was fine-tuned with power line images. In nearly all experiments, IR images were classified more successfully. This shows that it is preferable to use IR imaging for a power line warning system. However, the performance with VL images was also reasonable.

Overall, ResNet-50 performed better than VGG-19, which implies that a model's ImageNet performance tends to be indicative of its performance at other tasks. This would mean that the results we have achieved can be improved by replacing the pretrained network with a newer one that performs better at high-level visual tasks. In addition, we showed that ImageNet pretraining is significantly beneficial for power line recognition, which is consistent with recent findings in other domains [57].

Contrary to our initial expectation, higher-level features yielded better performance with the CNN feature classification method. This indicates that, while the representations in the final layers are being optimized to disentangle higher-level factors of variation (i.e., what object an image contains), they also become better at disentangling lower-level factors of variation. Therefore, high-level features from pretrained nets are concluded to be beneficial in a wide variety of visual tasks. End-to-end classification surpassed CNN feature classification, because it uses high-level features and allows fine-tuning.

In the experiments without fine-tuning, we observed that the preprocessing method was critical. Specifically, one should use the preprocessing method that was used in pretraining, which was mean subtraction in our case. However, fine-tuning nullifies the effect of the difference between the preprocessing methods used in pretraining and training.

Finally, we showed that even though the architectures we used were designed for ImageNet object recognition, they performed well at the target task. Aerial images of the VL and IR spectra constitute a considerably different domain than the ImageNet dataset. Yet, ImageNet pretraining was observed to affect the experimental results positively. Moreover, the architecture that performs better at ImageNet object classification also performs better at power line recognition, regardless of whether the nets are pretrained or not. Considering that better performance at a general-purpose visual task results in better performance at this specific visual task, the premise of a unified net for all visual tasks looks promising.

ACKNOWLEDGMENT

The authors would like to thank the Turkish Electricity Transmission Company for providing the power line videos.

REFERENCES

[1] L. M. Greene and R. A. Greene, "Pilot's aid for detecting power lines," U.S. Patent 6 002 348, Dec. 14, 1999.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255.
[4] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.
[5] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
[6] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[7] Ö. E. Yetgin and Ö. N. Gerek, "Powerline Image Dataset Extra (Infrared-IR and Visible Light-VL)-Classified (Easy and Hard)," 2017. [Online]. Available: https://data.mendeley.com/datasets/n6wrv4ry6v/7
[8] R. Sabatini, A. Gardi, and M. Richardson, "LIDAR obstacle warning and avoidance system for unmanned aircraft," Int. J. Mech., Aerosp. Ind. Mechatron. Eng., vol. 8, no. 4, pp. 718–729, 2014.
[9] K. Sarabandi, L. Pierce, Y. Oh, and F. T. Ulaby, "Power lines: Radar measurements and detection algorithm for SAR images," IEEE Trans. Aerosp. Electron. Syst., vol. 30, no. 2, pp. 632–643, Apr. 1994.
[10] Q. Ma, D. S. Goshi, Y.-C. Shih, and M.-T. Sun, "An algorithm for power line detection and warning based on a millimeter-wave radar video," IEEE Trans. Image Process., vol. 20, no. 12, pp. 3534–3543, Dec. 2011.
[11] Ö. E. Yetgin, Z. Şentürk, and Ö. N. Gerek, "A comparison of line detection methods for power line avoidance in aircrafts," in Proc. Int. Conf. Elect. Electron. Eng., 2015, pp. 241–245.
[12] Ö. E. Yetgin and Ö. N. Gerek, "Cable and wire detection system for aircrafts," in Proc. IEEE Signal Process. Commun. Appl. Conf., 2013, pp. 1–4.
[13] R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Commun. ACM, vol. 15, no. 1, pp. 11–15, 1972.
[14] R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, "LSD: A fast line segment detector with a false detection control," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 4, pp. 722–732, Apr. 2010.
[15] C. Akinlar and C. Topal, "EDLines: A real-time line segment detector with a false detection control," Pattern Recognit. Lett., vol. 32, no. 13, pp. 1633–1642, 2011.
[16] Ö. E. Yetgin and Ö. N. Gerek, "PLD: Power line detection system for aircrafts," in Proc. IEEE Int. Artif. Intell. Data Process. Symp., 2017, pp. 1–5.
[17] Ö. E. Yetgin and Ö. N. Gerek, "A comparison of corner and saliency detection methods for power line detection," in Proc. IEEE Int. Artif. Intell. Data Process. Symp., 2017, pp. 1–5.
[18] Ö. E. Yetgin and Ö. N. Gerek, "Automatic recognition of scenes with power line wires in real life aerial images using DCT-based features," Digit. Signal Process., vol. 77, pp. 102–119, 2017.
[19] G. Yan, C. Li, G. Zhou, W. Zhang, and X. Li, "Automatic extraction of power lines from aerial images," IEEE Geosci. Remote Sens. Lett., vol. 4, no. 3, pp. 387–391, Jul. 2007.
[20] J. Candamo, R. Kasturi, D. Goldgof, and S. Sarkar, "Detection of thin lines using low-quality video from low-altitude aircraft in urban settings," IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 3, pp. 937–949, Jul. 2009.
[21] Z. Li, Y. Liu, R. Walker, R. Hayward, and J. Zhang, "Towards automatic power line detection for a UAV surveillance system using pulse coupled neural filter and an improved Hough transform," Mach. Vis. Appl., vol. 21, no. 5, pp. 677–686, 2010.
[22] Y. Liu, L. Mejias, and Z. Li, "Fast power line detection and localization using steerable filter for active UAV guidance," Int. Arch. Photogrammetry, Remote Sens. Spatial Inf. Sci., vol. XXXIX-B3, pp. 491–496, 2012.
[23] C. Pan, X. Cao, and D. Wu, "Power line detection via background noise removal," in Proc. IEEE Global Conf. Signal Inf. Process., 2016, pp. 871–875.
[24] B. Song and X. Li, "Power line detection from optical images," Neurocomputing, vol. 129, pp. 350–361, 2014.
[25] J. Zhang, H. Shan, X. Cao, P. Yan, and X. Li, "Pylon line spatial correlation assisted transmission line detection," IEEE Trans. Aerosp. Electron. Syst., vol. 50, no. 4, pp. 2890–2905, Oct. 2014.
[26] Y. LeCun et al., "Backpropagation applied to handwritten zip code recognition," Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[27] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints," in Proc. Workshop Statist. Learn. Comput. Vis. Eur. Conf. Comput. Vis., 2004, pp. 1–2.
[28] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.

[29] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 1–9.
[30] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[31] J. Donahue et al., "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proc. Int. Conf. Mach. Learn., 2014, pp. 647–655.
[32] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2014, pp. 806–813.
[33] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580–587.
[34] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3156–3164.
[35] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proc. Int. Conf. Comput. Vis., 2015, pp. 3730–3738.
[36] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," J. Sensors, vol. 2015, 2015, Art. no. 258619.
[37] Z. Gao, L. Wang, L. Zhou, and J. Zhang, "HEp-2 cell image classification with deep convolutional neural networks," IEEE J. Biomed. Health Informat., vol. 21, no. 2, pp. 416–428, Mar. 2017.
[38] O. A. Penatti, K. Nogueira, and J. A. dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 44–51.
[39] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, "Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery," Remote Sens., vol. 7, no. 11, pp. 14680–14707, 2015.
[40] Ö. E. Yetgin and Ö. N. Gerek, "Ground Truth of Powerline Dataset (Infrared-IR and Visible Light-VL)," 2017. [Online]. Available: https://data.mendeley.com/datasets/twxp8xccsw/8
[41] F. Chollet et al., "Keras," 2015. [Online]. Available: https://github.com/fchollet/keras
[42] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.
[43] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 630–645.
[44] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[45] G. H. John and P. Langley, "Estimating continuous distributions in Bayesian classifiers," in Proc. Conf. Uncertain. Artif. Intell., 1995, pp. 338–345.
[46] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[47] E. Frank, M. A. Hall, and I. H. Witten, The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques," 4th ed. San Mateo, CA, USA: Morgan Kaufmann, 2016. [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf
[48] A. Saxe, P. W. Koh, Z. Chen, M. Bhand, B. Suresh, and A. Y. Ng, "On random weights and unsupervised feature learning," in Proc. Int. Conf. Mach. Learn., 2011, pp. 1089–1096.
[49] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What is the best multi-stage architecture for object recognition?," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 2146–2153.
[50] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. Int. Conf. Artif. Intell. Statist., 2010, pp. 249–256.
[51] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861
[52] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding," CoRR, vol. abs/1510.00149, 2015. [Online]. Available: http://arxiv.org/abs/1510.00149
[53] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, 2015, pp. 161–170.
[54] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," in Proc. Int. Conf. Learn. Representations Workshop Track, 2015, pp. 1–14.
[55] B. Benligiray and Ö. N. Gerek, "Visualization of power lines recognized in aerial images using deep learning," in Proc. IEEE Signal Process. Commun. Appl. Conf., 2018, pp. 1–4.
[56] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.
[57] N. Tajbakhsh et al., "Convolutional neural networks for medical image analysis: Full training or fine tuning?," IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1299–1312, May 2016.

Ömer Emre Yetgin received the B.Sc. degree in electronics engineering from Erciyes University, Kayseri, Turkey, and the M.Sc. degree in electronic-computer engineering from Gazi University, Ankara, Turkey. He is currently working toward the Ph.D. degree in electrical and electronics engineering at Eskişehir Technical University, Eskişehir, Turkey.
His research interests include image processing, pattern recognition, machine learning, and signal processing.

Burak Benligiray received the B.Sc. degree in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 2011, and the M.Sc. degree in electrical and electronics engineering from Anadolu University, Eskişehir, Turkey, in 2014. He is currently working toward the Ph.D. degree in electrical and electronics engineering at Eskişehir Technical University, Eskişehir, Turkey, working on self-supervised training of CNNs.
He was a Researcher for a state-funded project between 2011 and 2013, and has been a Research Assistant since 2014. His research interests include deep learning, computer vision, and smart contract oracles.

Ömer Nezih Gerek (SM'07) received the Ph.D. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 1998.
During his Ph.D., he spent a semester at the University of Minnesota as a Researcher. From 1998 to 1999, he was a Technical Researcher with the Swiss Federal Institute of Technology, Lausanne, Switzerland. He is currently a Full Professor with the Department of Electrical and Electronics Engineering, Eskişehir Technical University, Eskişehir, Turkey. His research interests include signal and image processing and analysis.
Prof. Gerek is an IEEE Senior Member and a member of the TUBITAK (Turkish Scientific and Technological Research Council) EE-CS management committee. He serves on the editorial boards of Elsevier: DSP and TUBITAK: TJEECS.

