
2017 14th IAPR International Conference on Document Analysis and Recognition

Comic characters detection using deep learning


Nhu-Van NGUYEN, Christophe RIGAUD, Jean-Christophe BURIE
L3i laboratory, University of La Rochelle, La Rochelle, France
Email: {nhu-van.nguyen, christophe.rigaud, jcburie}@univ-lr.fr

Abstract—Comic character detection has been an interesting area in comic analysis, as it not only allows more efficient indexation and retrieval of comic books but also yields an adequate understanding of comics, which helps in creating the digital form of comic books. In recent years, several methods proposed to extract or detect characters from comics have given reasonable performance. However, they always evaluate on their own datasets, without comparing with other works or experimenting on a standard dataset. In this work, we take advantage of the recent and significant development of deep learning and apply it to comic character detection. We use the latest object detection deep networks to train a comic character detector on our proposed dataset. By experimenting on our proposed dataset and also on available datasets from previous works, we have found that this method significantly outperforms existing methods. We believe that this state-of-the-art approach can be considered a reliable baseline method against which to compare and better understand future detection techniques.

Fig. 1. Examples of comic characters with many variants regarding shapes, colors, stroke styles and textures. Image from eBDtheque [8]. Best viewed in color.

I. INTRODUCTION

Since the 19th century, a graphic art called “comic books” has been used to tell stories. Comic books are made by combining text and graphics. Initially, comics were printed on paper, but nowadays digital comic books have become more popular and are produced and reused in many projects, such as The Digital Comic Museum1, the Cité internationale de la bande dessinée et de l’image2 in France, the Grand Comics Database3, DC Comics4 from Warner Bros, and comiXology5, recently acquired by Amazon.com [9].

Digital comic content is produced mainly to facilitate reading on the screens of devices such as computers, tablets or mobile phones. However, to enhance traditional collections, we need to transform paper comics into a new digital form adapted to the medium on which the comic is read (e.g. smartphone, web page, 3D book). Digitized comic book analysis can then be investigated to help the creation of comics in a digital form or to enhance the user experience. This field of research includes component extraction and relation analysis [9], style analysis [32] and information retrieval [7, 17]. While the information retrieval research domain has a long history, the creation of comics in a digital form is quite new. Component extraction (e.g. panels, balloons, text, comic characters) and their relations (e.g. read before, said by, thought by, addressed to) are necessary to reconstruct the story by creating links between elements based on their initial order, so as to keep the story coherent [9].

In this work, we focus on the task of comic character detection. Among the primary tasks of comics analysis, comic character detection is the most difficult. Balloons, panels and texts are semi-structured components: they follow a few conventions widely adopted by comic book authors to avoid confusing the reader [13, 14]. However, authors are entirely free in how they draw comic characters. Comic character detection also differs from human detection, even though many comics are reproductions of human life situations. Comic characters are hand drawn and therefore vary much more in deformation, shape and appearance than real-life humans [10, 11, 12]. Hence, we cannot directly apply human detection methods to comics. This difficulty is one of the reasons why there are very few works on comic character detection and its variants, such as comic face detection or comic character retrieval.

In [1, 2], the authors proved that the Viola-Jones detection framework [3] is sufficient for detecting faces in mangas (Japanese comics). However, in [4] the authors show that prior techniques for face detection and face recognition of real people’s faces (including [3]) can hardly be applied to colored comic characters, because comic character faces differ considerably from real people’s faces with respect to organ positions, sizes and color shades. They proposed another face detection method using skin color regions and edges. In [5], the authors exploit color attributes to boost the object detection task, especially for comic character detection.
Another approach, using graph theory, has been proposed by [6]. The authors detect comic characters by representing each panel as an attributed adjacency graph in which color regions are used as nodes. With a similar idea, the work in [7] uses the SIFT descriptor with redundant information classification to also find the most repeated elements.

1 http://digitalcomicmuseum.com/
2 http://collections.citebd.org/
3 http://www.comics.org/
4 http://www.dccomics.com/
5 http://www.comixology.com/

Some other works focus on character retrieval. The work in [15] shows good results for character retrieval using local feature extraction and approximate nearest neighbor (ANN) search. In [16], the authors use Frequent Subgraph Mining (FSM) techniques for comic image browsing with a query-by-example (QBE) model. In [17], the authors propose a manga-specific image-describing framework. It consists of efficient margin labeling, edge orientation histogram feature description and approximate nearest-neighbor search using product quantization.

In previous works, most of the authors focus on designing handcrafted features to represent comic characters. The extracted features are used to detect or retrieve characters from comic pages or panels. Although the existing methods give reasonable performances, they always evaluate on their own datasets, without comparing with other works or experimenting on a standard dataset.

In this paper, we present a new approach based on deep neural networks, taking advantage of their recent significant development. We also propose a new dataset including ground truth for comic characters. We show that with thorough training on the proposed dataset, we can outperform the current state of the art in comic character detection, even on other datasets and without the need to re-train. A comparison on available datasets from previous works is presented. This state-of-the-art comic character detector could be considered as a reliable baseline method for developing future detection techniques.

The rest of the paper is structured as follows. In the next section, an overview of deep learning object detection is presented. The proposed dataset and three available datasets are detailed in section III. The experimental results and discussion are given in section IV.

II. DEEP LEARNING AND COMICS CHARACTER DETECTION

A. Deep neural networks

Neural networks in general, and convolutional neural networks for computer vision in particular, were introduced a long time ago, but they have really flourished recently, since a breakthrough paper in 2012 [18]. The network model in [18] won the 2012 ImageNet competition with a significant improvement in image classification of about 11% (error rate from 26% to 15%), an immense improvement at that time. Since then, deep neural networks have been widely studied by many large companies and labs all over the world in different domains. The most popular use case of these networks is computer vision.

However, complicated deep learning models and their underlying working processes are still difficult to understand, even for machine learning experts [20]. While it remains unclear when and why a deep model works [21, 31], the development of recent deep models typically relies on a substantial amount of trial and error [19, 20, 21]. Deep CNNs nevertheless continue to be developed and have achieved significant results in most computer vision tasks such as image classification, detection, localization and more. To our knowledge, only one work on comics analysis has used deep learning before [25]: the authors use CNNs to analyze four-scene comics and their transitions.

B. Convolutional neural networks for object detection

In our work, we are interested in the object detection task using deep learning: given an individual image, we would like to delimit with a bounding box all the objects of interest (comic characters). The standard approach to object detection is to use a CNN classifier on different sub-windows or regions extracted from the image. After the classification, the CNN adjusts the object localization (bounding boxes) by minimizing the error of the predicted localization against the ground-truth localization.

Today, most research focuses on region proposal methodologies, which aim at reducing the number of regions to classify [23, 27, 33] and improving the localization accuracy. The simplest technique is to run classification on all the regions formed by sliding windows of different sizes across the whole image [22], but there is a huge number of regions to classify. In [23] the authors use a region proposal algorithm, selective search [26], which generates about 2000 candidate regions. In [27] the authors propose a spatial pyramid pooling (SPP) technique which reduces the computation time. In [33], the Fast-RCNN method improves on SPP by combining the classification network and the localization network, which reduces the overall training time and increases the accuracy. Recently, the Faster-RCNN approach used a region proposal network (RPN) to achieve state-of-the-art performance in the field [30]. Lastly, at the beginning of 2017, another method named YOLOv2 was introduced. In this method, object detection is treated as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities [24]. This method has state-of-the-art accuracy and is faster than Faster-RCNN.

We have trained the two state-of-the-art deep neural networks, Faster-RCNN and YOLOv2, on our Sequencity612 dataset. In our experiments, the YOLOv2 model is slightly better in terms of accuracy. We have trained a YOLOv2 model from scratch and also fine-tuned the Darknet model; we kept the latter because it gives better results.

C. YOLOv2 model

YOLOv2 [24] is the improved version of the YOLOv1 model by the same authors. The idea of YOLO is to divide the image into a grid of SxS cells. For each cell, the YOLOv1 model predicts B bounding boxes. Each bounding box b is characterized by the following parameters that have to be optimized: (x, y, w, h, p, p_c), where (x, y) is the center of b, (w, h) are the width and height of b, p is the probability that an object has its center located inside the cell, and p_c is the conditional probability of class c for this b, with c belonging to C, the set of defined classes to detect. While YOLOv1 predicts the coordinates of bounding boxes directly, using fully connected layers on top of the convolutional feature extractor, YOLOv2 predicts bounding boxes using hand-picked priors as introduced in Faster-RCNN [30]. But instead of predicting bounding box offsets as Faster-RCNN does, YOLOv2 predicts location coordinates relative to the location of the grid cell, as YOLOv1 does. However, while YOLOv1 uses fully connected layers together with convolutional layers, YOLOv2 uses only convolutional layers.
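To make this parameterization concrete, the following is a minimal sketch (our illustration, not the authors' code) of how one YOLOv2-style raw prediction for a given grid cell and prior box can be decoded into a bounding box. The sigmoid/exponential mapping and cell-relative centers follow the formulation in [24]; the grid size and prior values below are illustrative assumptions.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(raw, cell_col, cell_row, prior_w, prior_h, S=13):
    """Decode one YOLOv2-style raw prediction into a box in relative image coordinates.

    raw = (tx, ty, tw, th, to): network outputs for one prior in one grid cell.
    (cell_col, cell_row): grid cell indices; (prior_w, prior_h): hand-picked prior size
    expressed in grid-cell units (illustrative values, not the ones used in the paper).
    """
    tx, ty, tw, th, to = raw
    # Center is predicted relative to its grid cell, as in YOLOv1/YOLOv2.
    bx = (sigmoid(tx) + cell_col) / S
    by = (sigmoid(ty) + cell_row) / S
    # Width and height rescale the prior (anchor) box, as in YOLOv2.
    bw = prior_w * np.exp(tw) / S
    bh = prior_h * np.exp(th) / S
    objectness = sigmoid(to)  # probability that an object center lies in this cell/prior
    return bx, by, bw, bh, objectness

# Example: cell (col=4, row=6) of a 13x13 grid, prior of 3x4 cells.
print(decode_box((0.2, -0.1, 0.3, 0.0, 1.5), 4, 6, 3.0, 4.0))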

Figure 2 shows the base model used in YOLOv2, which has 19 convolutional layers and 5 max-pooling layers. For the detection task, YOLOv2 replaces the last convolutional layer by three 3×3 convolutional layers with 1024 filters each, followed by a final 1×1 convolutional layer with the number of outputs we need for detection.
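As an illustration, here is a minimal PyTorch-style sketch of such a detection head (an assumption on our part, not the authors' implementation): three 3×3 convolutions with 1024 filters followed by a 1×1 convolution whose channel count is num_priors × (5 + num_classes), e.g. 5 priors and 1 class for a one-class character detector.

import torch.nn as nn

def detection_head(in_channels=1024, num_priors=5, num_classes=1):
    """YOLOv2-style detection head: three 3x3 conv layers (1024 filters) + a 1x1 output conv.

    The output has num_priors * (x, y, w, h, objectness, class scores) channels per cell.
    The use of BatchNorm/LeakyReLU follows common Darknet practice and is an assumption,
    not taken from the paper.
    """
    out_channels = num_priors * (5 + num_classes)
    layers = []
    for _ in range(3):
        layers += [
            nn.Conv2d(in_channels, 1024, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(0.1, inplace=True),
        ]
        in_channels = 1024
    layers.append(nn.Conv2d(1024, out_channels, kernel_size=1))
    return nn.Sequential(*layers)

# For the 5-class detector described in section III.A: 5 priors x (5 + 5) = 50 output channels.
head = detection_head(num_classes=5)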
Fig. 2. Darknet-19 model used in YOLOv2.

In order to detect comic characters, we train the YOLOv2 model on the Sequencity612 training set and test on the Sequencity612 test set (see section III.A). While convolutional neural networks like Faster-RCNN or YOLOv2 are usually used to identify homogeneous classes (e.g. humans, animals or other objects), comic characters are highly heterogeneous in terms of shapes, colors and textures. For example, comic characters may be object-like, human-like, animal-like, or other imaginary characters, as shown in Figure 1. In this work, we experiment with two types of character detection: the first one detects characters as a one-class detector and the second one is a multiple-class detector.

III. COMIC CHARACTER DETECTION

In this research, we have analyzed the performance and features of the deep learning approach for comic character detection on our proposed dataset, Sequencity612, and on three available datasets from previous works: the Fahad18 dataset [5], the Sun60 dataset [7] and the Ho42 dataset [6].

A. Sequencity612 dataset

For our experiments, we propose a dataset called Sequencity612. It is extracted from the online comic book library Sequencity6. This library is composed of 140000 pages of comics from Europe/America and mangas from Japan. We randomly selected 1000 comic pages and removed pages containing covers, prefaces and pages without comic characters. There were 612 pages left where at least one character appears. We made ground-truth bounding boxes for all characters, small or big, speaking or not, and in the background. Because the Sequencity comics are private, we cannot provide the images, but we provide the ground truth and the corresponding album names and page numbers for anyone who wants to access the images7. Sequencity612 is split into three sets: a training set of 500 images, a validation set of 50 images and a test set of 62 images. The three image lists for the training, validation and test sets will also be publicly available. To our knowledge, this is the largest dataset for comic characters.

To create the ground truth for Sequencity612, we have identified five different types of characters: human-like, near human-like, far human-like, animal-like and manga characters (from Japanese mangas). Three people have annotated the ground truth based on these 5 target classes.
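As a small illustration of the two detector configurations described above, the five annotation classes can either be kept as-is for the 5-class detector or collapsed into a single "character" label for the one-class detector. The dictionary-based ground-truth structure below is an assumption for the sketch; the published Sequencity612 format is not specified here.

# The five annotation classes used for the Sequencity612 ground truth (section III.A).
CLASSES_5 = ["human-like", "near human-like", "far human-like", "animal-like", "manga"]

def to_one_class(annotations):
    """Collapse 5-class ground-truth boxes into a single 'character' class.

    `annotations` is assumed to be a list of dicts like
    {"bbox": (x_min, y_min, x_max, y_max), "label": "animal-like"};
    this format is illustrative, not the published one.
    """
    return [{"bbox": a["bbox"], "label": "character"} for a in annotations]

# Example page with two annotated characters.
page = [
    {"bbox": (34, 50, 120, 210), "label": "human-like"},
    {"bbox": (300, 80, 410, 260), "label": "animal-like"},
]
print(to_one_class(page))  # both boxes now share the single 'character' label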
B. Fahad18 dataset

The Fahad18 dataset is a collection of 586 images of 18 popular cartoon characters obtained from Google [5]. The 18 cartoon characters in this dataset are: Bart, Homer, Marge, Lisa (The Simpsons), Fred and Barney (The Flintstones), Tom, Jerry, Sylvester, Tweety, Bugs, Daffy, Scooby, Shaggy, Roadrunner, Coyote, Donald Duck and Mickey Mouse. In the whole dataset, the number of images per character ranges from 28 (Marge) to 85 (Tom). Note that an image may contain more than one character.

Fig. 3. Samples from the Fahad18 dataset. Best viewed in color.

C. Ho42 dataset

Ho42 [6] is a collection of 42 comic pages which are used to evaluate the redundant character detection in [6]. According to the authors, a redundant character is a character that appears at least twice in a page.

D. Sun60 dataset

The dataset used in [7] for the character detection task is a collection of 60 comic pages from 6 comic titles. This collection is taken from eBDtheque [8], which is composed of 100 comic pages. Unfortunately, we could not get the selection of 60 images from the authors, so we use the eBDtheque dataset instead.

6 https://www.sequencity.com
7 https://bitbucket.org/l3ivan/sequencity612

The eBDtheque (Figure 1) database is a selection of one hundred comic pages from America, Japan (manga) and Europe8. The dataset includes annotations of four different types of objects: text lines, balloons, panels and characters. The position property is defined by the coordinates of the bounding box that includes all the pixels of the object.

Fig. 4. Samples from the Ho42 dataset. Best viewed in color.

In this research, we experimented and compared our results with the available datasets from [5, 6, 7]. Because [5, 6, 7] require extracted panels to detect characters inside panels, we chose to use the same setting and extract panels using the algorithm in [28]. Panel positions will be provided with the ground truth link9. While the objectives of [6, 7] on the Sun60 and Ho42 datasets are not the same as ours, we have tried our model on these datasets to prove its effectiveness, as we can detect more characters with our model than existing works without additional knowledge. In contrast, we do not show the comparison with the methods of [6, 7] on the Sequencity612 dataset because they perform poorly, due to requirements that fit only a limited subset of this dataset. For example, the method evaluated on Ho42 needs images with repeated colored characters on each comic page, which is not the case for Sequencity612. Similarly, Fahad's work requires richly colored images.

IV. RESULTS & DISCUSSION

To evaluate detection performance, we follow the PASCAL VOC evaluation criteria [29]. We report the interpolated average precision (AP%) and the precision-recall curve. Average precision computes the average value of the precision p(r) over recall intervals from r = 0.0 to r = 1.0 (see Formula 1). The PASCAL Visual Object Classes challenge (a well-known benchmark for object detection in computer vision) computes average precision by averaging the precision over a set of evenly spaced recall levels {0, 0.1, 0.2, ..., 1.0}:

    AP = (1/11) · Σ_{r ∈ {0, 0.1, ..., 1.0}} p_interp(r)        (1)

where p_interp(r) is an interpolated precision that takes the maximum precision over all recalls greater than r:

    p_interp(r) = max_{r' ≥ r} p(r')        (2)

Some visual results of our experiments (purple bounding boxes) are shown in Figures 1, 3 and 4.

8 http://ebdtheque.univ-lr.fr/database/
9 https://bitbucket.org/l3ivan/sequencity612
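For clarity, the following is a minimal sketch (our illustration, not an official PASCAL VOC implementation) of the 11-point interpolated average precision defined by Formulas (1) and (2), taking as input precision/recall pairs measured along the precision-recall curve.

def interpolated_ap(recalls, precisions):
    """11-point interpolated AP as in PASCAL VOC (Formulas (1) and (2)).

    `recalls` and `precisions` are paired measurements along the precision-recall curve.
    For each recall level r in {0, 0.1, ..., 1.0}, p_interp(r) is the maximum precision
    observed at any recall >= r (0 if the curve never reaches r); AP is their mean.
    """
    levels = [i / 10.0 for i in range(11)]
    ap = 0.0
    for r in levels:
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        ap += max(candidates) if candidates else 0.0
    return ap / len(levels)

# Toy precision-recall curve (illustrative values only).
recalls = [0.1, 0.2, 0.4, 0.6, 0.8]
precisions = [0.95, 0.90, 0.85, 0.70, 0.50]
print(round(interpolated_ap(recalls, precisions), 3))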
A. Sequencity612 dataset

The results of the experiments on the Sequencity612 dataset are presented in Figure 5. We have compared the one-class detector versus the 5-class detector for the task of character detection. By observing the precision-recall curves, we can confirm the effectiveness of the 5-class model (dividing characters into multiple classes). Although the one-class model has a small advantage when we only care about precision (when recall matters less for the application), the precision is usually better for the 5-class model (when the recall is above 0.3). Moreover, the 5-class model can achieve a higher recall (0.88), while the recall of the one-class model reaches 0.81. Even if the five classes that we have identified do not fully represent the diversity of comic characters, they are still more homogeneous than a single class. This characteristic is the reason why the 5-class model performs better than the one-class model (see Figure 5).

Fig. 5. Results on the Sequencity612 dataset for the one-class and 5-class models. The precision-recall curves show that the one-class model is slightly better when recall is small (<0.2). However, the 5-class model gives an overall better performance.

B. Fahad18 dataset

In [5], the authors used color attributes as an explicit color representation for object detection. They showed that their method is most effective for comic characters, for which color plays a pivotal role. In this section, we present the results of our approach on the Fahad18 dataset. The Fahad18 dataset is divided into two sets: a training set of 304 comic images and a test set of 182 comic images. To evaluate detection performance, the authors follow the PASCAL VOC evaluation criteria [29]. Although the Fahad18 dataset is composed of cartoon images, which have different styles compared to comic images, we can show the effectiveness of the deep learning approach compared to the method in [5] by using the same setting as that paper to train and test our model. Table I shows the results of our 18-class model on the Fahad18 dataset together with the results of [5]. The Fahad18 dataset contains 586 images of 18 classes. The AP% for the 18 classes is shown with the mean AP% over all classes in the last column. Note that the proposed approach outperforms the method presented in [5] on the detection and recognition of 14 of the 18 classes, and it gives a significant improvement of about 18.1% in mean AP. However, there are 4 of the 18 classes where the method in [5] gives better results than our approach: bart, marge, lisa and barney.

TABLE I. RESULTS ON FAHAD18 DATASET (AP%)

            bart  homer marge lisa  fred  barney tom   jerry sylvester tweety bugs  daffy scooby shaggy roadrunner coyote donaldduck mickey  meanAP
Fahad [5]   72.3  40.4  43.4  89.8  72.8  55.1   32.8  52.3  32.9      51.4   22.2  35.6  19.8   25.2   21.9       10.0   27.9       45.3    41.7
This paper  63.6  60.6  41.7  65.6  78.5  54.5   84.5  59.1  54.5      56.1   54.5  60.3  65.4   61.4   60.5       63.1   42.4       59.3    59.8

TABLE II. NUMBER OF EXAMPLES IN TRAINING SET FOR 18 CLASSES

                           bart  homer marge lisa  fred  barney tom   jerry sylvester tweety bugs  daffy scooby shaggy roadrunner coyote donaldduck mickeymouse
Frequency in training set  19    17    14    16    30    15     43    42    28        21     62    23    31     26     21         26     25         20

Note that these classes come from the same cartoon: The Simpsons. This lower AP% may originate from the small number of training samples. Table II shows the number of examples in the training set for the 18 classes. These 4 classes are among the last five in terms of training instances (the five classes are: bart, homer, marge, lisa and barney). This is a potential indicator that the deep learning approach may be more sensitive to the number of class instances in the training set than other approaches.

We have also tested the model learnt from the Sequencity612 dataset directly on the Fahad18 dataset. This model can only find characters, without identifying the specific class among the 18 classes of Fahad18. The mean average precision (mAP) of this experiment is 43.77%. This result is well below the previous result, which is reasonable because of the difference between cartoon images (Fahad18) and comic images (Sequencity612). We can see that this mAP is still higher than that of [5]; however, [5] can additionally detect the class of the characters.

C. Sun60 and Ho42 datasets

For both datasets, the authors do not use any training: they applied their algorithms directly to detect characters. Therefore, we use these datasets to prove the effectiveness of our approach by applying the model trained on the Sequencity612 dataset to these two datasets without re-training it. Table III shows the results of character detection reported in [7]. While we know that the Sun60 dataset of 60 comic pages used in [7] is a subset of eBDtheque, we did not know exactly which 60 comic pages of eBDtheque were used. So we tested multiple sets of 60 comic pages randomly taken from eBDtheque and computed the average results. In [7], the authors did not report the precision-recall curve, but final values of precision and recall: 35.48% for the recall and 79.73% for the precision. Figure 6 depicts the results of our method. When the recall is 35.48%, the precision is about 85% (compared to 79.43% for [7]). And when the precision is 79.73%, the recall is about 51% (compared to 35.48% for [7]). This result is clearly better than [7].

Fig. 6. Results on the Sun60 dataset. Precision-recall curves in %.

TABLE III. RESULTS OF CHARACTER DETECTION IN [7]

Title          1     2     3     4     5     6     Mean
Recall (%)     23    36    50    33.3  23.6  47    35.48
Precision (%)  72    81.8  68    100   57    97.8  79.43

In [6], inexact graph matching is used to automatically localize the most frequent color group apparitions and label them as main characters. Their experiment was carried out with the Ho42 dataset, where all pages contain at least one redundant character and each page consists of 4 panels. They evaluated the method by verifying whether the algorithm is able to detect redundancies in each comic page, not in the whole album. To evaluate algorithm performance, the detection is considered valid if the redundancy condition is true, meaning that at least one redundant character should be detected in a page. For [6], at least one redundant character has been detected in 71.4% of the pages, and partial characters have been detected in 9.6% of the Ho42 dataset. For the rest, characters have been detected, but only once.

Fig. 7. Precision-recall curve of our method on the Ho42 dataset.

Fig. 7 depicts the results of our method on the Ho42 dataset. The precision-recall curve shows a very high performance. While [6] can detect at least 2 characters on 81% of the pages, our method can detect almost 92% of the characters in the whole dataset. Note that the algorithm in [6] can detect redundant characters without training a model, while our approach detects characters in the Ho42 dataset with a model trained on another, unrelated dataset (Sequencity612).

D. Conclusion

To make digital comic books usable on a large scale on future devices, we first need to solve the problem of comic image understanding. However, progress in scene analysis, component extraction and story understanding is still insufficient for industrialization. Our goal was not to develop a new character detection method but to introduce a state-of-the-art baseline method, thanks to the recent development of deep learning, and to propose a new large dataset with ground truth for comic characters. Experiments on four different datasets reaffirm the benefit of the deep learning approach for comic character detection. The deep learning approach needs offline training, because of its important computational power requirements, and more data to get the most out of its performance, but we have shown in this paper that it already gives the best results in comparison with existing methods.

Acknowledgment

This work is supported by the University of La Rochelle (France), the town of La Rochelle and the PIA-iiBD (“Programme d’Investissements d’Avenir”). We are grateful to all authors and publishers of comics images from the Fahad18, eBDtheque and Sequencity datasets for allowing us to use their works.

REFERENCES

[1] W. Sun and K. Kise, “Detection of exact and similar partial copies for copyright protection of manga,” IJDAR, vol. 16, no. 4, pp. 331–349, 2016.
[2] W. Sun and K. Kise, “Similar partial copy detection of line drawings using a cascade classifier and feature matching,” in IWCF, vol. 6540. Springer, 2010, pp. 126–137.
[3] P. A. Viola and M. J. Jones, “Robust real-time face detection,” IJCV, vol. 57, no. 2, pp. 137–154, 2004.
[4] T. Kohei, J. Henry, and N. Tomoyuki, “Face detection and face recognition of cartoon characters using feature extraction,” in IEVC’12, Kuching, Malaysia, 2012.
[5] F. S. Khan, R. M. Anwer, J. van de Weijer, A. D. Bagdanov, M. Vanrell, and A. M. Lopez, “Color attributes for object detection,” in CVPR, 2012, pp. 3306–3313.
[6] H. N. Ho, C. Rigaud, J.-C. Burie, and J.-M. Ogier, “Redundant structure detection in attributed adjacency graphs for character detection in comics books,” in 10th IAPR Int. Workshop on Graphics Recognition, USA, Aug 2013.
[7] W. Sun, J.-C. Burie, J.-M. Ogier, and K. Kise, “Specific comic character detection using local feature matching,” in ICDAR, 2013, pp. 275–279.
[8] C. Guérin, C. Rigaud, et al., “eBDtheque: a representative database of comics,” in ICDAR, 2013, pp. 1145–1149.
[9] C. Rigaud, “Segmentation and indexation of complex objects in comic book images,” Ph.D. dissertation, Univ. of La Rochelle, France, 2014.
[10] S. Medley, “Discerning pictures: How we look at and understand images in comics,” Studies in Comics, 2010.
[11] N. Cohn, “The limits of time and transitions: Challenges to theories of sequential image comprehension,” Studies in Comics, 1(1):127–147, 2010.
[12] H. A. Ahmad, S. Koyama, and H. Hibino, “Impacts of manga on Indonesian readers’ self-efficacy and behavior intentions to imitate its visuals,” Bulletin of JSSD, 59(3), 2012.
[13] B. Duc, L’art de la B.D.: “Du scénario à la réalisation graphique, tout sur la création des bandes dessinées.” Editions Glénat, 1997.
[14] J.-M. Lainé and S. Delzant, Le lettrage des bulles. Eyrolles, 2010.
[15] M. Iwata, A. Ito, and K. Kise, “A study to achieve manga character retrieval method for manga images,” in DAS, 2014.
[16] T.-N. Le, M. M. Luqman, J.-C. Burie, and J.-M. Ogier, “A comic retrieval system based on multilayer graph representation and graph mining,” in GbRPR. Springer, 2015.
[17] Y. Matsui, K. Ito, Y. Aramaki, T. Yamasaki, and K. Aizawa, “Sketch-based manga retrieval using Manga109 dataset,” CoRR, vol. abs/1510.04389, 2015.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012, pp. 1106–1114.
[19] Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[20] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE PAMI, 35(8):1798–1828, 2013.
[21] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, “Understanding neural networks through deep visualization,” in ICML Workshop on Deep Learning, 2015.
[22] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE PAMI, 32(9):1627–1645, 2010.
[23] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014, pp. 580–587.
[24] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” in Proc. of the Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
[25] M. Ueno, N. Mori, T. Suenaga, and H. Isahara, “Estimation of structure of four-scene comics by convolutional neural networks,” in MANPU@ICPR, 2016.
[26] J. R. R. Uijlings, K. E. A. van de Sande, et al., “Selective search for object recognition,” IJCV, vol. 104, no. 2, 2013.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” CoRR, vol. abs/1406.4729, 2014.
[28] C. Rigaud, N. Tsopze, J.-C. Burie, and J.-M. Ogier, “Robust frame and text extraction from comic books,” in GREC, 2011.
[29] M. Everingham, L. J. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” IJCV, vol. 88, no. 2, pp. 303–338, 2010.
[30] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in NIPS, 2015, pp. 91–99.
[31] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, “Towards better analysis of deep convolutional neural networks,” CoRR, vol. abs/1604.07043, 2016.
[32] W.-T. Chu and W.-C. Cheng, “Manga-specific features and latent style model for manga style analysis,” in ICASSP. IEEE, 2016, pp. 1332–1336.
[33] R. Girshick, “Fast R-CNN,” in Proceedings of the 2015 IEEE ICCV, Washington DC, USA, 2015, pp. 1440–1448.
