www.elsevier.com/locate/knosys
Abstract

In this paper, we propose a tool for extracting compositional information from pictures called the Composition Analyzer. This tool extracts such compositional information as the sizes, shapes, proportions, and locations of figures, by two processes. More specifically, it first segments a picture into figures and a ground by a figure extraction method we developed. It then extracts the above compositional information from the figures based on the Dynamic Symmetry principle. The extracted compositional information is used to refine the picture, and as such, facilitates the production of multimedia for non-professionals. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Composition; Paintings; Attractive region
1. Introduction

Most problems faced by non-professional multimedia authors in creating good titles are not technological in nature, but rather due to a lack of expertise or knowledge about multimedia designs. Most commercial authoring tools have few functions to support such users in achieving their goals [1]. Consequently, non-professional authors have been suffering not only from problems in understanding tool functions, but also in deciding design details.

We believe that the multimedia elements of professional products, such as color combinations, textures, compositions, and lighting effects, encompass a lot of the professional techniques or expert knowledge developed throughout the history of art. Consequently, providing these elements with appropriate tools to non-professional authors can navigate these non-professionals towards creating better products [2]. On this assumption, we have been developing a creative learning environment to help authors with the production of better images (see Fig. 1) [2].

In previous research, Nakakoji et al. developed a knowledge-based color critiquing support system, eMMaC, which critiques the use of color in a title and suggests appropriate color usage [1]. This system utilizes theories and guidelines on human color perception, cultural associations of color, and appropriate color combinations, which have been studied in visual communications design, to construct a rule base. Our system is an example-based system rather than a rule-based system.

In this paper, we present one of the tools the above environment has, i.e. the "Composition Analyzer", which extracts compositional information from pictures, such as the shapes, proportions and locations of figures.

This paper is organized as follows: Section 2 presents an overview of the Composition Analyzer, Section 3 describes the composition analysis processes of this tool, and Section 4 introduces a system that utilizes compositional information extracted by the Composition Analyzer to refine a picture.

2. Composition analyzer

Composition involves many aspects; however, it can roughly be said that composition is a plan for arranging objects in a picture with a good balance [3–6]. Any picture emotionally affects its viewers differently depending on how the objects in the picture are composed. The composition can create not only emotional effects but also rhythm or dynamics in the picture. It is therefore important to determine where objects should be located, and also what sizes or shapes these objects should have [6].

Throughout the history of art, the golden section has been used to make the most beautiful and ideal proportions of

* Corresponding author. Tel.: +81-774-95-1465; fax: +81-774-95-1408. E-mail address: gon@mic.atr.co.jp (S. Tanaka). Tel.: +81-6-6850; fax: +81-6-6850-6371.
Derived from 'Composition Analyzer: Computer Supported Composition Analysis on Masterpieces', published in the Proceedings of the Third Conference on Creativity and Cognition, Loughborough, UK, October 10–13, 1999, pp. 68–75. Reproduced with permission from ACM © 1999.
0950-7051/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0950-7051(00)00062-9
460 S. Tanaka et al. / Knowledge-Based Systems 13 (2000) 459±470
In fact, it has been found that the V4 cortex in the human visual system plays an important role in figure-ground segmentation [8]. The V4 cortex is sensitive to many kinds of information both in the spatial domain and in the spectral domain relevant to object recognition [8]. In the spatial domain, many V4 cells exhibit some length, width, orientation, direction of motion, and spatial frequency selectivity. In the spectral domain, they are tuned to the wavelength [8]. In particular, it has been found that most V4 cells respond best to a receptive field stimulus if there is a spectral difference between the receptive field stimulus and its surroundings [8]. The above findings indicate that one of the contributions of the V4 cortex to visual processes is figure-ground segmentation.

At the V4 cortex, no semantic information is processed. Consequently, no attractiveness evaluation is performed based on the meanings of scenes or the viewer's interests at this stage. Accordingly, it is possible for a picture to be segmented into figures and a ground based only on physical features, such as the spectral domain (color) and the spatial domain (texture) processed by the V4 cortex.

From the above considerations, we use the color contrast and texture contrast of regions for figure-ground segmentation.

3.1.1. Contrast parameter definition

Two types of contrast can be considered for picture regions. One is a local contrast, i.e. the difference between a region and its surroundings. The other is a global contrast, i.e. the difference between a region and the whole picture. Here, we use both the local contrast and the global contrast.

In addition to the above types of contrast, focus is another important factor for an enhancement of the contrast [6]. A focused region is more attractive than a blurred region. Furthermore, the contour of a focused region is sharp; the contour of a blurred region is not.

From the above considerations, the following parameters are used for figure-ground segmentation.

1. Local color contrast:

f_{i,1} = (ColorDif_i − min_k ColorDif_k) / (max_j ColorDif_j − min_k ColorDif_k)   (1)

ColorDif_i = w_i (1/n_i) Σ_{j=1}^{n_i} RgnColDif_{i,j}   (2)

RgnColDif_{i,j} = √[(Lt_i − Ls_j)² + (at_i − as_j)² + (bt_i − bs_j)²]   (3)

w_i = [1 / |e_i − 2|] · [tl_i / l_i]   (4)

where
f_{i,1}: the local color contrast of region i,
RgnColDif_{i,j}: the color difference between region i and neighboring region j, which touches region i,
w_i: the penalty coefficient of region i,
e_i: the Euler number of the mask image of region i,
tl_i: the length of the border line of region i and neighboring regions,
l_i: the length of the contour of region i,
Lt_i, at_i, bt_i: the color value of region i in the L*a*b* color space,
Ls_j, as_j, bs_j: the color value of neighboring region j in the L*a*b* color space,
n_i: the number of neighboring regions of region i.

2. Local texture contrast:

f_{i,2} = (TexDif_i − min_k TexDif_k) / (max_j TexDif_j − min_k TexDif_k)   (5)

TexDif_i = w_i (1/n_i) Σ_{j=1}^{n_i} RgnTexDif_{i,j}   (6)

RgnTexDif_{i,j} = √[Σ_{k=1}^{nf} (Tt_{i,k} − Ts_{j,k})²]   (7)

where
f_{i,2}: the local texture contrast of region i,
RgnTexDif_{i,j}: the Euclidean distance between the texture feature vectors of region i and neighboring region j,
Tt_{i,k}: the texture feature vector of region i,
Ts_{j,k}: the texture feature vector of neighboring region j,
nf: the number of elements in the texture feature vector.

3. Global color contrast:

f_{i,3} = (GRgnColDif_i − min_k GRgnColDif_k) / (max_j GRgnColDif_j − min_k GRgnColDif_k)   (8)

GRgnColDif_i = w_i √[(Lt_i − Lav)² + (at_i − aav)² + (bt_i − bav)²]   (9)

where
f_{i,3}: the global color contrast of region i,
Lav, aav, bav: the average color value of the picture.

4. Global texture contrast:

f_{i,4} = (GRgnTexDif_i − min_k GRgnTexDif_k) / (max_j GRgnTexDif_j − min_k GRgnTexDif_k)   (10)

GRgnTexDif_i = w_i √[Σ_{k=1}^{nf} (Tt_{i,k} − Tav_k)²]   (11)
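As a minimal sketch (not the authors' implementation), the local color contrast of Eqs. (1)–(3) can be computed from per-region mean L*a*b* values and a region adjacency list; the penalty coefficient w_i of Eq. (4) is taken as 1 here, since it needs region masks that this fragment does not model:

```python
import numpy as np

def local_color_contrast(region_colors, neighbors):
    """Local color contrast f_{i,1} of Eqs. (1)-(3), with w_i = 1.

    region_colors: (N, 3) array of mean L*a*b* values per region.
    neighbors:     list of lists; neighbors[i] holds the indices of the
                   regions touching region i.
    """
    n = len(region_colors)
    color_dif = np.empty(n)
    for i in range(n):
        nbrs = region_colors[neighbors[i]]
        # Eq. (3): Euclidean L*a*b* distance to each neighboring region
        dists = np.linalg.norm(region_colors[i] - nbrs, axis=1)
        # Eq. (2): average difference over the n_i neighbors (w_i = 1)
        color_dif[i] = dists.mean()
    # Eq. (1): min-max normalization over all regions
    return (color_dif - color_dif.min()) / (color_dif.max() - color_dif.min())

colors = np.array([[50., 0., 0.], [60., 10., 10.], [10., 40., 40.]])
f1 = local_color_contrast(colors, [[1, 2], [0, 2], [0, 1]])
```

By construction, the values lie in [0, 1], with the region that differs most from its neighbors at 1; the other four parameters share the same min-max normalization.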
where
f_{i,4}: the global texture contrast of region i,
Tav: the average texture feature vector of the picture.

5. Sharpness of contour:

f_{i,5} = (Focus_i − min_k Focus_k) / (max_j Focus_j − min_k Focus_k)   (12)

Focus_i = w_i (1/n_i) Σ_{j=1}^{n_i} |∇Rc_{i,j}(x, y)|   (13)

|∇Rc_{i,j}(x, y)| = √[Rcx²_{i,j}(x, y) + Rcy²_{i,j}(x, y)]   (14)

where
f_{i,5}: the sharpness of the contour of region i,
Focus_i: the average edge magnitude of the contour of region i,
|∇Rc_{i,j}(x, y)|: the edge magnitude of pixel j on the contour of region i,
Rcx: the gradient in the x direction,
Rcy: the gradient in the y direction.

For the color difference calculation, we use the CIE L*a*b* color space. This is because color differences in this color space correspond to the human visual sense in general [7]. For texture features, a multi-resolution representation based on Gabor filters is used. Gabor features have been used to characterize the underlying texture information in given regions [10,11]. Because Gabor filters are not orthogonal to each other in nature, however, redundant information exists in the filtered images. In order to reduce such redundant information, the filter parameters are chosen by using the algorithm presented in [9]. This algorithm ensures that the half-peak magnitudes of the filter responses in the frequency spectrum touch each other, as shown in Fig. 4. To represent texture features, we use 24 filters consisting of four scales and six orientations.

A penalty coefficient is employed based on the following characteristics of figures.

1. A closed or surrounded region is apt to be regarded as a figure [6].
2. A figure is seen as having a contour; the ground is not [6].

Concerning item (1) above, an Euler number is calculated for each mask image of the region. This number is calculated by subtracting the number of holes in the object from the number of objects in the picture [15]. For instance, if a picture has one object and the object has two holes, then the Euler number of the picture will be −1. Based on this characteristic, the penalty coefficient for item (1) (the left side of Eq. (4)) is calculated such that the greater the number of holes in the region, the smaller the contrast value will be.

For item (2) above, there are regions that touch the edge(s) of a picture. Such regions are very likely to be the ground of the picture. To represent whether or not a region is completely surrounded by other regions, the length of the border line shared with surrounding regions is measured (excluding the inside of the region, see Fig. 5). Then, this length is divided by the length of the contour of the region. By multiplying every parameter by this value (the right side of Eq. (4), see Fig. 5), the contrast value of the region is forced to be small if the region touches the edge(s) of the picture.

3.1.2. Discrimination function

In order to analyze how people achieve figure-ground segmentation, we collected data on the figure regions and ground regions of 100 pictures by performing subjective experiments with 15 people. In the experiments, we showed the complete original pictures individually on a CRT (2048 × 2048 resolution) and the segmented regions on another CRT, and asked the subjects whether each segmented region was a part of the figure or not.

The Edge Flow model was used for the segmentation [12]. This model utilizes a predictive coding model to identify the direction of change in the color and texture at each image location on a given scale, and constructs an edge
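The 24-filter bank (four scales, six orientations) can be sketched with a generic real-valued Gabor kernel. The parameter choices below (base wavelength, octave spacing, bandwidth factor) are illustrative assumptions, not the half-peak-contact design of [9]:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a 2-D Gabor kernel: a Gaussian envelope modulated by
    a cosine carrier of the given wavelength and orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_bank(n_scales=4, n_orientations=6, size=31):
    """Build the 4 x 6 = 24 filter bank described in the text."""
    bank = []
    for s in range(n_scales):
        wavelength = 4.0 * (2 ** s)   # one octave between scales (assumed)
        sigma = 0.56 * wavelength     # common bandwidth heuristic (assumed)
        for o in range(n_orientations):
            theta = o * np.pi / n_orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma))
    return np.stack(bank)             # shape (24, size, size)

bank = gabor_bank()
```

Convolving a region with each of the 24 kernels and pooling the response energies yields a texture feature vector of the kind used in Eqs. (7) and (11).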
Draw the center lines
IF the type is Figure
    Subdivide the picture into two golden rectangles
    Make the two rectangles the targets
ELSE
    Make the whole picture the target
End IF
WHILE it is not stable
    FOR the target rectangles in the picture
        Divide the rectangles into smaller rectangles
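The subdivision step in the pseudocode above can be sketched as follows; `split_golden` is a hypothetical helper, not taken from the paper, that splits a rectangle at the golden section of its longer side:

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

def split_golden(rect):
    """Split an axis-aligned rectangle (x, y, w, h) at the golden section
    of its longer side, returning two sub-rectangles whose sizes stand in
    the ratio PHI : 1."""
    x, y, w, h = rect
    if w >= h:
        cut = w / PHI                                   # ~0.618 of the width
        return (x, y, cut, h), (x + cut, y, w - cut, h)
    cut = h / PHI                                       # ~0.618 of the height
    return (x, y, w, cut), (x, y + cut, w, h - cut)

r1, r2 = split_golden((0.0, 0.0, 1.0, 1.0))
```

Applied repeatedly to the target rectangles, this produces the nested grid of golden rectangles against which figure positions are analyzed.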
rates is 56.5%, and the success rates differ depending on the era. To understand these results, we investigated the history of paintings.

In fact, the golden section was not a popular technique for determining the composition of paintings at the beginning of the Renaissance, and it was only used by painters in Northern Europe [13]. Then, the golden section gradually spread to all of Europe, and in the mid-Renaissance period it became a common technique among painters and architects, especially those who designed churches or painted pictures of altars. In the Baroque period, the Dutch masters in particular used this traditional technique to compose their paintings. With Romanticism and Realism, artists were not at all concerned about composition; they simply composed their works without considering it. With Impressionism, artists came to realize the importance of composition again, and the golden section was revived. This movement is still going on.

Considering the above, it stands to reason why we got the results shown in Table 1. What is more interesting, however, is that this engineering approach can give quantitative evidence to support theories in art.

From the results in Table 1, the average of the success rates is 56.5%; however, the method can be useful enough to collect a sufficient amount of compositional information, considering the existing number of paintings.

4. Image re-composer

Here, we introduce an application system that uses the compositional information extracted by the Composition Analyzer. The system is called "Image Re-Composer", and is a post-production tool that decomposes a picture and regenerates it as a new, improved picture according to compositional information extracted from masterpieces (see Fig. 10). Since the tool allows the user to generate different pictures depending on specified compositions, the user can experiment with a variety of good compositions on an original picture. As a result, the user can learn how masters have maintained the visual balance of pictures.

Fig. 10. Image re-composer.

For the image recomposition, the system asks the user to input three pictures: an original picture, a ground picture, and a guide picture. The ground picture is a picture onto which the system recomposes figures of the original picture. Both the original picture and the ground picture are input into the system by using an image scanner or by specifying them as files. The guide picture is a picture with compositional information, and provides recomposition guidelines for the system.

An Image Database is available for guide picture retrieval. When the user searches for pictures, the user can specify an author's name, a picture's name, a type of picture (portrait, scenery, group, etc.), or how many or what kinds of objects there are.

After the user inputs the original picture, the system tries to extract figure objects from the picture by the figure extraction method. Then, the user chooses the objects which are to be recomposed by the system. Image Re-Composer finally recomposes the selected objects according to the compositional information of the guide picture.

The following section explains the above processes in detail.

4.1. Object selection

After the user inputs a picture having the desired objects to recompose, Image Re-Composer tries to extract those objects by the figure extraction method. However, because the extraction method will not always give perfect results, the user is asked by the system to correct the results when necessary. The system shows the extraction results with figure regions in color and ground regions in gray. If the user is not satisfied with the results, he/she can correct any one of them by selecting or de-selecting regions that have been mis-discriminated by the system. The system then asks the user to discriminate each object that the user wants to recompose. The user can discriminate each object by selecting the multiple regions constituting the object.

When the user extracts an object from the input picture, the object appears in an object browser of the Image Re-Composer control panel, as shown in Fig. 11. This tool registers all of the objects that the user previously extracted, and it allows the user to specify objects extracted from different pictures in order to recompose them within a picture. Similar browsers are also available for the ground picture and the guide picture.

If the user selects two objects, the system gives the objects IDs in order to establish the correspondence between objects in the guide pictures and the user-selected objects (see the left side of Fig. 11). Currently, the system allows the user to select two objects or less at the same time, because
we have experimentally found that it is better to recompose three objects or more as a group of objects rather than to recompose them individually.

4.2. Guide picture search

When the user retrieves a guide picture from the image database, the user can specify what kinds of objects and how many objects he/she wants to recompose as keywords. The system retrieves pictures that match the specified keywords, and then matches the shapes between the user-selected objects and the retrieved objects. Finally, the system shows the user guide pictures whose objects are as similar to the user-selected objects as possible. This function is provided to assist the user as much as possible in not specifying a bad combination. For example, when the user tries to recompose a standing figure, it is not good for the system to recommend the composition of a sitting figure, because the result would obviously be bad.

For the shape matching, we employ P-type Fourier descriptors [17] to represent the shape, and measure the similarity as the Euclidean distance between vectors whose items are the above Fourier descriptors.

4.3. Image re-composition

For the image recomposition, the system adjusts the sizes and locations of the selected objects according to the specified guide picture, and composes the objects onto the specified ground picture.

For the size adjustment, the system calculates a scale coefficient for each object as follows: Let x be the number of pixels of a selected object, m the ratio between the size of an object within the guide picture and the size of the whole guide picture, n the number of pixels of the ground picture, and s the scale coefficient for the selected object.

s = m × n / x   (20)

Then, the system scales the object with the calculated coefficient.

For the location adjustment, we calculate a new location with the following equations. Let wg be the width of a guide picture, hg the height of the guide picture, (xg, yg) the center of gravity of an object in the guide picture, wb the width of the ground picture, hb the height of the ground picture, and (xf, yf) the center of gravity of the user-specified object.

xf = wb × xg / wg   (21)

yf = hb × yg / hg   (22)

4.4. Image re-composition results

Fig. 12 shows some examples of results recomposed by the system. As shown in Fig. 12, the user can experiment with a variety of compositions on the same objects. Therefore, the system is useful for non-professionals to learn how to maintain the visual balance within a picture.

5. Conclusion

In this paper, we described a tool for extracting the compositional information of pictures. The compositional information is extracted from a picture by two processes: figure extraction and composition analysis on the extracted
figures by using the Dynamic Symmetry principle. The result of the analysis is used in an application system, Image Re-Composer, which is a tool for refining a picture according to compositional information extracted from masterpieces. This tool can also help non-professionals learn how to maintain visual balance, since they are able to explore a variety of compositions of professional works. Furthermore, the tool can be used by researchers working in art societies to analyze paintings.

However, the composition information extracted by the Composition Analyzer is static information. A dynamic composition also exists that represents the context of a picture [4]. This composition is known as the "Leading eye". In this case, the artist usually leads the attention of viewers from the main object in his/her work to various points in the picture, or leads the attention of viewers from a sub-object to the main object by controlling the level of attraction of sub-objects [4]. In order to extract this information, it is necessary to evaluate the level of attractiveness of regions properly. Future work will therefore involve such attractiveness evaluation.

References

[1] K. Nakakoji, B.N. Reeves, A. Aoki, H. Suzuki, K. Mizushima, eMMaC: knowledge-based color critiquing support for novice multimedia authors, Proc. ACM Multimedia '95, 1995, pp. 467–476.
[2] A. Plante, S. Tanaka, S. Inoue, M-Motion: a creative and learning environment facilitating the communication of emotions, Proc. CGIM '98, 1998, pp. 77–80.
[3] D.A. Dondis, A Primer of Visual Literacy, The MIT Press, Cambridge, MA, 1974.
[4] Shikaku Design Kenkyusho Corporation, Essence of Composition, 1995 (in Japanese).
[5] R. Yanagi, Golden Section, Bijyuthu Shuppan Sha, 1998 (in Japanese).
[6] R.D. Zakia, Perception and Imaging, Focal Press, 1997.
[7] T. Oyama, S. Imai, T. Wake, Handbook of Sensation and Perception, Seishin Shobo, 1996 (in Japanese).
[8] R. Desimone, S.J. Schein, J. Moran, L.G. Ungerleider, Contour, color and shape analysis beyond the striate cortex, Vision Research 25 (1985) 441–452.
[9] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (8) (1996) 837–842.
[10] D. Dunn, W.E. Higgins, Optimal Gabor filters for texture segmentation, IEEE Transactions on Image Processing 4 (7) (1995) 947–964.
[11] A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition 24 (12) (1991) 1167–1186.
[12] W.Y. Ma, B.S. Manjunath, Edge flow: a framework of boundary detection and image segmentation, Proc. CVPR '97, 1997, pp. 744–749.
[13] T. Kanbayashi, K. Shioe, K. Shimamoto, Handbuch der Kunstwissenschaft, Keisou Shobou, 1997 (in Japanese).
[14] C. Wallshlaeger, C. Busic-Snyder, Basic Visual Concepts and Principles, McGraw Hill, New York, 1992.
[15] S. Tanaka, Y. Iwadate, S. Inokuchi, A figure extraction method based on the color and texture contrasts of regions, Proc. ICIAP '99, 1999, pp. 12–17.
[16] Planet Art, A Gallery of Masters, 1997.
[17] Y. Uesaka, A new Fourier descriptor applicable to open curves, IEICE Transaction J67-A (3) (1983) 166–173.