
Bachelor Thesis

Analysis and Evaluation of


Visuospatial Complexity Models

Bashar Hammami and Mjed Afram


Computer Science

Studies from the School of Science and Technology at Örebro University


Örebro 2022

Supervisor: Mehul Bhatt, Örebro University

Examiner: Franziska Klügl

© Bashar Hammami and Mjed Afram, 2022


Abstract

Visuospatial complexity refers to the level of detail or intricacy present within a scene, tak-
ing into account both the spatial and the visual properties of a dynamic scene or place (e.g.
moving images, everyday driving, video games and other immersive media). There have
been several studies on measuring visual complexity from various viewpoints, e.g. mar-
keting, psychology, computer vision and cognitive science. This research project aims at
analysing and evaluating different models and tools that have been developed to measure
low-level features of visuospatial complexity, such as the Structural Similarity Index mea-
sure, the Feature Congestion measure of clutter, and the Subband Entropy measure of
clutter. We use two datasets, one focusing on (reflectional) symmetry in static images,
and another consisting of real-world driving videos. The evaluation reveals different
correlations between the implemented models, in which the nature of the scene plays a
significant role.

Keywords
Complexity, Visual Complexity, Visuospatial Complexity, Structural Similarity Index Measurement, SSIM, Visual Clutter, Feature Congestion, Subband Entropy


Acknowledgements
As a team, we would like to thank our supervisor Mehul Bhatt for always being there
to assist and help us, along with Vasiliki Kondyli for her valuable advice and
observations. Thank you for your support. We would also like to express our appreciation
for our families, who have been with us and supported us throughout our journey.
Contents

1 Introduction
1.1 Problem Formulation
1.2 Division of Work
1.3 Motivation
1.4 Outline

2 Related Works
2.1 Measuring Visual Clutter: Congestion and Entropy
2.1.1 Introduction
2.1.2 Feature Congestion measurement of visual clutter
2.1.3 Subband Entropy measure of visual clutter
2.2 Structural Similarity Index Measurement
2.2.1 AUVANA: An Automated Video Analysis Tool for Visual Complexity
2.2.2 Introduction
2.2.3 The development of AUVANA
2.3 Reflectional Visuospatial Symmetry
2.3.1 Human Evaluation: A Qualitative Study
2.4 Human-Centred Visuospatial Complexity
2.4.1 Introduction
2.4.2 Multimodal Interactions
2.4.3 Human Centred Model

3 Analysis and evaluation of Visuospatial Complexity
3.1 Reflectional Symmetry and Clutter
3.1.1 Experiment 1: Structural Similarity and Clutter
3.1.2 Experiment 2: Structural Similarity: Mirroring vs. Non-Mirroring
3.1.3 Experiment 3: Symmetry and Structural Symmetry (with mirroring)
3.2 Experiment 4: Symmetry and Clutter in Naturalistic Driving

4 Conclusion and Outlook
4.1 Review of Project Goals
4.2 Context
4.3 Future Works

References
Chapter 1
Introduction

This project aims to analyse and compare different models of visual/visuospatial com-
plexity using different models and tools based on different perspectives. In evaluating
which method is better or more appropriate, several factors are taken into account.
Several studies have been conducted on visual complexity and on how to analyse
and measure it. The topic has been studied from different viewpoints (such as computer
science, psychology, cognitive science) to provide solutions in different disciplines [5].
Researchers have developed new models for measuring complexity, alongside models
that already existed. These models use different methods and can give different results
even for the same image.
Such models have been helpful in investigating why accidents happen and how people
react in different ways in different situations, depending on factors that are both visual
and spatial, such as the kind of place one is in.

1.1 Problem Formulation


This project is developed through the research, analysis, and comparison of different
models of visual complexity measurement, both models developed by other researchers
and models that already exist. We analyse the models, test them on two different
datasets, identify similarities and differences between them, and draw conclusions on
their efficiency in detecting complexity in sample dynamic and static real-world scenes.
Through this work we are interested in how different models respond to different levels
of complexity and different situations, depending on the models' characteristics and on
what makes each model behave as it does.

1.2 Division of Work


This thesis is the result of a collaborative effort between two students, with each of us
examining a different measurement model. Moreover, we were provided with two
different datasets, which are used to compare different models for measuring visuospatial
complexity. Ultimately, each comparison leads to a conclusion.


1.3 Motivation
Our project involves the task to conduct a survey and analysis of visual complexity as
pursued in different researches. The theme of this project has been defined based on our
interested in topics such as how humans interact with machines, and how we design
systems or machines with human-centric requirements.

1.4 Outline
Here is an overview of how the rest of the thesis is organized:

Chapter 2 describes previous work in the area of visual and visuospatial complexity.

Chapter 3 presents the analysis and evaluation of visuospatial complexity, showing the
results from the different datasets and discussing the achieved results.

Chapter 4 contains the conclusions of the project and some future work.
Chapter 2
Related Works

2.1 Measuring Visual Clutter: Congestion and Entropy

2.1.1 Introduction

Calculating the visual clutter of a scene or a display is one of several approaches for
measuring visual complexity. Various studies have been conducted on visual search to
find solutions for different purposes, e.g. making decisions based on gathered visual
information, or searching for a particular object, such as a threat in a baggage X-ray, a
document on a desktop, or moving objects while driving [10].
What is the notion of clutter? Thinking of the objects on our desks, clutter is the state in
which redundant items, or their organization, result in degraded performance at some
task [10]. In other words, visual clutter captures the cost of adding yet another
attention-grabbing object to a display. Additionally, Rosenholtz, Li, and Nakano argued
that adding more items to a display does not necessarily have a negative effect; on the
contrary, it may even enhance performance under specific circumstances. That occurs
when items have low entropy, which means that an item can be easily distinguished.
For measuring visual clutter, Rosenholtz et al. developed a model called the Statistical
Saliency Model, which captures human performance at a functional level using the
notion that "the visual system is designed to characterize various statistical aspects of
the visual display" [10]. Rosenholtz's model makes it easier to comprehend why an
object in a scene is or is not salient; it can suggest features for an item that aid in
creating a salient target. One approach to measuring visual complexity is to determine
the features of items in a display and measure the local variability of those features,
such as luminance, orientation and contrast. Therefore, Rosenholtz, Li, and Nakano
developed two models for measuring visual clutter, relying upon estimating the variance
in specific key features, which together cover a part of measuring the visual complexity
of a scene.

2.1.2 Feature Congestion measurement of visual clutter

The first model is the Feature Congestion measure of visual clutter (clutter FC). Rosenholtz,
Li, and Nakano contend that "For a given number of objects in a scene, the scene will appear
less cluttered the more 'organized' it is. Organization may involve grouping similar objects
together; aligning them; and making many of the objects a similar hue, luminance, size,
and so forth.... The degree of organization of a scene can be thought of in terms of the
extent to which each part of the scene is predictable from the rest of the scene, or in terms
of the amount of redundancy in the scene". The Feature Congestion measure of clutter
incorporates this concept of organization, to a certain extent implicitly, by considering the
covariance of features, as well as capturing some measure of grouping by similarity and
proximity in a scene [10].
The Feature Congestion measure is based on three key features: color variability,
luminance contrast and orientation contrast. These three features have been used to
model various perceptual phenomena, e.g. pattern discriminability [16] and preattentive
texture segmentation [6]. Color variation is a significant key feature when dealing with
visual clutter; it affects visual search by providing information about the nature of a
display [10]. Luminance contrast can be detected using a contrast feature detector, which
also provides a measure of shape and size [10]. Orientation contrast is measured by
computing oriented opponent energy, which indicates the orientation variability at a
certain scale and position [10].
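To make these ideas concrete, the following minimal Python sketch (our own illustration, not Rosenholtz et al.'s implementation) estimates the local variability of luminance, color and orientation with OpenCV; the CIELab color space, the 15-pixel window, and the circular-statistics treatment of orientation are simplifying assumptions of ours.

    import cv2
    import numpy as np

    def local_std(channel, ksize=15):
        # Local standard deviation via box filters: sqrt(E[x^2] - E[x]^2).
        mean = cv2.boxFilter(channel, -1, (ksize, ksize))
        mean_sq = cv2.boxFilter(channel * channel, -1, (ksize, ksize))
        return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))

    def feature_variability(path, ksize=15):
        # Work in CIELab so luminance (L) and color (a, b) are separated.
        bgr = cv2.imread(path).astype(np.float32) / 255.0
        L, a, b = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab))
        luminance_var = float(local_std(L, ksize).mean())
        color_var = float((local_std(a, ksize) + local_std(b, ksize)).mean()) / 2.0
        # Orientation is 180-degree periodic, so use doubled gradient angles
        # weighted by gradient magnitude; 1 - R is the circular variance,
        # where R is the local resultant length.
        gx = cv2.Sobel(L, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(L, cv2.CV_32F, 0, 1)
        mag = np.sqrt(gx * gx + gy * gy) + 1e-8
        theta2 = 2.0 * np.arctan2(gy, gx)
        c = cv2.boxFilter(mag * np.cos(theta2), -1, (ksize, ksize))
        s = cv2.boxFilter(mag * np.sin(theta2), -1, (ksize, ksize))
        m = cv2.boxFilter(mag, -1, (ksize, ksize))
        orientation_var = float((1.0 - np.sqrt(c * c + s * s) / m).mean())
        return luminance_var, color_var, orientation_var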

Figure 1: (A) High luminance variability and low orientation variability. (B) As in A, with
one item rotated. (C) High luminance variability and high orientation variability.

Figure 1 illustrates different ways in which a group of objects may vary in luminance,
orientation, and color. Figure 1A shows a number of lines with high luminance variability
(some items are darker and some are lighter) and low orientation variability (the items
all point in the same direction). Therefore, changing the orientation of one item is enough
to make it salient and attract attention to it, as shown in figure 1B. Figure 1C shows
high orientation variability together with high luminance variability, where searching for
a specific item can be challenging.

2.1.3 Subband Entropy measure of visual clutter

The second model is the Subband Entropy measure of clutter (clutter SE). Rosenholtz,
Li, and Nakano state that the "Subband Entropy clutter measure is based on the notion
that clutter is related to the number of bits required for subband (wavelet) image
coding." When an image contains redundant components, it can be represented more
efficiently, whether in the brain or in a computer [10]. For example, if a particular region
of an image forms a consistent group, that region can be encoded by identifying its
group characteristics and its location, instead of encoding each point [10].
To measure the Subband Entropy, a wavelet coder is used for image decomposition [10].
The wavelet coder breaks the image down into a set of subbands with distinct spatial
and orientation frequencies, similar to the decomposition that happens in human vision
[10]; this is also the kind of transform used, together with entropy encoding, by image
coders such as JPEG and JPEG 2000. The next step is to compute the Shannon entropy of
each subband. Thereafter, the subband entropies are summed for the luminance and
chrominance channels; finally, a weighted sum of the luminance and chrominance
entropies gives the subband entropy clutter value.
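As an illustration of this pipeline, the sketch below approximates the Subband Entropy measure in Python with PyWavelets. The original measure uses a steerable pyramid with specific chrominance weights; here the wavelet family ("db4"), the number of levels, and the chrominance weight are simplifying assumptions of ours.

    import cv2
    import numpy as np
    import pywt  # PyWavelets

    def band_entropy(coeffs, bins=256):
        # Shannon entropy (in bits) of one subband, estimated from a histogram.
        hist, _ = np.histogram(coeffs.ravel(), bins=bins)
        p = hist[hist > 0] / hist.sum()
        return float(-(p * np.log2(p)).sum())

    def subband_entropy_clutter(path, wavelet="db4", levels=3, chroma_weight=0.0625):
        bgr = cv2.imread(path).astype(np.float32) / 255.0
        channels = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab))  # L, a, b
        entropies = []
        for ch in channels:
            bands = pywt.wavedec2(ch, wavelet, level=levels)
            # bands[0] is the approximation; the rest are (H, V, D) detail tuples.
            details = [b for level in bands[1:] for b in level]
            entropies.append(np.mean([band_entropy(b) for b in details]))
        L_ent, a_ent, b_ent = entropies
        # Weighted sum of the luminance entropy and the two chrominance entropies.
        return L_ent + chroma_weight * (a_ent + b_ent)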

2.2 Structural Similarity Index Measurement


The structural similarity index measurement (SSIM) measures the visual similarity be-
tween two images, a reference image and a test image, and is used for assessing image
quality [18]. SSIM is computed through the steps shown below [1].

The Structural Similarity index measurement is based on three key comparisons:

1. Luminance difference: the amount of light emitted by a surface. Equation 1 captures
the luminance comparison, where \mu_x and \mu_y are the mean intensities of the two
images and C_1 is a stabilizing constant:

l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}    (1)

2. Contrast difference: the level of intensity variation. Equation 2 captures the contrast
comparison, where \sigma_x and \sigma_y are the standard deviations:

c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}    (2)

3. Structure difference: variation in structure. Equation 3 captures the structure
comparison, where \sigma_{xy} is the covariance of the two images:

s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}    (3)

The overall index is the product of the three terms:
SSIM(x, y) = l(x, y) \cdot c(x, y) \cdot s(x, y).

The Scikit-image Python library has been used to implement SSIM [15]. The SSIM algo-
rithm takes in two frames with the same dimensions and outputs a single value indicating
how similar they are to each other. A value of 0 means that the two frames differ
completely, whereas 1 indicates that the two frames are identical. After calculating all
SSIM values for a video, a single global score, the adjacent score ADJ_{SSI} [1], is obtained
by averaging the SSIM over all pairs of consecutive frames:

ADJ_{SSI} = \frac{1}{N-1} \sum_{i=1}^{N-1} SSIM(f_i, f_{i+1})

where f_1, \dots, f_N are the frames of the video. SSIM values can be computed either on
extracted frames or directly on a video.
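A minimal sketch of this computation follows, assuming (as above) that the global score is the mean SSIM over consecutive frame pairs; the exact aggregation used by AUVANA is given in [1]. The scikit-image function structural_similarity implements the SSIM itself.

    import cv2
    from skimage.metrics import structural_similarity

    def adjacent_ssim(video_path):
        # Mean SSIM over all pairs of consecutive grayscale frames of a video.
        cap = cv2.VideoCapture(video_path)
        ok, prev = cap.read()
        prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        scores = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            scores.append(structural_similarity(prev, gray, data_range=255))
            prev = gray
        cap.release()
        return sum(scores) / len(scores)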



2.2.1 AUVANA: An Automated Video Analysis Tool for Visual Complexity

2.2.2 Introduction

Visual complexity has long been a crucial concept in the field of visual perception. Video
is a key medium in which visual complexity arises, yet there is no established method for
calculating it. To address this problem, a program called AUVANA was created that uses
algorithms and computer vision to compute and visualize visual complexity by analysing
a set of indicators. The tool can be used by researchers to analyse video complexity [1].
Researchers have studied visual complexity across a variety of areas, including cognitive
science, marketing, psychology, human-computer interaction and aesthetics [2] [8] [13].
Measures of visual complexity were originally designed for static images [11]. The type
and quantity of elements inside a stimulus, their spatial arrangement, and the variety of
colors can all influence how complex we perceive it to be [7]. There are also multiple
ways in which visual complexity can be assessed, including pixel arrangements and
semantic analysis [4].
Both quantitative and qualitative methods have been used to measure and analyse visual
complexity. Qualitative approaches assess visual complexity through human judgment.
This comes with a trade-off: such assessments are less consistent, more expensive, and
hard to generalize to other visual complexity stimuli. A quantitative scale, by contrast,
aims to provide numerical and consistent measurements of visual complexity [1].
While many studies have been conducted on still images, few have addressed video clips,
which have more complex visual aspects. Visual complexity is difficult to measure in
video clips because the content changes continuously. Video clips combine spatial and
temporal aspects, which makes manipulating visual complexity difficult. To understand
the visual complexity of video clips, a number of dimensions and layers need to be taken
into account [1].

2.2.3 The development of AUVANA

AUVANA's main mission is to make video complexity analysis accessible to everyone,
especially those without programming skills or experience. It is therefore designed to be
easy to use and to let users run the program locally without an Internet connection.
Video processing and the calculation of visual complexity indices are implemented in
Python 3.8 using open-source libraries [1].
AUVANA calculates visual complexity at four levels. A shot consists of the frames
captured during one continuous operation of the camera, and among the frames of a
shot, one key frame is selected as the best visual representation of that shot [1].
The program has three main components. First, it detects the shots and their boundaries
in the loaded video and determines the number of frames present. The second step is to
calculate the video's visual complexity indicators after extracting frames. Finally, the
results can be presented as a bar chart or a line chart [1].

Module 1 Shot Boundary Detection and Frame Extraction

When the program starts, it offers users two possibilities: either load a video stored
locally or provide a link to a video on the Internet. The user can then trigger frame
extraction, in which the frames are extracted and the shots of the chosen video are
detected.

Figure 2: The video upload interface of AUVANA [1].

Figure 3: Preprocessing videos to extract frames from them [1].



Figure 4: Selecting which indices to compute [1].

Figure 5: Visualizing and plotting the computed indices for analysis [1].



Module 2: Extracting and Computing Indices

The second module computes visual complexity indices according to the levels of visual
complexity.

Table 1: Visual complexity measures are extracted at different levels [1].

Visual clutter indices: clutter arises from poor organization of the elements or the way
they are arranged, and degrades performance; the more cluttered an image is, the more
difficult it is to find the desired visual element [9].
Since the clutter algorithms require static images, key frames are extracted from each
segment of a video, and the arithmetic mean and standard deviation of their visual
clutter are calculated [14].
Colorfulness indices: making things noticeable and distinguishable has always depended
critically on color information. AUVANA calculates the arithmetic mean and standard
deviation of colorfulness over the whole requested video. The algorithm for calculating
colorfulness was implemented using Python and OpenCV [1].
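AUVANA's exact colorfulness algorithm is described in [1]; as an illustration of how such a metric can be implemented with Python and OpenCV, the sketch below computes Hasler and Süsstrunk's widely used colorfulness measure on the opponent color channels (our choice of metric for illustration, not necessarily AUVANA's).

    import cv2
    import numpy as np

    def colorfulness(bgr):
        # Hasler and Suesstrunk's metric on the rg and yb opponent channels.
        B, G, R = cv2.split(bgr.astype(np.float32))
        rg = R - G
        yb = 0.5 * (R + G) - B
        std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
        mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
        return float(std_root + 0.3 * mean_root)

    frame = cv2.imread("keyframe.png")  # hypothetical key frame extracted earlier
    print(colorfulness(frame))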
Module 3: Visualization

In the third module, users are given the option to view the complexity indices as charts.
The results can be displayed in various ways.

2.3 Reflectional Visuospatial Symmetry


This research studies visuospatial symmetry in relation to naturalistic stimuli in the
visual arts, such as films, paintings, and photographs of landscapes and buildings [12].
The model consists of a human-centred representation, as well as a model for declarative,
explainable interpretation that allows deep semantic question-answering by integrating
methods for knowledge representation with deep learning for computer vision [12].
"Visual symmetry as an aesthetic and stylistic device has been employed by artists across
a spectrum of creative endeavours concerned with visual imagery in some form, e.g.,
painting, photography, architecture, film and media design" [12]. In line with the "visual
imagery" and "aesthetics" focus of this study, symmetry has been a tool used by artists
from Giorgione, Titian, Raphael and Da Vinci up to Dalí and other modern artists [12].
An important objective of the paper is to present a computational model that can be used
to create interpretable semantic interpretation models for the analysis of visuospatial
symmetry [12].

2.3.1 Human Evaluation: A Qualitative Study

A human evaluation experiment was conducted on a dataset consisting of 150 images.
The experiment aims to determine the variance in human perception of symmetry, and
to consider how subjective human criteria for evaluating symmetry in naturalistic images
are reflected in the symmetry model [12]. The dataset contains different kinds of images,
e.g. movie scenes, landscapes, architectural photography, people, and nature. The
symmetry level in the dataset varies between low, medium, and high [12]. Approximately
100 individuals participated in the experiment; each participant evaluated 50 randomly
selected images [12]. We have used the same dataset with the respective human ratings
to be able to compare the human evaluation with the other models, which rely on
features such as:

1. Luminance
2. Color
3. Orientation

Based on the findings of the human experiment, it appears that the perception of
symmetry differs greatly between individuals [12]. However, when there is no symmetry,
people tend to agree, meaning that the variance in the answers is very low, while in the
case of high symmetry there is more variability in the human perception of symmetry
[12]. The researchers observed a wide range of aspects of the subjective evaluation of
symmetry:

1. An absence of features reduces the evaluation of symmetry. For instance, figure 6a is
almost fully symmetrical, but there are only a few features that can be symmetrical;
therefore people evaluate it as medium symmetry, with high variability in the answers.
2. The symmetrical placement of people in an image has a stronger effect on the
evaluation of symmetry than that of other objects; e.g., figure 6b is evaluated as
symmetrical based on the placement of the characters around the door in the middle.
3. Images that are naturally symmetrical are considered less symmetrical than those that
are arranged symmetrically; e.g., figure 6c shows a naturally symmetrical human face,
which is judged lower in symmetry than other images with the same rate of symmetry
at the feature level [12].

Figure 6: Samples from the dataset [12].



2.4 Human-Centred Visuospatial Complexity

2.4.1 Introduction

Researchers have invested heavily in self-driving vehicle research, but human-centred
design challenges still exist. Work has primarily focused on driving speed, vehicle
control, and which direction to turn while driving; comparatively little thought has been
given to how humans interact with such machines and how standards for this interaction
are developed [5].
Developing a world where autonomous vehicles are more common and interact with
humans is essential, but it is not sufficient: control does not occur in real time, and
decision making is not automatic. Diagnostics, universal design, and other factors must
also be considered [5].

2.4.2 Multimodal Interactions

The paper develops a systematic methodology based on visual sensemaking, from which
standards for multimodal interaction with autonomous cars can be developed [5].
It proposes a method for validating, testing, and evaluating computational models of
visual communication, based on a cognitive model of visuospatial complexity in everyday
driving conditions [5].
Visuospatial complexity is characterized through measurable roadside attributes of a
quantitative, structural, and dynamic kind. The proposed methodology is driven by
methods from both visual and spatial cognition, and its human evaluation is based on
both [5].

2.4.3 Human Centred Model

Images and scenes contain different levels of detail and intricacy, which is referred to as
visual complexity [11]. When a person acts in dynamic naturalistic scenes that are rich
in both visual and spatial characteristics, the resulting complexity is called visuospatial
complexity. Besides color, contrast, the number of objects, and similar factors, other
properties contribute to the perception of a scene; the dimensions of the space, its
structure, and the relationships among its parts are analysed as well [5].
There are two types of visual attention: exogenous attention is controlled by stimulus
features such as luminance and color, whereas endogenous attention is controlled
voluntarily through internal cognitive processes, for instance when attending to
particular people and objects [3].
These factors can vary considerably, because large groups of elements contribute to the
analysis of visual complexity. To analyse visual complexity, one must first obtain
sufficient databases of different types of images, together with datasets and algorithms
from earlier experiments. There are external factors that people cannot control, such as
colors and lighting, and internal factors that one can control, such as attending to people
and objects [5].
Complexity ranges over many different levels, from low to high, and it is therefore
important to characterize the different levels so that one can understand their impact on
human behaviour and accuracy. To identify the characteristics of visuospatial complexity,
one needs to observe everyday activities such as driving or cycling [5].
There are three categories of attributes:

Quantitative Attributes

A scene contains a great deal of information, such as sizes and directions, as well as
colors and lighting, each of which can range between a low and a high level [5].
An overview of the properties:

1. Size: all dimensions perceived through vision.

2. Clutter: the density and disorder of elements in the scene.

3. Colors: the number of colors in the scene.

4. Objects: the number of objects in the scene.

5. Lighting: the amount of light in the scene.

6. Prominence: the amount of salient content present in the scene.

7. Similarity: the similarities between the perceptual, visuospatial information present
in the scene.

Structural Attributes

A representation of the relationships between elements in a scene, formed by their
distribution throughout the scene or their positioning in space [5].
An overview of the properties:

1. Repetition: the repetition of elements in the scene, whether objects, people, etc.

2. Symmetry: all types of symmetry.

3. Organization: all possible levels of organization, from low to high.

4. Similarity/differences: of all kinds and shapes.

5. Transparency: the visibility between spaces.

Dynamic Attributes

Studies of visual attention reveal that both top-down and bottom-up cognitive processes
are affected by dynamic features of videos, such as motion and flicker. Considering the
dynamic aspects of a scene enhances the prediction of visual attention [5].
An overview of the properties:

1. Motion: moving objects and humans in the scene.

2. Flicker: sudden changes in brightness over time.

3. Velocity: the rate at which people and objects change position.

As a result, visual attention is affected by how visuospatial complexity is defined,
manipulated, and measured, along with the type of stimuli used in the experiment [5].
Chapter 3
Analysis and evaluation of
Visuospatial Complexity

Visual clutter was the first implemented measurement; it includes two models for mea-
suring visual clutter. The first model is the Feature Congestion measure of visual clutter,
which considers three key features for measuring the clutter:

1. Luminance
2. Color
3. Orientation

The second model is the Subband Entropy measure; it measures clutter using a wavelet
coder for image decomposition. It then computes the Shannon entropy of the luminance
and chrominance channels and sums the values, with appropriate weights, to get the
final value. The second measurement was the Structural Similarity index (SSIM). SSIM
computes the similarity between two frames considering three key features:

1. Structure
2. Luminance
3. Contrast

We implemented SSIM in two different ways: first through the AUVANA software, which
provides several indices for measuring visual complexity; second, as our own
implementation in Visual Studio Code using Python and the Scikit-Image library.
To be able to compare the Structural Similarity index to the symmetry model, we split
each frame vertically into two frames, e.g. splitting A into A1 and A2. The idea behind
splitting the frames vertically is that naturalistic and dynamic scenes tend to be organized
around a vertical axis, so reflectional symmetry is most naturally assessed across a
vertical split.
Additionally, after splitting each frame, we flip A2 to obtain A3. The idea of the flipping
is to measure the SSIM for the "A1A2" pair and the "A1A3" pair, compare them, and
evaluate the differences.
The frames were split and flipped using Python scripting and the OpenCV library. Since
the AUVANA software takes a video as input, we converted each split frame pair into a
video; for example, the A1A2 frames become a 1 fps video, and likewise A1A3. All values
were normalized to a 0-1 scale.
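The following sketch shows how this split-and-mirror step can be carried out with OpenCV and scikit-image (a simplified illustration of our pipeline; the file names are placeholders):

    import cv2
    from skimage.metrics import structural_similarity

    def split_and_mirror_ssim(frame_path):
        # Split a frame vertically into equal halves A1 and A2, mirror A2
        # into A3, and compute SSIM(A1, A2) and SSIM(A1, A3).
        gray = cv2.cvtColor(cv2.imread(frame_path), cv2.COLOR_BGR2GRAY)
        h, w = gray.shape
        half = w // 2
        a1, a2 = gray[:, :half], gray[:, w - half:]
        a3 = cv2.flip(a2, 1)  # flipCode=1 flips around the vertical axis
        return (structural_similarity(a1, a2, data_range=255),
                structural_similarity(a1, a3, data_range=255))

    print(split_and_mirror_ssim("A.png"))  # hypothetical frame file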


All computed values were collected in Excel. There, we analysed the quantitative results
by filtering, plotting, and visualizing the ratings of our models, in order to identify
differences and similarities between the models and draw conclusions.
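The same correlation analysis could equally be scripted; a short sketch with pandas is shown below, where the file name and column names are hypothetical placeholders for our Excel sheet:

    import pandas as pd

    # One row per image; one column per measure (names are placeholders).
    df = pd.read_excel("results.xlsx")
    print(df[["symmetry", "ssim", "clutter_fc", "clutter_se"]].corr(method="pearson"))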

3.1 Reflectional Symmetry and Clutter


Description

In experiments 1, 2 and 3, we used a dataset consisting of 150 images, the same dataset
created and used in [17]. The dataset consists of movie scenes, landscapes, and
architectural photography. The goal of using this dataset is to evaluate and compare
human perception and judgement of symmetry with the other models. The symmetry
level in the dataset varies between low, medium, and high.

Figure 7: Samples from symmetry dataset



Figure 8: Samples from symmetry dataset



3.1.1 Experiment 1: Structural Similarity and Clutter

Description

For the first experiment, we applied four measures to the same dataset: the Structural
Similarity index, the Feature Congestion measure of clutter, the Subband Entropy
measure of clutter, and the human evaluation of symmetry. The goal of our first
experiment was to measure visual complexity from different angles by implementing
different models, and to compare them to reveal their correlations and draw conclusions.

Results

Below are some samples from the dataset with their respective values. Figure 9 depicts
the extreme cases for the Feature Congestion measure, figure 10 the extreme cases for
the Subband Entropy measure, and figure 11 two extreme cases for the Structural
Similarity index.

Figure 9: Minimum, medium, maximum clutter FC

Figure 10: Minimum, medium, maximum clutter SE



Figure 11: Two extreme cases from SSIM and symmetry

Figure 12: Relationship between symmetry, SSIM, clutter FC, and clutter SE for the first
experiment

Discussion

Frame A in figure 9 has the lowest clutter FC value. As mentioned before, clutter FC
relies on measuring three key features: orientation variability, luminance variability, and
color variability. From the viewpoint of orientation variability, we notice that the people
in the frame are positioned in a way that makes the orientation variability very low. By
contrast, in figure 9 frame C, which has the highest clutter FC, the water drops appear to
form at random, which indicates high orientation variability. From the viewpoint of
luminance, in figure 9 frame A the luminance is generally low; the way the light is
diffused leaves many regions of the scene not reflecting light, so the luminance
variability is extremely low. At the maximum level of clutter, in frame C, there is clearly
a huge variance in luminance between the water drops and the surface they lie on.
Looking at frame A in terms of color, and taking the light diffusion into consideration,
the colors do not vary much: in particular, the light reflected from the faces and from the
background is similar to some extent, so the scene has low color variability. As a result,
frame A gets the minimum clutter FC and frame C the maximum.
The Subband Entropy essentially measures the same characteristics as the Feature
Congestion, except for the orientation variance. The Subband Entropy measures
luminance and chrominance. By measuring the chrominance, the color intensity is
measured, which means it primarily analyses the amount of high frequencies within the
color bands [10]. We might therefore say that the same reasoning behind the Feature
Congestion measure of clutter is also applicable to the Subband Entropy measure of
clutter.
Regarding the Structural Similarity index (SSIM), SSIM essentially measures luminance
difference, contrast difference and structure difference. Figure 11 frame G lies somewhere
between a low and a high value for both SSIM implementations, the one using AUVANA
and the one using our Python scripting. Looking at the structure of the two frames G1
and G2, we observe that G1 has different structures and edges than G2. Some regions are
similar between G1 and G2, but most regions differ in structure. Hence, figure 11 frame
G can be considered a medium, leaning towards low, SSIM case. According to the human
perception of symmetry, the symmetry of frame G is medium, which is in line with the
SSIM measurements. From the angle of luminance and contrast, we can observe that a
large region in frame G2 is dark, whereas frame G1 has much higher luminance,
including the painting that reflects light. Thus, the luminance difference is between
medium and high, as is the contrast difference, which leads to a medium SSIM for
frame G.
Figure 11 frame H has a high value in both SSIM measurements. Comparing H1 and H2
from the structural point of view, there is only little difference, and the same holds for
luminance and contrast. The luminance in the upper region of H1 differs only slightly
from the upper region of H2. As a result, frames H1 and H2 have low variability in
structure, luminance and contrast, which means a high SSIM. From the viewpoint of
human perception, frame H was also rated as highly symmetrical.
Figure 12 contains all values of the four different models applied to the first dataset:
human perception of symmetry, Feature Congestion, Subband Entropy, and the
Structural Similarity index. As seen in figure 12, all values tend to rise and fall together,
which indicates a positive correlation between the four models.

3.1.2 Experiment 2: Structural Similarity: Mirroring vs. Non-Mirroring

Description

In this experiment we applied the Structural Similarity index to the split frames
(A -> A1, A2; compare A1 to A2) and the flipped frames (flip A2 to obtain A3; compare
A1 to A3) from the same dataset, to find out how the values change.

Results

Below are some samples from the dataset with their respective values. Figure 13 depicts
the minimum, medium and maximum SSIM values on the split frames, with their
respective SSIM values on the flipped frames. Figure 14 depicts the minimum, medium
and maximum SSIM values on the flipped frames, with their respective SSIM values on
the split frames.

Figure 13: Minimum, medium, maximum for split SSIM



Figure 14: Minimum, medium, maximum for flipped SSIM

Figure 15: Relationship between SSIM for split and flipped frames for experiment 2

Discussion

In experiment 2, we measured the Structural Similarity index for the same frame, both
when it is split into two frames and when one of the split frames is flipped. Figure 13
depicts the extreme cases (minimum, medium, and maximum) for the SSIM measurement.
Observing frames A1 and A2 in figure 13, which have the minimum SSIM, we note a
significant variation in luminance and contrast, particularly in frame A2, where the
windows reflect the building shown in frame A1. Furthermore, there is also a great deal
of variation in structure between frames A1 and A2. For these reasons, the SSIM is fairly
low. When we flipped A2 into A3 and measured the SSIM for A1 and A3, the value was
even lower than the SSIM for A1 and A2.
As can be seen in figure 13, frames B1 and B2 have a low SSIM value, while frames B1
and B3 have a higher SSIM value, due to the position of the character in the flipped
frame B3, which is structurally similar to the placement of the character in frame B1. As
seen in figure 13, frames C1 and C2 are the pair with the highest SSIM value, and frames
C1 and C3 likewise have the highest SSIM value among the pairs in which half of the
frame has been flipped.
Figure 15 shows a chart containing all SSIM values for the split and flipped frames. We
can clearly see that the SSIM values correlate positively: as the SSIM values of the split
frames increase, the SSIM values of the flipped frames also increase, and vice versa.

3.1.3 Experiment 3: Symmetry and Structural Symmetry (with mirroring)

Description

In this experiment we applied the Structural Similarity index to the flipped frames in
order to compare it with the human perception of symmetry.

Results

Figure 16 shows the extreme cases for symmetry with their respective SSIM values on
flipped frames.

Figure 16: Minimum, medium, maximum for symmetry



Figure 17: Relationship between SSIM for flipped frames and symmetry for experiment 3

Discussion

In experiment 3, we analyse the relationship between the human perception of symmetry
and the SSIM values for flipped frames. Figure 16 shows the extreme cases of the human
rating of symmetry with the respective values for the flipped frames. Frames A1 and A3
have a medium SSIM value, whereas frame A receives the lowest value in the human
judgement of symmetry. As the symmetry value goes up for frame B, so does the
corresponding SSIM value for frames B1 and B3. Similarly, frame C receives the highest
value in the human judgement of symmetry, and frames C1 and C3 get the second
highest value among the SSIM values for flipped frames.
We conclude that the values for human perception of symmetry and the SSIM values on
flipped frames correlate positively: as the symmetry values increase, the SSIM values for
the flipped frames also increase, and vice versa, as seen in figure 17.

3.2 Experiment 4: Symmetry and Clutter in Naturalistic Driving

Description

For this experiment, we implemented two models on the same dataset. The goal was to
measure visual complexity in real driving scenes from different angles by implementing
different models, to compare them to see how the different models correlate, and to
draw conclusions.
The motivation for analysing this type of dataset is that our project is part of ongoing
research in which driving in general, and autonomous driving in particular, is a central
concern, and visuospatial complexity plays an important role in everyday driving.
Understanding visuospatial complexity, and being able to draw conclusions about the
similarities and differences between these models, is an important step towards utilizing
them in future or ongoing research.
The dataset consists of 22 videos captured from real-life driving scenes in different parts
of the world. We used the Python programming language and the OpenCV library to
extract frames from each video; in total, 10214 frames were extracted. Furthermore, we
divided each frame vertically into two halves using Python scripting, in order to
calculate the SSIM measurement.
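A minimal sketch of the frame-extraction step with OpenCV follows (our own script was similar in spirit; the paths and the sampling step are placeholders):

    import os
    import cv2

    def extract_frames(video_path, out_dir, step=1):
        # Save every `step`-th frame of a video as a PNG image.
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        index = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.png"), frame)
                saved += 1
            index += 1
        cap.release()
        return saved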

Figure 18: Samples from driving dataset



Figure 19: Samples from driving dataset

Results

The figures show the extreme cases for SSIM, clutter FC and clutter SE with their
respective values.

Figure 20: Minimum, medium, maximum for SSIM

Figure 21: Minimum, medium, maximum for clutter FC



Figure 22: Minimum, medium, maximum for clutter SE

Figure 23: How the values for SSIM, clutter FC and clutter SE change over video time in
experiment 4

Figure 24: Another experiment on the driving dataset, showing how the values for SSIM,
clutter FC and clutter SE change over video time

Discussion

As can be seen in figure 20 image A, the left half of the image contains a lot of light
while the right half does not; the image is divided into these two halves in order to
compute the SSIM. Additionally, the left half contains many people while the right half
has only one, and the left side has more color than the right side. As a result, we get the
minimum SSIM.
In figure 20 image C we have the maximum SSIM: when the image is split into two parts,
both halves still have the same colors, the structure is almost the same (especially in the
green areas), and the light intensity and contrast are the same on both sides as well.
SSIM is maximized as a result.
Clutter FC is measured by taking into account three key building blocks, so we explore
them one by one. Starting with luminance, we see in figure 21 image A that the light on
the right is significantly brighter than the light on the left. Moving on to orientation, we
see that the people are moving towards the right while the scene is oriented towards the
left.
For the maximum clutter FC, in image C of figure 21, the luminance is the same all over
the image and there are no people, so orientation plays no role here. While the colors do
vary, there is no difference in contrast.
The Subband Entropy measurement determines clutter from luminance and chrominance.
Luminance can be seen with the eye, whereas chrominance is more difficult to observe
but makes a significant difference. Accordingly, in image A of figure 22 the light differs
only slightly from the maximum clutter case in image C of figure 22.
The results of experiment 4 can be seen in figure 23, which contains the values of three
different models for measuring visual complexity: the Structural Similarity index, and
the Feature Congestion and Subband Entropy measures of clutter. The three models were
applied to 744 frames captured from a driving scene; thus the values change over the
video time. As figure 23 shows, when SSIM gets higher, clutter FC and clutter SE get
lower, and vice versa. We therefore conclude that SSIM has a negative correlation to
both clutter FC and clutter SE. Figure 24 shows the same values and models as figure 23,
but applied to 470 frames taken from another driving video. The negative relationship
between the three models is similarly evident in figure 24.
Chapter 4
Conclusion and Outlook

4.1 Review of Project Goals


The goal of this project was to analyse and evaluate different models of visuospatial
complexity using available static and dynamic scenes from real-world situations. Two
key phases were defined for the development of this project. The first phase consisted of
applying different models of visuospatial complexity measurement to two different
datasets. The second phase was analysing the results of each model, comparing them
with the other models across four experiments, and drawing conclusions. Based on our
four empirical experiments, we arrived at one conclusion per experiment, as presented in
the discussion sections of Chapter 3.
We have successfully evaluated different models measuring specific aspects of complexity,
namely Structural Similarity and Clutter. Our results provide a basis for understanding
the effect of different scene features on visual perception, and the interactions between
these features (e.g. structure and clutter).

4.2 Context
This project can provide a starting point for researchers working on visuospatial
complexity measurement, as it offers a clear comparison between different models for
measuring visuospatial complexity and shows how they correlate with each other.
Additionally, it is relevant to different fields, such as human-computer interaction,
website design, marketing, and the understanding of the human visual system. In
human-computer interaction, the autonomous driving industry has developed
considerably, and human requirements should be considered very carefully during the
design of autonomous driving systems: an autonomous car must be able to analyse the
surrounding scene and take actions relying on sensors and cameras. Regarding website
design, huge efforts are made to design websites in which one can easily navigate
between different pages; this project can help website designers take the similarities and
differences between different models of visuospatial complexity into account in order to
provide user-friendly interfaces.


4.3 Future Works


A natural next step of this work would be to conduct a human evaluation study for the
video dataset of driving scenes. This would make it possible to compare the results of
the model-based evaluation (SSIM, clutter) with humans' understanding of visuospatial
complexity for the same stimuli. As we saw from the same kind of comparison on the
symmetry-focused dataset, the human evaluation correlates positively with the other
models (SSIM, clutter).
We consider this to be part of a future phase of work. Beyond the models implemented
in our project, there are several other models for measuring visuospatial complexity
(e.g. analysis of texture characteristics, or predicting beauty); it would be valuable to test
such models from different viewpoints, such as roughness, density and deep
intermediate-layer features. In this way, the work can be carried forward and viewed
from a more holistic angle.

Appendix

Table 2: Description of the datasets used in the experiments.


References

[1] Emad Alghamdi, Eduardo Velloso, and Paul Gruba. AUVANA: An automated video
analysis tool for visual complexity. 2021.
[2] Julia Braun, Seyed Ali Amirshahi, Joachim Denzler, and Christoph Redies. Statistical
image properties of print advertisements, visual artworks and images of architecture.
Frontiers in Psychology, 4:808, 2013.
[3] Marisa Carrasco. Visual attention: The past 25 years. Vision Research,
51(13):1484–1525, 2011.
[4] Christopher Heaps and Stephen Handel. Similarity and features of natural textures.
Journal of Experimental Psychology: Human Perception and Performance, 25(2):299, 1999.
[5] Vasiliki Kondyli, Mehul Bhatt, and Jakob Suchan. Towards a human-centred cognitive
model of visuospatial complexity in everyday driving. arXiv preprint arXiv:2006.00059,
2020.
[6] Jitendra Malik and Pietro Perona. Preattentive texture discrimination with early
vision mechanisms. JOSA A, 7(5):923–932, 1990.
[7] Letizia Palumbo, Ruth Ogden, Alexis D. J. Makin, and Marco Bertamini. Examining
visual complexity and its influence on perceived duration. Journal of Vision,
14(14):3–3, 2014.
[8] Rik Pieters, Michel Wedel, and Rajeev Batra. The stopping power of advertising:
Measures and effects of visual complexity. Journal of Marketing, 74(5):48–60, 2010.
[9] Ruth Rosenholtz, Yuanzhen Li, Jonathan Mansfield, and Zhenlan Jin. Feature
congestion: A measure of display clutter. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, pages 761–770, 2005.
[10] Ruth Rosenholtz, Yuanzhen Li, and Lisa Nakano. Measuring visual clutter. Journal of
Vision, 7(2):17.1–22, 2007.
[11] Joan G. Snodgrass and Mary Vanderwart. A standardized set of 260 pictures: Norms
for name agreement, image agreement, familiarity, and visual complexity. Journal of
Experimental Psychology: Human Learning and Memory, 6(2):174, 1980.
[12] Jakob Suchan, Mehul Bhatt, Srikrishna Vardarajan, Seyed Ali Amirshahi, and Stella
Yu. Semantic analysis of (reflectional) visual symmetry: A human-centred
computational model for declarative explainability. Advances in Cognitive Systems,
6:65–84, 2018.
[13] Alexandre N. Tuch, Javier A. Bargas-Avila, Klaus Opwis, and Frank H. Wilhelm.
Visual complexity of websites: Effects on users' experience, physiology, performance,
and memory. International Journal of Human-Computer Studies, 67(9):703–715, 2009.
[14] Gerrit van der Veer. Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems. ACM, 2005.
[15] Anliang Wang, Xiaolong Yan, and Zhijun Wei. ImagePy: An open-source,
Python-based and platform-independent software package for bioimage analysis.
Bioinformatics, 34(18):3238–3240, 2018.
[16] Andrew B. Watson. Visual detection of spatial contrast patterns: Evaluation of five
simple models. Optics Express, 6(1):12–33, 2000.
