
User Evaluation 2

Introduction
This is the second user evaluation for the Smart Home application that we have been building. The evaluations took place between the 16th and 18th of December. As mentioned in the first user evaluation, we aim to make the application usable for a wide range of users. Therefore, we again had 5 participants, aged 20, 22, 57, 67 and 73. As a reminder, the following functionalities have been implemented in our application:
- switching devices on and off;
- adjusting the color of the lights;
- adjusting the volume of the soundbar;
- adjusting the channel of the TV;
- reading the temperature of the radiators;
- adjusting the temperature of the fridge;
- switching between floors.
These functions have been used for the second round of testing. Based on the feedback
from the first user evaluation, we have made some changes to the prototypes, which we will
now discuss.

User feedback 2D
After the first user evaluation, we received some feedback to improve the 2D and 3D versions of our application. For the 2D version, the text was made bigger, so it is now easier to read. In addition, in the first version the text was not aligned, which made the layout look untidy. The functionality was there, however, so we continued to use that prototype for the first user evaluation. As expected, the users recommended that we improve the layout and style of the list of devices. This has now been addressed, and during the second user evaluation there were no more complaints about this. Based on the feedback from the first user evaluation, we have also added text labels to the button used to change the light color. The changes can be seen in Figure 1. Finally, we also implemented a spin box to switch channels.

Figure 1: 2D version visual improvements, with prototype 1 on the left and the improved version on the right.

User feedback 3D
For the 3D version of our application, we also improved some things based on user feedback. Firstly, we added a legend, so the user will be able to categorize the different devices more quickly. Secondly, in the 2D version the temperature of the rooms is easily readable because it is shown on screen and you do not have to click on anything to display it. This made the task “Read the temperature of the radiator in the living room” unfair, because it was significantly easier to do in the 2D version. Therefore, we created a section where you can see the temperatures of the different floors. The changes can be seen in Figure 2: the temperature is now displayed in the top left corner and the legend is below it.

Figure 2: 3D version visual improvements, with prototype 1 on the left and the improved version on the right.
Research question
As mentioned in our project plan, our research question is as follows: What is the user
friendliness of 3D versus 2D visualizations for regulating smart hardware around the house?
We hope to be able to answer this question after analyzing the results of this second user
evaluation. The time it takes to perform some fundamental tasks provides us with solid insights into the user friendliness of both applications, especially regarding efficiency and learnability. In addition, the results of the SUS post-test questionnaire allow us to get a grasp of the participants' attitude towards both applications.

Instructions
To test our prototypes, we had the participants perform the tasks listed below. Compared to the previous test, we removed the 4th task and added the 10th task in order to create a bit more complexity. In contrast to the first user evaluation, during this user evaluation we let the users explore the 2D and 3D versions of the application first, in order to familiarize them with the house and its devices. We did this to recreate a more realistic testing scenario: real users would of course be familiar with the layout of their house and the placement of their devices, so it makes sense to allow the participants to explore first. After this initial period of exploration, we let the participants perform the tasks and timed each of them, just like during the first user evaluation. We also had them fill out the SUS post-test questionnaire after they were done performing the tasks in each version. The following section will cover the results of the second user evaluation.

Prototype testing tasks, user evaluation 2:
1. Turn off the light in the hall.
2. Read the temperature for the living room.
3. Set the lamp in the corner next to the television to its brightest setting.
4. Go to the second floor.
5. Put the television on channel 3.
6. Go to the third floor.
7. Change the colour of the bathroom light to red.
8. Go back to the first floor.
9. Set the volume of the soundbar to 75%.
10. Turn the kitchen ceiling light to the maximum brightness and make the colour of the light blue.
Table 1.1: Prototype testing tasks, user evaluation 2.

Prototype testing tasks, user evaluation 1:
1. Turn off the light in the hall.
2. Read the temperature for the living room.
3. Set the lamp in the corner next to the television to its brightest setting.
4. Put the temperature of the refrigerator 2 degrees lower.
5. Go to the second floor.
6. Put the television on channel 3.
7. Go to the third floor.
8. Change the colour of the bathroom light to red.
9. Go back to the first floor.
10. Set the volume of the soundbar to 75%.
Table 1.2: Prototype testing tasks, user evaluation 1.

Results
The results of the testing can be seen in the tables below. We again started the timer when we had finished reading the question and stopped it when the task was completed. We once again have to mention that there is a form of bias present in the testing: the participants are known to the researchers and could therefore give different scores than strangers would. However, since the researchers are aware of this, its impact is reduced. Furthermore, because of time constraints we were not able to find 10 individuals in different age categories. We thus had to settle for a less evenly distributed age range than we would have preferred.

Below are the average times for user evaluations 1 and 2 for the 2D and 3D versions. It must be noted again that the tasks have changed slightly since the last user evaluation and therefore the times cannot be directly compared with each other.

Time | Task | Measured | Current 2D / 3D | Worst | Planned | Best
Initial performance | 1 | Time on first trial | 4,4 s / 2,2 s | 10 s | 5 s | 2 s
Initial performance | 2 | Time on first trial | 7 s / 4,2 s | 20 s | 10 s | 3 s
Initial performance | 3 | Time on first trial | 9,6 s / 7,8 s | 20 s | 10 s | 3 s
Initial performance | 4 | Time on first trial | 4,6 s / 2 s | 8 s | 4 s | 2 s
Initial performance | 5 | Time on first trial | 7,4 s / 8,4 s | 25 s | 15 s | 5 s
Initial performance | 6 | Time on first trial | 3,4 s / 2,4 s | 8 s | 4 s | 2 s
Initial performance | 7 | Time on first trial | 7 s / 8 s | 20 s | 10 s | 3 s
Initial performance | 8 | Time on first trial | 3 s / 2,2 s | 8 s | 4 s | 2 s
Initial performance | 9 | Time on first trial | 5,2 s / 9 s | 20 s | 10 s | 3 s
Initial performance | 10 | Time on first trial | 12 s / 12 s | 20 s | 10 s | 3 s
Table 2: Results of testing round 2 (time measured in seconds).

Time | Task | Measured | Current 2D / 3D | Worst | Planned | Best
Initial performance | 1 | Time on first trial | 10,5 s / 3,3 s | 10 s | 5 s | 2 s
Initial performance | 2 | Time on first trial | 5,4 s / 13,5 s | 20 s | 10 s | 3 s
Initial performance | 3 | Time on first trial | 10,1 s / 7,1 s | 20 s | 10 s | 3 s
Initial performance | 4 | Time on first trial | 7,1 s / 12,4 s | 20 s | 10 s | 3 s
Initial performance | 5 | Time on first trial | 12,3 s / 5,2 s | 8 s | 4 s | 2 s
Initial performance | 6 | Time on first trial | 14,9 s / 16,6 s | 25 s | 15 s | 5 s
Initial performance | 7 | Time on first trial | 4,8 s / 4,2 s | 8 s | 4 s | 2 s
Initial performance | 8 | Time on first trial | 9,8 s / 8,4 s | 20 s | 10 s | 3 s
Initial performance | 9 | Time on first trial | 2,6 s / 3,3 s | 8 s | 4 s | 2 s
Initial performance | 10 | Time on first trial | 5,6 s / 9,9 s | 20 s | 10 s | 3 s
Table 3: Results of testing round 1 (time measured in seconds).

Figure 3: Average time to perform prototype test tasks round 2 (measured in seconds).


Figure 4: Average time to perform prototype test tasks round 1 (measured in seconds).

After performing each set of tasks, we let the participants fill out a SUS questionnaire in which
they had to answer several questions about the prototype they just tested. In addition, they
could write down additional comments, which could be either positive or negative feedback. A
copy of this questionnaire is provided in the appendix of this paper (Appendix A).
The combination of answers can be used to calculate a score. The results of the survey from the most recent user evaluation can be seen in Table 3.1 and the results of the first one in Table 3.2.
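For reference, each participant's score follows the standard SUS scoring scheme, in which the odd-numbered items are positively worded and the even-numbered items negatively worded. The snippet below is a minimal Python sketch of that calculation; the function name and example responses are illustrative and not part of our application.

    def sus_score(responses):
        """Convert ten 1-5 Likert responses into a 0-100 SUS score."""
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS expects ten responses on a 1-5 scale")

        total = 0
        for item, r in enumerate(responses, start=1):
            # Odd (positively worded) items contribute r - 1,
            # even (negatively worded) items contribute 5 - r.
            total += (r - 1) if item % 2 == 1 else (5 - r)

        return total * 2.5  # scale the 0-40 raw sum to 0-100


    # Illustrative example: a fairly positive set of answers scores 87.5.
    print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 5, 1]))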

2D Version | 3D Version
85 points | 57,5 points
85 points | 90 points
87,5 points | 92,5 points
92,5 points | 92,5 points
92,5 points | 100 points
Total: 442,5 | Total: 432,5
Average points per person: 88,5 | Average points per person: 86,5
Table 3.1: SUS Post-Test Survey results, user evaluation 2.

2D Version | 3D Version
70 points | 67,5 points
72,5 points | 72,5 points
87,5 points | 72,5 points
87,5 points | 87,5 points
97,5 points | 95 points
Total: 415 | Total: 395
Average points per person: 83 | Average points per person: 79
Table 3.2: SUS Post-Test Survey results, user evaluation 1.

Discussion
When analyzing the results of the second user evaluation and comparing them to those of the
first user evaluation, some interesting things can be found. Although we cannot compare all of
the results directly, as some tasks have been adjusted, on average the tasks were performed
a lot faster in both versions. Additionally, there was a larger difference between the two
versions. In our first experiment, 6 out of the 10 tasks were performed faster in the 2D version.
This time, however, this was only the case for 3 out of 10 tasks. When comparing the time it
took to perform a single task, on average, each task was performed about 0,5 seconds faster
in the 3D version. During the first user evaluation, this difference was less than a tenth of a
second. It is safe to say that, based on the results of our experiment, the tasks could be
performed faster in the 3D version than in the 2D version. But what caused the results to differ
so much during this second user evaluation? This is probably due to the change in our testing
approach. During the first experiment, we let the user perform the tasks right after launching the app. Although this gave us a good impression of the intuitiveness of both applications (which has definitely led to some solid improvements in both prototypes), it was not the most realistic use case. A real user would be familiar with the layout of their own house and the
placement of their smart devices. For this second user evaluation, we therefore allowed the
users to first get familiar with the house and its devices, before having them perform the tasks.
This alteration in our modus operandi is most likely to have caused this change in results.
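To make the timing comparison above concrete, the short sketch below recomputes the headline numbers directly from the per-task averages in Table 2; it is a standalone illustration, not part of the application itself.

    # Per-task average times from Table 2 (round 2), in seconds, as (2D, 3D) pairs.
    round2 = [
        (4.4, 2.2), (7.0, 4.2), (9.6, 7.8), (4.6, 2.0), (7.4, 8.4),
        (3.4, 2.4), (7.0, 8.0), (3.0, 2.2), (5.2, 9.0), (12.0, 12.0),
    ]

    mean_2d = sum(t_2d for t_2d, _ in round2) / len(round2)
    mean_3d = sum(t_3d for _, t_3d in round2) / len(round2)
    faster_in_2d = sum(1 for t_2d, t_3d in round2 if t_2d < t_3d)

    print(f"Mean per task: 2D {mean_2d:.2f} s, 3D {mean_3d:.2f} s")  # 6.36 s vs 5.82 s
    print(f"3D advantage per task: {mean_2d - mean_3d:.2f} s")       # about 0.5 s
    print(f"Tasks faster in 2D: {faster_in_2d} of {len(round2)}")    # 3 of 10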

While observing the participants perform the tasks, a few things stood out to us. Performing a task generally consists of a few steps: the user has to interpret the task, convert it into actionable steps, and locate the interface element to interact with; only then can the user actually perform the intended action. Since we have aimed to keep the
functionality of the 2D and 3D versions practically the same, the time it takes to perform the
other stages is more or less identical for both versions (although some additional clicks are
required in either the 2D or 3D version in some cases). However, locating the element of the
user interface which is to be interacted with is where we noticed the most substantial difference
between the two versions. For some tasks, like reading the temperature or switching floors,
the difference was minimal, as the interface elements were easy to locate in both versions. For
the tasks where the user had to change something for a specific device, though, the
participants were often able to locate the device faster in the 3D version. Of course, this was
not always the case, as sometimes the participants were able to locate a device in the list really
quickly, but in general, they had an easier time finding it in the 3D version.

As for the differences between the age groups, there was less variation in the time it took participants to perform the tasks. The 73-year-old participant was the exception, as it took them significantly longer to perform the tasks than the other participants in both versions, averaging around 12 seconds per task, compared to about 4 seconds per task for the rest of the participants.

During the first test, the age of the participants seemed to have a substantial impact on the time it took them to perform the tasks in both versions. As this was not the case for the second user evaluation, this might suggest that, although the app may not be equally intuitive to people of all ages, most users will become quite efficient with it when given the time to familiarize themselves with the application. Furthermore, there did not appear to be a correlation between participants' age and the version in which they were able to perform the tasks the fastest, as this differed throughout the age ranges in both the first and second user evaluations.

Usability is not all about efficiency, though. It is also important to take the subjective opinions
of the users into account. In our case, this is where it gets harder to draw any decisive
conclusions. First of all, we want to note that the average score, based on the SUS post-test survey, has improved significantly for both the 2D and the 3D versions of the application. Although this could very well be the result of having new participants, the improvements in our prototypes are likely to have contributed to this increase as well. During the first user evaluation, the average score for the 2D version was 83 and for the 3D version it was 79. It is noteworthy that all the participants scored the 2D version either just as high as or higher than the 3D version.

For the second user evaluation, the scores are quite different. On average, the score for the 2D version was 88,5 and for the 3D version it was 86,5. However, unlike during the first user evaluation, 4 out of 5 participants scored the 3D version higher. The 3D version still has a lower average due to a single participant, who gave the 2D version a score of 85 points and the 3D version a score of 57,5 points. This makes it a little more difficult for us to draw a decisive conclusion. If we take the average of the scores, the 2D version would be the winner, but when assigning points based on individual comparisons between the scores given to the 2D and 3D versions by each participant, the 3D version would be the clear favourite. The participant who rated the 3D version relatively poorly indicated that they simply did not like the 3D version and preferred the 2D version.

So are we able to draw any conclusions after the two user evaluations? As of now, the results
do suggest that the usability of the 3D version is better than the usability of the 2D version.
Due to the relatively small number of participants, however, the result may differ when
performing a third or fourth user evaluation. For now, though, the efficiency of the 3D version
seems to be slightly higher than the efficiency of the 2D version and the attitude towards the
3D version was more positive for the majority of the participants during the second user
evaluation. In terms of learnability, it is not yet obvious which version is superior. Based on the first user evaluation, the 2D version seems to be preferred when users are thrown in at the deep end, but after having just a few minutes to get familiar with the layout of the house, the participants found the 3D version to be more straightforward and were able to locate the devices more easily.

Based on this most recent user evaluation, there are no major issues in our current prototypes. All of the functions seem to be working as intended and we are content with the overall functionality. We have received some suggestions for additions to the application, like being able to turn all the lights off at the same time or saving a set of favourite colours, but these all seem like features we would like to add at some point rather than ones that are important for our prototype. Besides, they would not help us to better understand the difference in usability between the 2D and 3D versions of the app.

Appendix
Appendix A: SUS questionnaire
