
User Evaluation 1

Introduction
For our first user evaluation, our goal was to have prospective users experiment with both the 2D and 3D versions of our app, so that we could measure the time it took them to perform some rudimentary tasks and gather feedback on what could be improved for the next prototype. The evaluations took place between the 3rd and 6th of December 2023.
Since our application has a wide potential target audience, we wanted to test the app with users of multiple age categories. The app might not be suitable for users of every age; we can imagine that older people could have a harder time navigating a virtual 3D environment, but this is exactly what we want to find out. For this first prototype, we therefore had 5 participants aged 23, 36, 48, 57 and 71.

For the first user evaluation, we had both a working 2D and 3D prototype. We tried to keep the functionality of both apps as similar as possible, to prevent differences in functionality from interfering with our results. For both versions, the following functionality was implemented:
- switching devices on and off;
- adjusting the colour of the lights;
- adjusting the volume of the soundbar;
- adjusting the channel of the TV;
- reading the temperature of the radiators;
- adjusting the temperature of the fridge;
- switching between floors.
There were some functions, like adjusting the power of the individual stoves, that were only
implemented in the 3D version, but we did not include these as tasks the users had to perform.
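
One way to keep the feature sets of the two front ends aligned is to route both through a single device-control layer, so that every function exists in exactly one place. The sketch below is purely illustrative: the class and method names are our own invention, not taken from the actual prototype code.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    """One controllable device, shared by the 2D and 3D front ends."""
    name: str
    floor: int
    on: bool = False
    settings: dict = field(default_factory=dict)

class SmartHomeController:
    """Single control layer, so both app versions expose identical functionality."""

    def __init__(self) -> None:
        self.devices: dict[str, Device] = {}

    def add_device(self, device: Device) -> None:
        self.devices[device.name] = device

    def toggle(self, name: str) -> None:
        self.devices[name].on = not self.devices[name].on

    def set_setting(self, name: str, key, value) -> None:
        # e.g. the colour of a light, the volume of the soundbar,
        # the TV channel, or the target temperature of the fridge
        self.devices[name].settings[key] = value

    def read_setting(self, name: str, key):
        # e.g. reading the temperature of a radiator
        return self.devices[name].settings[key]

# Both versions issue the same calls:
controller = SmartHomeController()
controller.add_device(Device("hall light", floor=1, on=True))
controller.toggle("hall light")  # task 1: turn off the light in the hall
```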

Instructions

To test our prototypes, we created ten tasks that the participants had to carry out in both the 2D and the 3D version of the app. Since we wanted to gauge the intuitiveness of the prototypes, we only gave the users a brief explanation. They then had to start immediately, so they did not have time to familiarise themselves with the application and had to rely on their existing mental model to perform the tasks. We timed each participant on every individual task so that we could compare both versions and the different participants. We created the following tasks, for which time was measured in seconds:

1. Turn off the light in the hall.
2. Read the temperature of the radiator in the living room.
3. Set the lamp in the corner next to the television to its brightest setting.
4. Put the temperature of the refrigerator 2 degrees lower.
5. Go to the second floor.
6. Put the television on channel 3.
7. Go to the third floor.
8. Change the colour of the bathroom light to red.
9. Go back to the first floor.
10. Set the volume of the soundbar to 75%.

Table 1: Prototype 1 testing tasks.
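
As a minimal sketch of how such per-task timing could be logged, assuming a simple console-based setup rather than our actual measurement procedure (the task list is abbreviated here; the full list is in Table 1):

```python
import time

TASKS = [
    "Turn off the light in the hall.",
    "Read the temperature of the radiator in the living room.",
    # ... tasks 3-10 as listed in Table 1
]

def run_session(participant: str, version: str) -> list[dict]:
    """Time each task: the clock starts once the instruction has been
    read aloud and stops the moment the task is completed."""
    results = []
    for number, task in enumerate(TASKS, start=1):
        input(f"Read task {number} aloud, then press Enter to start the timer ")
        start = time.perf_counter()
        input("Press Enter as soon as the task is completed ")
        elapsed = time.perf_counter() - start
        results.append({"participant": participant, "version": version,
                        "task": number, "seconds": round(elapsed, 1)})
    return results
```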

Results
The table below shows the results of the experiment. As mentioned before, the users started the tasks immediately; we did not give them time to familiarise themselves with the applications. We started the timer the moment we finished reading out a task and stopped it when the task was completed. We did not help the participants in any way during the tasks.

| Attribute | Task | Measured | Current (2D / 3D) | Worst | Planned | Best |
| --- | --- | --- | --- | --- | --- | --- |
| Initial performance | 1 | Time on first trial | 10.5 s / 3.3 s | 10 s | 5 s | 2 s |
| Initial performance | 2 | Time on first trial | 5.4 s / 13.5 s | 20 s | 10 s | 3 s |
| Initial performance | 3 | Time on first trial | 10.1 s / 7.1 s | 20 s | 10 s | 3 s |
| Initial performance | 4 | Time on first trial | 7.1 s / 12.4 s | 20 s | 10 s | 3 s |
| Initial performance | 5 | Time on first trial | 12.3 s / 5.2 s | 8 s | 4 s | 2 s |
| Initial performance | 6 | Time on first trial | 14.9 s / 16.6 s | 25 s | 15 s | 5 s |
| Initial performance | 7 | Time on first trial | 4.8 s / 4.2 s | 8 s | 4 s | 2 s |
| Initial performance | 8 | Time on first trial | 9.8 s / 8.4 s | 20 s | 10 s | 3 s |
| Initial performance | 9 | Time on first trial | 2.6 s / 3.3 s | 8 s | 4 s | 2 s |
| Initial performance | 10 | Time on first trial | 5.6 s / 9.9 s | 20 s | 10 s | 3 s |

Table 2: Results of testing (time measured in seconds).

Figure 1: Average time to perform prototype test tasks (measured in seconds).

At first sight, it might appear that there are significant differences in the data. However, it must be noted that there are some outliers that influence the results: for example, a participant who could not find a certain function in the 2D or 3D version drives up the average time for that task. We are aware of this problem and kept it in mind when analysing the data. After the testing, we had each participant fill out a SUS questionnaire, in which they answered several questions about the prototype they had just tested. In addition, they could write down additional comments, either positive or negative feedback. A copy of this questionnaire is provided in the appendix of this paper (Appendix A).
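
To illustrate how a single slow search can dominate a task's average, the snippet below computes the mean with and without a simple median-based outlier flag. The per-participant times are invented placeholders, not our raw measurements.

```python
from statistics import mean, median

# Hypothetical per-participant times (seconds) for one task; the last value
# represents a participant who could not find the function and kept searching.
task_times = [4.1, 5.6, 3.9, 6.2, 31.0]

def split_outliers(times: list[float], factor: float = 2.0):
    """Separate times more than `factor` times the median from the rest."""
    med = median(times)
    normal = [t for t in times if t <= factor * med]
    outliers = [t for t in times if t > factor * med]
    return normal, outliers

normal, outliers = split_outliers(task_times)
print(f"raw mean: {mean(task_times):.1f} s")  # pulled up by the outlier
print(f"mean without outliers: {mean(normal):.1f} s, flagged: {outliers}")
```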

Discussion
When comparing the time it took to perform certain tasks in the 2D version versus the 3D version of our application, there were some interesting results. When the users knew exactly which device was meant in the task description and where this device was located, they seemed to find it faster in the 3D version. However, since the users were not yet familiar with the layout of the house and its devices, in some cases the text description in the 2D version was easier to recognise, leading to a faster time.

We also noticed that tasks like switching between floors took some time to figure out the first time, but were performed faster on each repetition. Even though it was interesting to gauge the intuitiveness of both versions, for the second user evaluation we want to change our approach to simulate a more realistic use case.

A real user would be controlling the devices in their own house, instead of the arbitrary devices in this prototype. They would thus have a better understanding of which devices there are and where they are located in the house. As this is likely to alter the results of our experiment, for the next user evaluation we want to give the users some time to get familiar with the house: they will be able to look around and interact with the different devices. After this, they will again perform a series of tasks, and we will measure the time it takes to perform them.

We also noticed that the task of reading the temperature of the central heating system was not a fair comparison. In the 3D version, the users first had to locate the radiator and then right-click it before being able to read the temperature, whilst in the 2D version the temperature was probably one of the first things the user noticed, as it sits in quite an eye-catching panel and requires no interaction to read. From the post-test survey we also learned that the 3D implementation of the temperature adjustment was not preferred. To solve this, we will add a panel on the left side of the screen showing the current temperature for each floor. We will also remove the task of reading the temperature from the task list, as we do not feel it will add much to our second user evaluation.
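
The data behind that panel could be aggregated per floor as in the sketch below, which builds on the hypothetical controller structure from the earlier sketch; the names are again our own, not the prototype's.

```python
from collections import defaultdict

def temperatures_per_floor(controller) -> dict[int, float]:
    """Average the radiator readings on each floor, for the side panel."""
    readings = defaultdict(list)
    for device in controller.devices.values():
        # assume radiators expose their reading under a "temperature" key
        if "temperature" in device.settings:
            readings[device.floor].append(device.settings["temperature"])
    return {floor: sum(vals) / len(vals)
            for floor, vals in sorted(readings.items())}
```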

There was an interesting relation between the age of the users and the time it took to perform the tasks. As expected, the older users generally required more time, but they still managed to navigate the 3D environment quite well, and their measured times were mostly better than our worst acceptable level. Whether the 2D or 3D version was faster in use differed quite significantly between the users, so based on these measurements alone no conclusion can yet be drawn on which version has better usability. We hope the aforementioned alterations in our experiment approach will lead to more concrete results after the second user evaluation. We were, however, able to find some interesting results in the SUS post-test survey.

After conducting the first round of evaluations the participants gave the following SUS Post-
Test scores:
| 2D Version | 3D Version |
| --- | --- |
| 70 points | 67.5 points |
| 72.5 points | 72.5 points |
| 87.5 points | 72.5 points |
| 87.5 points | 87.5 points |
| 97.5 points | 95 points |
| Total: 415 | Total: 395 |
| Average per person: 83 | Average per person: 79 |

Table 3: SUS Post-Test Survey results.
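
For reference, a standard SUS score is derived from the ten item responses (each on a 1-5 scale): odd-numbered items contribute (response − 1) points, even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a 0-100 score. A minimal sketch with invented example responses:

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring: ten responses on a 1-5 scale -> 0-100 score."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Invented example responses, not an actual participant's answers:
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```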

What can be seen is that the average score for the 2D version is higher than that for the 3D version, but the difference is minimal. Since the participants were individuals known to the researchers, a bias might be present in the scoring: the participants might have given higher scores knowing that the researchers created the prototypes than they would have for prototypes made by strangers. Because of this, we will conduct the second user evaluation with different participants.

Besides this, it is too soon to draw any conclusions on which version of the SmartHome app works best. Purely on the numbers, the 2D version comes out ahead with its higher score, but since the sample is rather small, we will need another round of testing to reach a more reliable conclusion. In addition, more improvements were needed for the 2D version than for the 3D version.

Based on some of the feedback we received in the SUS post-test survey, we will make some adjustments to our prototype before the next user evaluation.
For the 2D version, we will increase the font size of the floor selection control, as the older users in particular had a difficult time locating it at first. We will also adjust the input field for the TV channel to better match the one used in the 3D version, as some users mistook it for a volume bar. Furthermore, we want to make sure all parts of the interface align nicely, to make for a more cohesive experience.
For the 3D version, as mentioned before, we will add a panel stating the current temperature for each floor, in order to close the gap between the 2D and 3D versions in this respect. We will also add a legend indicating what the different colours of the devices mean (e.g. lights, entertainment, etc.).

Overall, we are content with our first user evaluation. We gained some useful insights into what we can improve for our second prototype and, perhaps most importantly, into how we can change our approach for the second evaluation in order to better answer our research question.

Appendix
Appendix A: SUS questionnaire
