
ASSETS 2022 Technical Papers - Submission 8747

Contact: Matt Huenerfauth (matt.huenerfauth@rit.edu)

Support in the Moment: Benefits and Use of Video-Span Selection and Search for Sign-Language Video Comprehension among ASL Learners

Authors

Saad Hassan
Computing and Information Sciences, Rochester Institute of Technology, Rochester, New York, United States, sh2513@rit.edu
Akhter Al Amin
Computing and Information Sciences, Rochester Institute of Technology, Rochester, New York, United States, aa7510@rit.edu
Caluã de Lacerda Pataca
Computing and Information Sciences, Rochester Institute of Technology, Rochester, New York, United States, cd4610@rit.edu
Diego Navarro
National Technical Institute for the Deaf, Rochester Institute of Technology, Rochester, New York, United States
Alexis Gordon
School of Information, Rochester Institute of Technology, Rochester, New York, United States
Sooyeon Lee
School of Information, Rochester Institute of Technology, Rochester, New York, United States
Matt Huenerfauth
School of Information, Rochester Institute of Technology, Rochester, New York, United States, matt.huenerfauth@rit.edu

Abstract

As they develop comprehension skills, American Sign Language (ASL) learners often view challenging ASL videos, which may contain unfamiliar signs. Current dictionary tools require students to isolate a single sign they do not understand and input a search query by selecting linguistic properties or by performing the sign into a webcam. Students may struggle to extract and re-create an unfamiliar sign, and they must leave the video-watching task to use an external dictionary tool. We investigate a technology that enables users, in the moment, i.e., while viewing a video, to select a span of one or more signs that they do not understand and view dictionary results. We interviewed 14 ASL learners about their challenges in understanding ASL videos and their workarounds for unfamiliar vocabulary. We then conducted an observational study of 8 ASL learners' interaction with a Wizard-of-Oz prototype during a video-comprehension task, revealing the benefits of an integrated search tool and of using span selection to constrain video playback. A comparative study with 7 additional ASL learners using a baseline video player and an existing ASL dictionary website revealed benefits of our tool in terms of the quality of the video translations produced and the perceived workload of producing them. These findings inform future designers of such systems, computer vision researchers working on the underlying sign-matching technologies, and sign language educators.
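
To make the interaction concrete, below is a minimal sketch, in Python, of the span-selection search flow described in the abstract. The SpanQuery/DictionaryResult data model and the match_signs backend call are hypothetical illustrations of the interaction contract, not the authors' implementation.

from dataclasses import dataclass

@dataclass
class SpanQuery:
    """A user-selected span of an ASL video, in seconds."""
    video_id: str
    start: float  # left edge of the span selector
    end: float    # right edge of the span selector

@dataclass
class DictionaryResult:
    gloss: str     # closest English gloss, shown under each result video
    clip_url: str  # short clip of the isolated sign
    score: float   # match confidence from the underlying sign-matching model

def search_selection(query: SpanQuery, matcher) -> list[DictionaryResult]:
    """Send the selected span to a sign-matching backend and return ranked
    dictionary entries, as pressing the "Search selection" button would."""
    results = matcher.match_signs(query.video_id, query.start, query.end)
    return sorted(results, key=lambda r: r.score, reverse=True)

In the Wizard-of-Oz prototype described above, the matching backend would be simulated by a human rather than automated; the sketch only illustrates the interaction contract a deployed system would need.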

Short or Long Paper

long

The Document in PDF

The file (5.8 MB)

The Document Source

The file (20.2 MB)

Accessibility of figures/Alternative text for all figures and tables

Figure 1: The figure shows a screenshot of the prototype with several parts labeled. The main video player is on the right side of the screen. The title of the video appears at the top right corner of the screen. Underneath the video player is a text entry field with a white background where users can add translations. In the top right corner of the screen there is a “Next Video” button. Underneath it is a horizontal “Search selection” button. Text underneath the “Search selection” button shows the number of results and the start and end timestamps of the selected span. Underneath that is the text “Your selection has changed. Update results?”, which appears when users change the span. Underneath this text are video results, with the closest English gloss displayed beneath each video. There is also an “i” button that users can click to display the linguistic features of the sign in the video as an overlay on top of it. In the image, the top two results and three-quarters of two more are visible. A slider on the right side lets users scroll to the rest of the results. At the bottom of the screen there is a span-selection bar showing some frames of the video, with a rectangular yellow span selector on it. The timestamps at the span boundaries are displayed, and a vertical white bar indicates the current playback position. In the bottom left corner of the screen there is a “Play” button with the traditional triangular play icon on it. Next to it is the “Play selection” button.

Figure 2: Two plots of span durations selected by participant 5 for a Theatre and a Conversation video. The y-axis is titled “Sequence of subspans over time” and the x-axis is titled “Position of Sub-span on the Video Timeline (s)”. For (a), the graph is titled “Participant: 5 | Video Name: Theatre 3,” and the horizontal lines are approximately 4 to 10 seconds long, with ends that slightly overlap, as the spans begin progressively later in the video from beginning to end. For (b), the graph is titled “Participant: 5 | Video Name: Conversation 2,” and the horizontal lines show a similar slightly overlapping pattern from the beginning to the end of the video, in a diagonal progression.

Figure 3: The figure shows two screenshots of the prototype used in study 2. The screenshot on the left has the following text written in the text field: “Class starts. Reach out to many different interpreters and deaf people who are very skilled at asl. They give books to learn from”. The screenshot on the right has the following text written in the text field: “Class starts. Reach out to many different interpreters and deaf people who are very skilled at asl. They give books to learn from and figure out diffe”. The selected span has timestamps 11.5 and 12.7 seconds. Text in the top right corner of the screen says “22 results between 11.5 and 12.7 seconds”. Two results are fully visible: videos of the signs for “Look” and “Figure-out”, with English gloss labels at the bottom of each video. The participant's eye gaze is on the two results in the left screenshot and on the text field in the right screenshot.

Figure 4: Figure 4 (a) shows a plot of span durations selected by participant 1 for a Conversation video. The y-axis is titled “Sequence of subspans over time,” and the x-axis is titled “Position of Sub-span on the Video Timeline (s)”. The sequence of horizontal lines indicating selected spans reveals a pattern in which the user attempts searches and iteratively narrows the width of the span selection to refine the search. Figure 4 (b) shows a screenshot of the prototype used in study 2. The text field contains the following text: “baby creates tears of joy. I want you to look at me, you don’t understand my frustration”. Two search results are fully visible: videos of the signs for “What’s-up” and “Preference”, with English gloss labels at the bottom of each video. The eye-gaze path of the participant moves from the span-selection tool, to one of the signer's hands in the main video, to the other hand of the signer in the main video, and back to the span-selection tool. This suggests the participant glanced at the main video window while adjusting the span selection.

Figure 5: Figure 5 (a) shows a plot of span durations selected by participant 7 for a Theatre video. The y-axis is titled “Sequence of subspans over time,” and the x-axis is titled “Position of Sub-span on the Video Timeline (s)”. The pattern of span selections suggests that the participant first selected and viewed the entire video, and then later used the span selection to check a few earlier portions of the video in order to search for some signs.

Figure 6: The image shows a screenshot of the prototype. Two search results are partially visible at the top right, for the signs “Long-ago” and “Weekend”. Underneath them are two results for the signs “Reflect” and “Again”. The translation text box contains a partial string of text: “light facing me. I open my eyes and oh! Its a full moon shining into my bedroom. I sit up and look at it”. The participant's eye gaze moves between the hands of the signer in the main video and the search-results item for REFLECT.

Figure 7: Six boxplots showing the distribution of the width of spans selected immediately prior to a search request (top three plots) and when a search was not made (bottom three plots). The x-axis is titled “Duration of Span (seconds)”. There is also a legend for the three genres of videos. For spans selected immediately prior to a search request, the median widths were 2.505 seconds for Conversation, 1.508 for Education, and 3.175 for Theater; Theater was significantly higher than the other two genres. For spans that were not immediately followed by a search request, the median widths were 9.878 seconds for Conversation, 9.613 for Education, and 16.742 for Theater; Theater was again significantly higher than the other two genres.

Figure 8: The image on the left shows a screenshot of the baseline prototype. The text “Video 3” appears in the top left corner of the screen, and a “Next Video” button is in the top right corner. There is a video player in the center, with black space on either side, and a video timeline and span selector along the bottom, here with the entire span selected. A triangular “Play” button is in the bottom right corner of the screen, next to a “Play selection” button.

Table 1 Summary: A table with four columns, titled “TLX Sub-Scale,” “Using dictionary-search prototype,” “Using baseline prototype,” and “Significance Testing.” Scaled raw averages for each of the six TLX sub-scales are provided for both the dictionary-search prototype (n=8) and baseline prototype (n=6) conditions. The rightmost column displays p values and U statistics from the two-tailed Mann-Whitney U tests used for statistical testing.
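
For reference, a two-tailed Mann-Whitney U test of the kind reported in Table 1 can be computed with SciPy, as in the minimal sketch below; the sub-scale ratings shown are hypothetical placeholders, not the study's data.

from scipy.stats import mannwhitneyu

# Hypothetical NASA-TLX sub-scale ratings (placeholders, not the study's data):
# n=8 participants used the dictionary-search prototype, n=6 used the baseline.
dictionary_search = [30, 25, 40, 35, 20, 45, 30, 25]
baseline = [55, 60, 50, 65, 45, 70]

# Two-tailed Mann-Whitney U test, matching the test named in Table 1.
u_stat, p_value = mannwhitneyu(dictionary_search, baseline, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")

The same call would be repeated once per TLX sub-scale to fill in the rightmost column of the table.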

Video Figure

The file (46.0 MB)

Supplementary Material (in a ZIP file)

The file (2.8 KB)

Description of Supplementary Files

There are two supplementary files.

"Videos Used - Study 1.csv" contains video genres, titles, and URLs of four example videos that were shown to participants in the first
study.

"Videos Used - Study 2 and 3.csv" contains video name, genre, start timestamp, end timestamp, and duration of the 9 videos used in the
prototype for study 2 and study 3.

Anonymity Check

I verify that this submission is correctly anonymized.

