Eyegaze Fixation and Attention Prediction
Omran Kaddah
Table 3. Comparison of driver-attention prediction models on the BDD-A and DR(eye)VE datasets.

Model                     | BDD-A KL          | BDD-A CC          | DR(eye)VE KL      | DR(eye)VE CC
                          | Mean (CI)         | Mean (CI)         | Mean (CI)         | Mean (CI)
BDD-A [3]                 | 1.24 (1.21, 1.28) | 0.58 (0.56, 0.59) | -                 | -
BDD-A(HWS) [3]            | 1.24 (1.21, 1.27) | 0.59 (0.57, 0.60) | -                 | -
Palazzi et al. 2017 [19]  | 1.95 (1.87, 2.04) | 0.50 (0.48, 0.52) | 1.42 (0.35, 2.49) | 0.55 (0.27, 0.83)
Palazzi et al. 2018 [4]   | -                 | -                 | 1.40  -           | 0.56  -

KL := KL divergence (lower is better)
CC := Correlation coefficient (higher is better)
CI := Confidence interval
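To make the two metrics in Table 3 concrete, the following is a minimal NumPy sketch that computes the KL divergence and the correlation coefficient between a predicted and a ground-truth attention map. It illustrates the standard definitions only; it is not the official evaluation code of [3] or [4], and the epsilon stabilizer and normalization details are assumptions that may differ from the papers' exact protocols.

import numpy as np

EPS = 1e-8  # small constant; avoids log(0) and division by zero

def kl_divergence(pred, gt):
    # Normalize both maps to probability distributions, then D_KL(gt || pred).
    pred = pred / (pred.sum() + EPS)
    gt = gt / (gt.sum() + EPS)
    return float(np.sum(gt * np.log(gt / (pred + EPS) + EPS)))

def correlation_coefficient(pred, gt):
    # Pearson correlation between the two maps, treated as flat vectors.
    p = (pred - pred.mean()) / (pred.std() + EPS)
    g = (gt - gt.mean()) / (gt.std() + EPS)
    return float((p * g).mean())

# Toy usage on two random 36x64 maps; a real evaluation would average
# these scores over all test frames, as in Table 3.
rng = np.random.default_rng(0)
pred, gt = rng.random((36, 64)), rng.random((36, 64))
print(kl_divergence(pred, gt), correlation_coefficient(pred, gt))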
For example, it remains an open question whether it is a good idea to use an LSTM layer, which exploits temporal information, in models that infer eye-gaze fixation. One can also observe the lack of a unified, standardized dataset for driver-attention prediction models: as seen earlier, it was not possible to compare the models fairly without fine-tuning them on one specific dataset. More attention should also be given to the attention prediction models themselves. For example, the model in [3] uses an AlexNet [23] feature extractor; although AlexNet is powerful, there are now architectures such as MobileNetV2 [24] with about 12 times fewer parameters, the same number of operations per forward pass, and better accuracy. One might also reuse a pretrained upsampling layer from a semantic segmentation model; this could make the task easier for the subsequent layers, since the objects in the driving scene have already been classified. These ideas are sketched together below.
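The following is a minimal PyTorch sketch of these suggestions, not the architecture of [3] or [4]: a pretrained MobileNetV2 [24] backbone replaces AlexNet, an LSTM runs over pooled per-frame features to exploit temporal information, and a small upsampling head (the natural place to reuse weights from a pretrained semantic-segmentation decoder) produces the fixation map. The class name, layer sizes, and input resolution are illustrative assumptions, and the weights argument requires a recent torchvision.

import torch
import torch.nn as nn
from torchvision import models

class TemporalGazeNet(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        # Pretrained MobileNetV2 features: (3, H, W) -> (1280, H/32, W/32).
        self.backbone = models.mobilenet_v2(weights="IMAGENET1K_V1").features
        self.pool = nn.AdaptiveAvgPool2d(1)  # one global feature per frame
        self.lstm = nn.LSTM(1280, hidden, batch_first=True)
        # Upsampling head; these layers could instead be initialized from a
        # pretrained segmentation model's decoder, as suggested above.
        self.decoder = nn.Sequential(
            nn.Conv2d(1280 + hidden, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 1, kernel_size=3, padding=1),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) video snippet; predict a map for the last frame.
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))    # (B*T, 1280, h', w')
        _, _, fh, fw = feats.shape
        pooled = self.pool(feats).reshape(b, t, 1280)          # (B, T, 1280)
        temporal, _ = self.lstm(pooled)                        # (B, T, hidden)
        last_state = temporal[:, -1]                           # (B, hidden)
        last_feats = feats.reshape(b, t, 1280, fh, fw)[:, -1]  # (B, 1280, h', w')
        # Broadcast the temporal state over the spatial grid and decode.
        state_map = last_state[:, :, None, None].expand(-1, -1, fh, fw)
        logits = self.decoder(torch.cat([last_feats, state_map], dim=1))
        return torch.sigmoid(logits)  # per-pixel fixation probability map

model = TemporalGazeNet().eval()
with torch.no_grad():
    out = model(torch.randn(1, 8, 3, 288, 512))  # one 8-frame clip
print(out.shape)  # torch.Size([1, 1, 72, 128])

Whether the LSTM actually helps is exactly the open question raised above; the sketch only shows how cheaply it can be wired in for an ablation against a single-frame baseline.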
Fig. 3. CNN architecture of the model proposed by Palazzi et al. [4].

REFERENCES

[1] S. G. Klauer, F. Guo, B. G. Simons-Morton, M. C. Ouimet, S. E. Lee, and T. A. Dingus, “Distracted driving and risk of road crashes among novice and experienced drivers,” New England Journal of Medicine, vol. 370, no. 1, pp. 54–59, 2014.
[2] M. A. Regan, C. Hallett, and C. P. Gordon, “Driver distraction and driver inattention: Definition, relationship and taxonomy,” Accident Analysis & Prevention, vol. 43, no. 5, pp. 1771–1781, 2011.
[3] Y. Xia, D. Zhang, J. Kim, K. Nakayama, K. Zipser, and D. Whitney, “Predicting driver attention in critical situations,” in Asian Conference on Computer Vision. Springer, 2018, pp. 658–674.
[4] A. Palazzi, D. Abati, F. Solera, R. Cucchiara et al., “Predicting the driver’s focus of attention: the DR(eye)VE project,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1720–1733, 2018.
[5] S. Jha and C. Busso, “Estimation of gaze region using two dimensional probabilistic maps constructed using convolutional neural networks,” in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 3792–3796.
[6] N. Li and C. Busso, “Calibration free, user-independent gaze estimation with tensor analysis,” Image and Vision Computing, vol. 74, pp. 10–20, 2018.
[7] K. A. Funes Mora, F. Monay, and J.-M. Odobez, “EYEDIAP: A database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras,” in Proceedings of the Symposium on Eye Tracking Research and Applications. ACM, 2014, pp. 255–258.
[8] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, “MPIIGaze: Real-world dataset and deep appearance-based gaze estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 1, pp. 162–175, 2017.
[9] F. W. Cornelissen, E. M. Peters, and J. Palmer, “The Eyelink Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox,” Behavior Research Methods, Instruments, & Computers, vol. 34, no. 4, pp. 613–617, 2002.
[10] N. Li and C. Busso, “Evaluating the robustness of an appearance-based gaze estimation method for multimodal interfaces,” in Proceedings of the 15th ACM International Conference on Multimodal Interaction. ACM, 2013, pp. 91–98.
[11] Y. Sugano, Y. Matsushita, and Y. Sato, “Learning-by-synthesis for appearance-based 3D gaze estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1821–1828.
[12] T. Schneider, B. Schauerte, and R. Stiefelhagen, “Manifold alignment for person independent appearance-based gaze estimation,” in 2014 22nd International Conference on Pattern Recognition. IEEE, 2014, pp. 1167–1172.
[13] X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, “Appearance-based gaze estimation in the wild,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4511–4520.
[14] L. Simon, J.-P. Tarel, and R. Brémond, “Alerting the drivers about road signs with poor visual saliency,” in 2009 IEEE Intelligent Vehicles Symposium. IEEE, 2009, pp. 48–53.
[15] G. Underwood, K. Humphrey, and E. Van Loon, “Decisions about objects in real-world scenes are influenced by visual saliency before and during their inspection,” Vision Research, vol. 51, no. 18, pp. 2031–2038, 2011.
[16] L. Fridman, P. Langhans, J. Lee, and B. Reimer, “Driver gaze region estimation without use of eye movement,” IEEE Intelligent Systems, vol. 31, no. 3, pp. 49–56, 2016.
[17] S. Alletto, A. Palazzi, F. Solera, S. Calderara, and R. Cucchiara, “DR(eye)VE: a dataset for attention-based tasks with applications to autonomous and assisted driving,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 54–60.
[18] P. Cavanagh and G. A. Alvarez, “Tracking multiple targets with multifocal attention,” Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349–354, 2005.
[19] A. Palazzi, F. Solera, S. Calderara, S. Alletto, and R. Cucchiara, “Learning where to attend like a human driver,” in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 920–925.
[20] S. Mannan, K. Ruddock, and D. Wooding, “Fixation sequences made during visual examination of briefly presented 2D images,” Spatial Vision, 1997.
[21] R. Groner, F. Walder, and M. Groner, “Looking at faces: Local and global aspects of scanpaths,” in Advances in Psychology. Elsevier, 1984, vol. 22, pp. 523–533.
[22] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4489–4497.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[24] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.