Gender and age recognition for video analytics solution

Vladimir Khryashchev, Andrey Priorov and Alexander Ganin
Image Processing Laboratory, P.G. Demidov Yaroslavl State University
Yaroslavl, Russia
andeat@yande, angan@mail.ra

Abstract - An application for video data analysis based on computer vision and machine learning methods is presented. Gender and age classifiers based on adaptive features, local binary patterns and support vector machines are proposed. The age estimation algorithm is aimed at real-life audience measurement video data in which faces look more or less similar to those in the private RUS-FD database; in this case we reach a total mean absolute error below 7. All the video processing stages are united into a real-time system of audience analysis. The system extracts all the available information about people from the input video stream, then aggregates and analyzes it in order to measure different statistical parameters. Promising practical applications of such algorithms include human-computer interaction, surveillance monitoring, video content analysis, targeted advertising, biometrics, and entertainment.

Keywords - face detection, biometric features, gender and age estimation, support vector machines, adaptive features, local binary patterns, audience measurement system.

I. INTRODUCTION

Automatic video data analysis is a very challenging problem. In order to find a particular object in a video stream and automatically decide whether it belongs to a particular class, one should utilize a number of different machine learning techniques and algorithms that solve object detection, tracking and recognition tasks [1-3]. Many algorithms have been proposed in the field of computer vision and object recognition over recent years, using such popular techniques as principal component analysis, histogram analysis, artificial neural networks, Bayesian classification, adaptive boosting learning, different statistical methods, and many others.
Some of these techniques are invariant to the type of analyzed object; others, on the contrary, utilize a priori knowledge about a particular object type, such as its shape, typical color distribution, relative positioning of parts, etc. [4]. Although the real world contains a huge number of various objects, considerable interest is being shown in the development of algorithms for the analysis of one particular object type: human faces.

Interest in human gender and age estimation has been growing for many years. Promising practical applications of such algorithms include human-computer interaction, surveillance monitoring, video content analysis, targeted advertising, biometrics, and entertainment. Gender recognition, for example, can be used to collect and estimate demographic indicators [5-8]. Besides, it can be an important preprocessing step in person identification: gender recognition halves the number of candidates for analysis (when the database contains equal numbers of men and women) and thus makes the identification process about twice as fast.

Human age estimation is another problem in the field of computer vision which is connected with face area analysis [9].

A typical solution that automatically recognizes human gender and age from an image or video usually includes a few basic blocks: face detection and cropping, geometric and photometric image normalization, computation of face descriptors, optional reduction of feature vector dimensionality, and application of a machine learning method to estimate the gender and age category.

Both problems are quite difficult because of the high variability in face appearance due to such factors as head rotation, emotions, illumination conditions, face makeup, and many others. All of these issues should be resolved, to some extent, to build an automatic solution that is usable in practice.
Some of the factors are taken into account during face normalization, while the others are resolved by enriching the training dataset so that the gender/age function is modeled more accurately.

In order to organize a completely automatic system, machine learning algorithms are utilized in combination with face detection and face tracking algorithms, which select candidates for further analysis [10-15]. In this paper we propose a system which extracts all the available information about depicted people from the input video stream, then aggregates and analyzes it in order to measure different statistical parameters (Fig. 1). The following metrics are calculated [8]:

• Count: the number of potential viewers.
• Opportunity to See: the number of viewers who were close to the video camera.
• Dwell Time: the average time during which potential viewers have been in the visibility area.
• Attention Time: the average time during which the viewer was watching the object of interest.
• Gender: viewer gender (man/woman).
• Age: viewer age group (child/youth/adult/senior) or age estimation.

Fig. 1. A block diagram of the proposed application for video analysis (blocks: Video Data, Face Detection, Face Tracking, Gender and Age Classification, Statistics Analysis).

The rest of the paper briefly describes the main algorithmic techniques utilized for gender and age estimation from an automatically detected face area.

II. GENDER RECOGNITION

The new gender recognition algorithm proposed in this paper is based on a non-linear support vector machine (SVM) classifier with a radial basis function (RBF) kernel. Detected fragments are preprocessed to align their luminance characteristics and to transform them to a uniform scale. After that, the local binary patterns (LBP) operator [16] is utilized to extract information from the image fragment and to move to a lower-dimensional feature space.
These simple local features have been shown to give good results in face recognition tasks. Their calculation procedure is shown in Fig. 2.

In the first step, each pixel is compared with its neighbors and the result of each comparison is written as a binary digit. The digits from a given neighborhood (let's say 3×3 pixels) form a binary number which can be presented in decimal format. In the second step, the image is divided into rectangular regions. A histogram of the frequencies of the numbers acquired in the first step is calculated for each region. The resulting feature vector is a concatenation of the histograms from all regions.

Fig. 2. LBP feature vector extraction procedure.

The obtained feature vector is transformed using a Gaussian radial basis function kernel (Eq. 1):

$$k(z_1, z_2) = C \exp\left(-\frac{\|z_1 - z_2\|^2}{\sigma^2}\right) \qquad (1)$$

The kernel function parameters $C$ and $\sigma$ are defined during training. The resulting feature vector serves as an input to a linear SVM classifier whose decision rule is specified by Eq. 2:

$$f(z) = \operatorname{sgn}\left(\sum_i y_i \alpha_i k(z_i, z) + b\right) \qquad (2)$$

The set of support vectors $\{z_i\}$, the sets of coefficients $\{y_i\}$ and $\{\alpha_i\}$, and the bias $b$ are obtained at the stage of classifier training. This is how the proposed gender classifier based on LBP features and SVM (the LBP-SVM classifier) was constructed.

Both training and testing of the gender recognition algorithm require a sufficiently large color image database. The image database most commonly used for face recognition tasks is the FERET database [17], but it contains an insufficient number of faces of different individuals, which is why we collected our own image database from different sources (Table I).

Faces in the images from the proposed database were detected automatically by the AdaBoost face detection algorithm [18]. After that, false detections were manually removed, and the resulting dataset of 10 500 image fragments (5 250 for each class) was obtained. Examples of detected face areas are shown in Fig. 3. This dataset was split into three independent image sets: training, validation and testing.
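The LBP extraction steps and the kernel decision rule (Eqs. 1 and 2) described in this section can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the neighbor ordering, the `>=` comparison, the histogram normalization, the 2×2 region grid, and the assumed kernel form $C\exp(-\|z_1 - z_2\|^2/\sigma^2)$ are all illustrative assumptions, and the function names are hypothetical.

```python
import math

# Neighbour offsets, clockwise from the top-left corner (ordering is an assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img):
    """Step 1: 8-bit LBP code for every interior pixel of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            code = 0
            for dr, dc in OFFSETS:
                # Each comparison with the centre pixel contributes one binary digit.
                code = (code << 1) | (1 if img[r + dr][c + dc] >= img[r][c] else 0)
            row.append(code)
        codes.append(row)
    return codes

def lbp_feature_vector(img, grid=(2, 2)):
    """Step 2: concatenate per-region 256-bin histograms of LBP codes (normalised, an assumption)."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    feature = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            r0, r1 = i * h // grid[0], (i + 1) * h // grid[0]
            c0, c1 = j * w // grid[1], (j + 1) * w // grid[1]
            hist = [0] * 256
            for r in range(r0, r1):
                for c in range(c0, c1):
                    hist[codes[r][c]] += 1
            total = max(1, (r1 - r0) * (c1 - c0))
            feature.extend(v / total for v in hist)
    return feature

def rbf_kernel(z1, z2, C=1.0, sigma=1.0):
    """Gaussian RBF kernel in the assumed form C * exp(-||z1 - z2||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return C * math.exp(-sq_dist / sigma ** 2)

def svm_decision(z, support_vectors, labels, alphas, bias, C=1.0, sigma=1.0):
    """Decision rule: sign of the kernel-weighted sum over support vectors plus the bias."""
    s = sum(y * a * rbf_kernel(sv, z, C, sigma)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + bias >= 0 else -1
```

In practice the support vectors, coefficients and bias would come from a trained SVM; here they are plain arguments so that the decision rule can be checked in isolation.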
The training set was utilized for SVM classifier construction. The validation set was required in order to avoid overtraining during the selection of optimal parameters for the kernel function.

TABLE I. The proposed image database parameters

Parameter | Value
Total number of images | 10 500
Number of male faces | 5 250
Number of female faces | 5 250
Minimum image resolution | …
Color space format | RGB
People's age | …
Race | Caucasian
Lighting conditions, background and facial expression | …

TABLE II. Gender recognition rates of the AF-SVM and LBP-SVM classifiers

Recognition rate | AF-SVM True | AF-SVM False | LBP-SVM True | LBP-SVM False
Classified as "male", % | 90.6 | 9.4 | 90.3 | 9.7
Classified as "female", % | 91.0 | 9.0 | 94.3 | 5.7
Total classification rate, % | 90.8 | 9.2 | 92.3 | 7.7

Fig. 3. Detected fragments from the proposed image database: (a) male faces, (b) female faces.

For the representation of classification results we utilized the receiver operating characteristic (ROC) curve. As there are two classes, one of them is considered to be a positive decision and the other a negative one. The ROC curve is created by plotting the fraction of true positives out of the positives (TPR, true positive rate) against the fraction of false positives out of the negatives (FPR, false positive rate) at various discrimination threshold settings.

The proposed classifier was compared to the AF-SVM algorithm described in [8]. AF-SVM was chosen as a reference because it has both a high recognition rate and low operational complexity compared to state-of-the-art classifiers.

Fig. 4. ROC curves for the LBP-SVM and AF-SVM classifiers.

Testing results of the proposed LBP-SVM classifier compared to AF-SVM performance are presented in Table II and in Fig. 4. Experimental results show that using LBP features for gender recognition improves overall performance by 1.5 percentage points, yielding more than 92% accuracy.

III. AGE ESTIMATION ALGORITHM

The proposed age estimation algorithm follows a multiclass classification approach (Fig. 5) in which, for each age (from 1 to N), a binary classifier is constructed that decides whether the person in the input image looks older than the given age or not.
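The binary decomposition described above can be sketched as follows. The paper does not spell out how the classifier outputs are combined, so the decoding step here is an assumption: each classifier k is taken to output an estimated probability that the person looks older than age k, and the predicted age is the one whose pattern of "older / not older" answers is most likely. The function names are hypothetical.

```python
import math

def ordinal_targets(age, n_ages):
    """Training targets for the n_ages binary classifiers: target k is 1 if age > k (k = 1..n_ages)."""
    return [1 if age > k else 0 for k in range(1, n_ages + 1)]

def most_probable_age(p_older, eps=1e-9):
    """Pick the age in 1..N that best explains the classifier outputs (assumed decoding rule).

    p_older[k - 1] is classifier k's estimated probability that the person looks older than k.
    """
    n = len(p_older)
    best_age, best_score = None, -math.inf
    for age in range(1, n + 1):
        score = 0.0
        for k in range(1, n + 1):
            # Clamp to avoid log(0) when a classifier gives a hard 0/1 answer.
            p = min(max(p_older[k - 1], eps), 1.0 - eps)
            # A person of this age should look older than k exactly when age > k.
            score += math.log(p if age > k else 1.0 - p)
        if score > best_score:
            best_age, best_score = age, score
    return best_age
```

With hard 0/1 outputs this reduces to reading off the position where the answers switch from "older" to "not older"; with soft outputs it tolerates a few inconsistent classifiers.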
Input fragments are preprocessed to align their luminance characteristics and to transform them to a uniform scale. Preprocessing includes color space transformation and scaling, both similar to those of the gender recognition algorithm. Additionally, image normalization is performed by histogram equalization. Transformation to the LBP feature space and the SVM training procedure are used for binary classifier construction. To predict the age directly, the binary classifier outputs are statistically analyzed and the most probable age becomes the algorithm output.

To test the performance of age estimation algorithms, standard metrics were calculated:

• Mean Absolute Error (MAE): the mean absolute difference between estimated and real ages.
• Cumulative Score (CS): the probability that the estimated age lies within an interval of x years around the real age.

Fig. 5. LBP-SVM age estimation algorithm block diagram.

There are conventional databases that are widely used in the field of age estimation from facial images. Each image in such data sets is labeled with the biological age of the person shown. The most commonly used database is MORPH [19]. It contains over 55 thousand facial images of more than 12 000 different persons: men and women of various races, nationalities and ages.

Another conventional database is FG-NET [20]. It contains 1002 images of 82 persons (about 12 images at different ages for each person). This database is not big enough for training; despite that, FG-NET is widely used as a testing database in the literature.

The Russian Faces Database (RUS-FD) was collected from free sources of information (the social network VKontakte, with the age labeled for each person's avatar picture). It contains 150 real-life, low-resolution images (60x60 pixels per face) of Russian people's faces for each age from 6 to 60 years.
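The MAE and CS metrics defined earlier in this section can be computed with a few lines of Python. This is a minimal sketch; treating the CS interval as inclusive (an error of exactly x years counts as a hit) is an assumption, and the function names and sample ages are illustrative only.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """MAE: mean absolute difference between estimated and real ages."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

def cumulative_score(true_ages, predicted_ages, max_error):
    """CS: fraction of estimates whose absolute error does not exceed max_error years."""
    hits = sum(1 for t, p in zip(true_ages, predicted_ages) if abs(t - p) <= max_error)
    return hits / len(true_ages)

# Illustrative (made-up) ages: absolute errors are 2, 7, 0 and 12 years.
true_ages = [20, 30, 40, 50]
predicted_ages = [22, 37, 40, 38]
mae = mean_absolute_error(true_ages, predicted_ages)   # 21 / 4 = 5.25
cs5 = cumulative_score(true_ages, predicted_ages, 5)   # 2 of 4 within 5 years
cs10 = cumulative_score(true_ages, predicted_ages, 10) # 3 of 4 within 10 years
```

Evaluating CS over a range of error levels gives the kind of cumulative curve that is usually plotted for age estimators.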
The biological age of the people in the images was known in advance. The accuracy of this information was verified by an expert group [21].

To evaluate the proposed algorithm in a real-life situation, testing was first performed on the FG-NET database. Ages in the FG-NET database were also estimated manually by a group of experts in order to compare subjective estimation with the algorithm's performance. The corresponding dependences for the LBP-SVM algorithm are presented in Fig. 6 and Fig. 7.

The proposed algorithm shows results comparable to the subjective evaluation for ages from 20 to 35 years. The average absolute error in this range is about 6 years. The accuracy of the LBP-SVM algorithm decreases for older ages, where the MAE grows. In the 45-60 range, the proposed algorithm is worse than the expert evaluation by approximately 10-15 years in terms of average error.

The cumulative score shows that around 40% of estimations deviate from the true age by less than 5 years, and 70% by less than 10 years. The subjective evaluation curve in Fig. 7 gives us the possible limit for future improvement of age estimation algorithms.

Analysis of the error probability density function shows that the proposed algorithm has a close to symmetric error distribution. Objective results are not inclined to overestimate the true age, which is typical for expert evaluation.

The total MAE score of the LBP-SVM algorithm (trained on the real-life dataset) is 6.94 on the RUS-FD database, 1.29 on the MORPH database, and 7.47 on the FG-NET database. The subjective estimation MAE is 4.2, indicating that the proposed algorithm still needs much improvement to show results comparable to a human. The possible ways to improve the accuracy of the age classifier are feature set expansion (using a combination of different feature transforms), a cost-sensitive SVM learning procedure, and more efficient pre-processing and post-processing steps.
Examples of face images for which the LBP-SVM algorithm gives good and poor age estimates are shown in Fig. 8.

Fig. 6. MAE on FG-NET database for LBP-SVM algorithm.

Fig. 7. CS on FG-NET database for LBP-SVM algorithm.

Fig. 8. Examples of age estimation using the proposed algorithm: (a) good age estimation, (b) poor age estimation.

TABLE III. A comparison of the proposed method with other age estimation approaches

We summarize published methods and results for age estimation on different face databases in Table III. As this review shows, the most popular feature types are biologically inspired features (BIF) and their
modifications, Gabor features, and local binary patterns and their modifications. Besides the MORPH and FG-NET databases, some papers also test algorithms on PCSO (Pinellas County Sheriff's Office), PAL, LFW+ (an extended version of Labeled Faces in the Wild) and some private databases.

Top-level age estimation algorithms can reach a total MAE score between 4 and 5. The CS in the same column reflects the percentage of age estimations within a 5-year absolute error. In some papers researchers prefer to calculate MAE and CS separately for male and female sub-databases. Our age estimation algorithm provides world-quality results for the MORPH database, but it is focused on real-life audience measurement applications in which faces look more or less similar to those in the private RUS-FD database. In this case we can reach a total MAE score below 7.

IV. CONCLUSION

A modern, efficient machine learning algorithm based on LBP features allows us to recognize the viewer's gender in video analytics systems with more than 92% accuracy. For the age estimation task, experimental results on the standard FG-NET and MORPH databases and on our private RUS-FD face aging database are presented. Human ability to estimate age is studied using a crowdsourcing experiment, which allows a comparison of the abilities of machines and humans. The obtained results show that our framework provides high accuracy for both gender and age estimation problems compared to the best known methods. The pipeline is also attractive for practical applications because the same features are used for solving both problems; hence LBP descriptors need to be computed only once per face. The gender and age estimation algorithms described in this paper have been integrated into an audience measurement system which can collect and process video data in real time.

REFERENCES

[1] E. Alpaydin, Introduction to Machine Learning. The MIT Press, 2010.
[2] C. Sammut and G.I. Webb, Encyclopedia of Machine Learning. Springer, 2011.
[3] S.Z. Li and A.K.
Jain, Handbook of Face Recognition. Springer.
[4] R. Szeliski, Computer Vision: Algorithms and Applications. Springer, 2010.
[5] E. Makinen and R. Raisamo, "An experimental comparison of gender classification methods," Pattern Recognition Letters, vol. 29, …
[6] S. Tamura, H. Kawai, and H. Mitsumoto, "Male/female identification from 8×6 very low resolution face images by neural network," Pattern Recognition Letters, vol. 29, No. 2, 1995, pp. …
[7] V. Khryashchev, A. Priorov, L. Shmaglit, and M. Golubev, "Gender Recognition via Face Area Analysis," Proc. World Congress on Engineering and Computer Science, …, USA, 2012, pp. …
[8] V. Khryashchev, A. Ganin, M. Golubev, and L. Shmaglit, "Audience analysis system on the basis of face detection, tracking and classification techniques," in Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong Kong, 2013, pp. 46-50.
[9] Y. Fu and T.S. Huang, "Age synthesis and estimation via faces: a survey," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 32, No. 11, 2010, pp. 1955-1976.
[10] K.K. Sung and T. Poggio, "Example-Based Learning for View-Based Human Face Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, 1998, pp. 39-51.
[11] …, "Face Detection with Support Vector Machines and a Very Large Set of Linear Features," IEEE ICME 2002.
[12] D. Roth, M.-H. Yang, and N. Ahuja, "A SNoW-based face detector," in Advances in Neural Information Processing Systems 12, MIT Press, Cambridge, MA, 2000, pp. 855-861.
[13] P. Juell and R. Marsh, "A hierarchical neural network for human face detection," Pattern Recognition, vol. 29, 1996, pp. 781-787.
[14] H.A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face
