2021 4th International Conference of Computer and Informatics Engineering (IC2IE)

Indoor CO2 Level-Based Occupancy Estimation at Low-Scale Occupant using Statistical Learning Method

Haolia Rahman, Abdul Azis Abdillah, Asep Apriana, Devi Handaya, Idrus Assagaf
Dept. of Mechanical Engineering, Politeknik Negeri Jakarta, Depok, Indonesia
haolia.rahman@mesin.pnj.ac.id, abdul.azis.[...]@mesin.pnj.ac.id, asep.apriana@mesin.pnj.ac.id, devi.handaya@mesin.pnj.ac.id, idrus.assagaf@mesin.pnj.ac.id

Abstract: Most occupancy estimations based on indoor CO2 levels are tested on a large number of occupants, on the order of tens or hundreds, because the occupancy pattern and the CO2 level follow each other closely over a broad range of occupants. In the present study, a small office room with an occupancy scale of 0-6 people was tested. Statistical Learning methods are used to estimate the number of occupants, including Decision Tree, Random Forest classifier, SVM, Logistic Regression, k-Nearest Neighbor, and Neural Network. Combinations of training and testing data sets are applied to the methods, and a comparison is made in order to distinguish their accuracy. The results show that the accuracy of self-estimation and cross-estimation ranges from 86-100% and 86-94% respectively. It is also found that the estimation accuracy of self- and cross-validation does not significantly increase as the data set combination increases.

Keywords: occupancy estimation, carbon dioxide, statistical learning, low-scale occupant

I. INTRODUCTION

Real-time data on the number of occupants in a modern building is valuable information for building management to implement a building-efficiency strategy. For example, they can shut down the HVAC and lighting systems when the building is vacant. This strategy has proven to be effective in reducing energy consumption by 15% [1].
Various methods and devices have been introduced in the literature on occupancy estimation, among them video cameras, RFID tags, PIR sensors, Wi-Fi, and indoor carbon dioxide (CO2) levels. Existing research recognizes that a video camera achieves high accuracy in occupancy estimation, reaching 93.32% [2], though this method may interfere with the privacy of the occupants. The accuracy of occupancy estimation using RFID tags and Wi-Fi-based devices depends strongly on whether the occupants are actually using the device, while a PIR sensor has difficulty detecting stationary occupants.

Carbon dioxide is a byproduct of human exhalation and is an odorless, colorless, non-toxic, and non-flammable gas at room temperature. Using CO2 sensors as one of the environmental sensors could be a relevant method to quantify the number of occupants, since among environmental sensors such as temperature, humidity, and sound sensors, the magnitude of the CO2 level is the most closely related to the number of occupants [3]. However, the debate about using indoor CO2 levels has gained fresh prominence, with many pointing to limitations such as sensitivity to excessive openings, CO2 removal (e.g. plants and enclosure cracks), sensor response, air mixing, and other CO2 generation (e.g. animals and combustion). Most occupancy counts based on indoor CO2 levels are tested in a large room with occupancy counts of tens to hundreds [4], [5]. At this scale, the indoor CO2 and occupancy profiles superimpose each other, resulting in a relatively small estimation error.

Generally, CO2-based occupancy estimation refers to physical methods and statistical methods. The physical method strongly depends on the model and its parameters [6], [7], while the statistical method depends on datasets for primary learning tests [8].
Zuraimi [9] compared the accuracy of occupancy estimation between the physical method and the statistical method and postulated that the accuracy of the statistical method is higher than that of the physical method.

The scope of the present study is occupancy estimation in a small room with a low occupancy scale. We use the indoor CO2 level and ventilation rate, together with Statistical Learning (SL) methods, to estimate the number of occupants. The methods are limited to Decision Tree (DT), Random Forest Classifier (RFC), Support Vector Machines (SVM), Logistic Regression (LR), k-Nearest Neighbor (k-NN), and Neural Network (NN). The aim of the present study is to compare the accuracy obtained from combinations of several training and testing datasets.

II. RELATED WORK

SL-based methods for estimating the number of occupants from the indoor CO2 level have been explored by a number of researchers. Hailemariam [10] evaluated the impact of the DT method using multiple sensors such as CO2 level, current, light, motion, and sound, for which the accuracy of occupancy detection was shown to be 80.02-98.44%. Hailemariam reveals that the DT method can improve occupancy detection systems based on motion sensors alone. Beyond occupancy detection, the DT method is also applicable to occupancy estimation [11], [12], [13].

The RF method is essentially a collection of Decision Trees. Candanedo conducted a series of trials in occupancy estimation in which he mixed several SL methods and many combinations of sensors [14]. He reveals that the accuracy of the estimation using RF is 78.76% and 64.21% in the first and second testing respectively.
However, Kallio [13] finds that RF has lower accuracy than DT in occupancy estimation using a one-year data set.

Support Vector Regression is a well-known method that has advantages in optimizing pattern recognition systems with good generalization capabilities [15]. This method has been widely applied to predicting and estimating the number of occupants based on the indoor CO2 level [16], with a fairly average estimation error. Meanwhile, Chen [17] combined the Inhomogeneous Hidden Markov Model with Multinomial Logistic Regression and showed that the combination is more effective than the Hidden Markov Model alone.

III. METHOD

A. Testbed

The testbed is a single zone of an office room with an area and height of 4 m and 2.6 m respectively. It is equipped with sensors and ventilation systems such as return and supply ducting and centrifugal fans, as illustrated in Fig. 1. The airflow passing through the ducting is measured using a velocity meter (hot wire) and converted into a flow rate based on the log-Chebyshev method [18]. The airflow rate can be adjusted via a controllable fan embedded in both the supply and return ducting. The indoor CO2 level is measured using two CO2 sensors located in the return ducting and in the middle of the room, while the outdoor CO2 concentration is measured via a CO2 sensor located in the supply ducting. The CO2 sensor is a Kimo probe connected to a C310-HO series transmitter with an accuracy of ±3% of reading or ±80 ppm, and the velocity sensor is a Kanomax 6501 series with an accuracy of ±3% of reading or ±0.01 kPa. The ventilation scheme used for the present measurement was proportional to the number of occupants: the airflow rate increases and decreases as occupants enter and leave the room. A laser beam is installed in the door frame to record the ground truth of the occupants and to provide the input for ventilation rate control. Data were collected at one-minute intervals and stored in the DAQ system.

Fig. 1. The schematic diagram of sensors in the test chamber.

B. Statistical model

The decision tree is a supervised machine learning algorithm that can be used for classification and regression problems. The tree structure is broken down into smaller parts called nodes. A node may have more than two branches depending on the attribute test conditions and the selected attribute. The attribute selection measures include entropy (S) and information gain (S, A), as shown in (1) and (2).

$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -P_i \log P_i \tag{1}$$

where entropy is the amount of information required to describe the sample, $n$ is the number of partitions of $S$, and $P_i$ is the proportion of category $i$ elements over the total number of recorded samples.

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v) \tag{2}$$

where $A$ is the variable being tested, $v$ is a possible value of that variable, and $S_v$ is the set of samples for the value $v$. $|S_v|$ and $|S|$ are the cardinalities of $S_v$ and $S$ respectively, and $\mathrm{Entropy}(S_v)$ is the entropy of the samples that have the value $v$.

Theoretically, the RF method improves on the accuracy of the DT method, since the RF method is a combination of trees from a selected DT model. The classification by RF is determined by the voting results of the formed trees.

Unlike RF, the goal of the SVM algorithm is to find the best hyperplane in N-dimensional space (a space with N features) that serves as a clear separator for the input data points. The SVM algorithm determines the hyperplane that separates two classes with the maximum margin. The optimization problem of SVM is given in (3) and (4),

$$\min_{w,\,b,\,\xi} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \xi_i \tag{3}$$

$$\text{subject to } y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, n \tag{4}$$

where $x_i \in \mathbb{R}^{N}$ is a training data point, $y_i \in \{1, -1\}$ is its label, $b \in \mathbb{R}$ is called the bias, $\phi$ is a mapping function, $w$ is the weight vector, and $C$ is the parameter that controls the slack variables $\xi_i$.

In the case of classification, logistic regression works by calculating the class probability of a sample.
The LR model is formulated as (5),

$$\log \frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)} = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n \tag{5}$$

where $Y \in \{0, 1\}$ is the binary label, $X = (X_1, \ldots, X_n)$ are $n$ explanatory variables selected based on the Akaike Information Criterion, and $\beta = (\beta_0, \ldots, \beta_n)$ are the estimated regression coefficients.

The k-NN method estimates the conditional distribution of $Y$ given $X$ by calculating the closest distance and classifying the observation into the class with the highest probability. Given a positive integer $K$, k-NN looks for the $K$ observations closest to the test observation $x_0$ and estimates the conditional probability that it belongs to class $j$ using (6).

$$P(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in N_0} I(y_i = j) \tag{6}$$

where $N_0$ is the set of the $K$ closest observations and $I(y_i = j)$ is an indicator whose value is 1 if observation $i$ in $N_0$ belongs to class $j$ and 0 otherwise. After calculating these probabilities, the method assigns $x_0$ to the class with the greatest probability.

The NN algorithm was inspired by the perceptron and by neurons in the human brain. The perceptron accepts numeric input and processes it to produce an output. A perceptron consists of several components, namely inputs ($x_i$), where $i = 1, 2, \ldots, m$, weights ($w_i$) and a bias ($w_0$), an activation or non-linearity function ($g$), and an output ($\hat{y}$), formulated as (7),

$$\hat{y} = g\left( w_0 + \sum_{i=1}^{m} x_i w_i \right) \tag{7}$$

We use Python 3.6.9 to run the algorithms and calculate the accuracy of occupancy estimation from the experimental data.

Fig. 2. Indoor CO2 profile, actual number of occupants, and ventilation rate profile.

IV. RESULT AND DISCUSSION

The indoor CO2 level resulting from the occupancy ($N$) and ventilation rate ($Q$) patterns is shown in Fig. 2, which was recorded over six separate days.
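As a brief aside on the k-NN rule in (6), the following minimal Python sketch computes the class probabilities as the share of the K nearest labels and then picks the most probable class. It is an illustration only, not the authors' code: the function names, the toy (CO2 level, ventilation rate) samples, and the use of unscaled Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

# Toy samples (hypothetical values, not measured data):
# each feature vector is (indoor CO2 level, ventilation rate), each label an occupant count.
TRAIN_X = [(420.0, 10.0), (430.0, 10.0), (520.0, 20.0),
           (540.0, 20.0), (650.0, 30.0), (670.0, 30.0)]
TRAIN_Y = [0, 0, 1, 1, 2, 2]


def euclidean(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_class_probabilities(train_x, train_y, x0, k):
    """Estimate P(Y = j | X = x0) as in (6): count labels among the k nearest points."""
    by_distance = sorted(zip(train_x, train_y), key=lambda s: euclidean(s[0], x0))
    neighbours = by_distance[:k]  # N_0: the k observations closest to x0
    counts = Counter(label for _, label in neighbours)
    return {label: n / k for label, n in counts.items()}


def knn_predict(train_x, train_y, x0, k):
    """Assign x0 to the class with the greatest estimated probability."""
    probs = knn_class_probabilities(train_x, train_y, x0, k)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    x0 = (530.0, 20.0)
    print(knn_class_probabilities(TRAIN_X, TRAIN_Y, x0, k=3))
    print(knn_predict(TRAIN_X, TRAIN_Y, x0, k=3))
```

Because the two features have different magnitudes, a practical implementation would probably scale them before computing distances; that step is left out here to keep the correspondence with (6) visible.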
The CO2 level increases and decreases as occupants enter and leave the room, with an average occupancy of 2-3 persons during occupied time. The ventilation rate profile closely follows the occupancy profile, which in turn affects the CO2 profile. The estimation accuracy of each model is summarized in Table I, where the estimates are denoted as $N_{DT}$, $N_{RFC}$, $N_{SVM}$, $N_{LR}$, $N_{k\text{-}NN}$, and $N_{NN}$. We divided the learning data set into training (TR) and testing (TE) sets, whose proportions were varied in steps of 10%. Accordingly, the estimation accuracy is reported under TR for self-validation and under TE for cross-validation. It is evident from the table that the DT and RFC methods are perfectly accurate under self-validation, followed by SVM and k-NN, NN, and LR respectively. Under cross-validation, RFC, SVM, and k-NN have the highest accuracy, followed by NN, DT, and LR. What is more interesting about the data in this table is that the cross-validation accuracy of all methods does not significantly increase as the TR data set grows (or the TE data set shrinks). Moreover, the self-validation accuracy was found to be almost unchanged across the data set combinations.

In detail, the occupancy estimation pattern of the six methods obtained with 50% TR and TE data sets is shown in Fig. 3. The observation period in Fig. 3 covers 16 hours, mostly during occupied time. In the chart, the occupancy estimation profiles (blue line) are superimposed on the ground truth ($N_{GT}$) (red line). Most of the methods fit well when the room is unoccupied, and they respond well when the indoor CO2 level starts to build up at early occupation.
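The TR/TE comparison described above can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure actually used in the study: the samples are synthetic, the split is taken by position, and a single nearest-neighbour rule stands in for the six methods. All names (`make_synthetic_samples`, `split_tr_te`, `sweep`) and all numeric constants are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted occupant count equals the ground truth."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def split_tr_te(samples, tr_fraction):
    """Split samples into training (TR) and testing (TE) parts by position (assumed)."""
    cut = int(round(len(samples) * tr_fraction))
    return samples[:cut], samples[cut:]


def nearest_neighbour_predict(train, x):
    """Stand-in classifier: return the label of the single closest training sample."""
    closest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return closest[1]


def sweep(samples, fractions):
    """For each TR fraction, return (fraction, self-validation acc., cross-validation acc.)."""
    rows = []
    for frac in fractions:
        tr, te = split_tr_te(samples, frac)
        tr_pred = [nearest_neighbour_predict(tr, x) for x, _ in tr]  # evaluated on TR
        te_pred = [nearest_neighbour_predict(tr, x) for x, _ in te]  # evaluated on TE
        rows.append((frac,
                     accuracy([y for _, y in tr], tr_pred),
                     accuracy([y for _, y in te], te_pred)))
    return rows


def make_synthetic_samples(n=300, seed=0):
    """Generate made-up ((CO2 level, ventilation rate), occupants) samples for 0-6 occupants."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        occupants = rng.randint(0, 6)
        co2 = 400.0 + 60.0 * occupants + rng.gauss(0.0, 25.0)  # arbitrary toy relation
        flow = 10.0 * occupants + rng.gauss(0.0, 5.0)          # arbitrary toy relation
        samples.append(((co2, flow), occupants))
    return samples


if __name__ == "__main__":
    fractions = [i / 10 for i in range(1, 10)]  # TR share from 10% to 90%
    for frac, tr_acc, te_acc in sweep(make_synthetic_samples(), fractions):
        print(f"TR={frac:.0%}  self={tr_acc:.2f}  cross={te_acc:.2f}")
```

Evaluating the model on its own training samples corresponds to the self-validation (TR) columns, and evaluating it on the held-out samples corresponds to the cross-validation (TE) columns; the numbers printed by this toy sweep say nothing about the measured results.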
Although the estimations have difficulty catching every short fluctuation of $N_{GT}$, they still follow the global trend of $N_{GT}$.

TABLE I. THE ACCURACY OF SIX METHODS (accuracy in %, reported under TR and TE for each method, for TR/TE data set proportions from 10/90 to 90/10, with the average)

$N_{DT}$ = occupancy estimation using Decision Tree
$N_{RFC}$ = occupancy estimation using Random Forest classifier
$N_{SVM}$ = occupancy estimation using Support Vector Machines
$N_{k\text{-}NN}$ = occupancy estimation using k-Nearest Neighbor
$N_{LR}$ = occupancy estimation using Logistic Regression
$N_{NN}$ = occupancy estimation using Neural Network

Fig. 3. The occupancy estimation using six methods of SL.

V. CONCLUSION

Six methods of Statistical Learning have been successfully implemented for occupancy estimation in a room with low-scale occupants based on indoor CO2 levels. Our results indicate that the RFC, SVM, and k-NN methods have the highest accuracy when validated using the testing data set (cross-validation), followed by NN, DT, and LR.
We also found from the combinations of training and testing data sets that increasing the training set, or decreasing the testing set, does not significantly increase the accuracy of cross-estimation. The empirical findings in this study provide a new understanding of how the occupancy estimation profile can follow most of the ground truth profile but still has difficulty following short changes in the real occupancy profile.

ACKNOWLEDGMENT

This work was funded by UP2M Politeknik Negeri Jakarta through the PDUPT scheme (contract number: B.264/PL3.BPN.003/2021).

REFERENCES

[1] CIBSE, Building Control Systems, Chartered Institution of Building Services Engineers, London, 2009.
[2] P. W. Tien, S. Wei, and J. Calautit, "A computer vision-based occupancy and equipment usage detection approach for reducing building energy demand," Energies, vol. 14(1), 156, 2021.
[3] R. Zhang, K. P. Lam, Y. Chiou, and B. Dong, "Information-theoretic environment features selection for occupancy detection in open office spaces," Build. Simul., vol. 5, pp. 179-188, 2012.
[4] Z. Sun, S. Wang, and Z. Ma, "In-situ implementation and validation of a CO2-based adaptive demand-controlled ventilation strategy in a multi-zone office building," Build. and Env., vol. 46(1), pp. 124-133, 2011.
[5] T. Lu, X. Lu, and M. Viljanen, "A novel and dynamic demand-controlled ventilation strategy for CO2 control and energy saving in buildings," En. and Build., vol. 43, pp. 2499-2508, 2011.
[6] H. Rahman and H. Han, "Bayesian estimation of occupancy distribution in a multi-room office building based on CO2 concentrations," Build. Simul., vol. 11, pp. 575-583, 2018.
[7] H. Rahman and H. Han, "Real-time ventilation control based on a Bayesian estimation of occupancy," Build. Simul., vol. 14, pp. 1487-1497, 2021.
[8] A. G. Alam, H. Rahman, J. K. Kim, and H. Han, "Uncertainties in neural network model based on carbon dioxide concentration for occupancy estimation," J. of Mech. Sci. and Tech., vol. 31, pp. 2573-2580, 2017.
[9] M. S. Zuraimi, A. Pantazaras, K. A. Chaturvedi, J. J. Yang, K. W. Tham, and S. E. Lee, "Predicting occupancy counts using physical and statistical CO2-based modeling methodologies," Build. and Env., vol. 123, pp. 517-528, 2017.
[10] E. Hailemariam, R. Goldstein, R. Attar, and A. Khan, "Real-time occupancy detection using Decision Trees with multiple sensor types," Symp. on Sim. for Arch. and Urban Design, 2011.
[11] M. Toutiaee, "Occupancy detection in room using sensor data," arXiv 2101.03661.
[12] M. Amayri, A. Arora, S. Ploix, S. Bandhyopadyay, Q. D. Ngo, and V. R. Badarla, "Estimating occupancy in heterogeneous sensor environment," En. and Build., vol. 129, pp. 46-58, 2016.
[13] J. Kallio, J. Tervonen, P. Räsänen, R. Mäkynen, J. Koivusaari, and J. Peltola, "Forecasting office indoor CO2 concentration using machine learning with a one-year dataset," Build. and Env., vol. 187, 107409, 2021.
[14] L. M. Candanedo and V. Feldheim, "Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models," En. and Build., vol. 112, pp. 28-39, 2016.
[15] A. A. Abdillah and Suwarno, "Diagnosis of diabetes using support vector machines with radial basis function kernels," Int. J. of Tech., vol. 7(5), pp. 849-858, 2016.
[16] Z. Han, R. X. Gao, and Z. Fan, "Occupancy and indoor environment quality sensing for smart buildings," IEEE Int. Instrum. and Meas. Tech. Conf. Proc.
[17] Z. Chen, Q. Zhu, M. K. Masood, and Y. C. Soh, "Environmental sensors-based occupancy estimation in buildings via IHMM-MLR," IEEE Trans. Ind. Informat., vol. 13(5), pp. 2184-2193, 2017.
[18] ISO 3966, Measurement of fluid flow in closed conduits: Velocity area method using Pitot static tubes, The Int. Org. for Stand., 2008.

