
2018 5th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand

Histogram of Oriented Gradients for Human Detection in Video

Thattapon Surasak, Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan (s105064860@m105.nthu.edu.tw)
Ito Takahiro, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan (s105064425@m105.nthu.edu.tw)
Cheng-hsuan Cheng, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan (s106061597@m106.nthu.edu.tw)
Chi-en Wang, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan (s106061548@m106.nthu.edu.tw)
Pao-you Sheng, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan (s104061549@m104.nthu.edu.tw)

Abstract—Computer Vision (CV) is currently one of the most popular research topics in the world, because it can support human daily life and can be applied to a wide range of theories and research. Human detection is one of the most popular research topics in computer vision. In this paper, we present a study of a technique for human detection in video, the Histogram of Oriented Gradients (HOG), by developing an application that imports a video and detects the humans in it. We use the HOG algorithm to analyze every frame of the video in order to find and count people. After analyzing the video from start to end, the program generates a histogram showing the number of detected people versus the playing period of the video. The expected results are obtained, including the detection of people in the video and the generation of a histogram showing the appearance of the detected humans in the video file.

Keywords—Human Detection, Histogram of Oriented Gradients

I. INTRODUCTION

Object detection is the process of finding specific objects that humans encounter in their daily life, for example whole human bodies, faces, houses, pets, motorcycles, or cars, in selected images or videos. Object detection algorithms use extracted features and learning algorithms to recognize instances of an object category. Currently, object detection is frequently used in a number of applications such as security, surveillance, and advanced driver assistance systems (ADAS), and it has become a very important subject in computer vision, pattern recognition, and image analysis [1], [2], [3]. Moreover, the Internet of Things (IoT) is another popular research topic that can be applied to many research areas, especially computer vision. Therefore, research combining these two main topics, IoT and computer vision, has increased dramatically over the past three years [4], [5], [6].

Human detection is the technology usually used to detect people in images or videos. It is also a crucial step in video-based surveillance systems. The aim is to identify and monitor humans for security purposes in crowded environments such as airports, bus terminals, or train stations. One example is video captured by CCTV, which is processed to detect and track the movements of humans, both whole and partial bodies. Another issue is related to road traffic accidents, which are one of the main concerns for people [4]. Since the late 90s, driver assistance systems have been studied intensively, with human detection included as a main part, and they have finally been adopted in automobile technology. Such a system alerts the driver when facing dangerous or sensitive situations, including the emergence of pedestrians on the street. For instance, the software called Mobileye, installed in the Volvo S60, is a human detection system that launched the first vision-based collision warning with full auto brake and pedestrian detection [8]. It can therefore decrease the number of accidents caused by human error.

This project aims to distinguish the presence of people in a video from their absence. The videos contain various combinations of people, animals, and objects. The program analyzes the video frame by frame to select only the people among the other objects. The appearance of people while the video plays is recorded and turned into a histogram illustrating the relation between the video playing period and the number of detected people. Moreover, another video file is created showing the frames that highlight the detected people and the live count of people detected by the program. We developed the experimental program using Python and OpenCV. We selected the histogram of oriented gradients (HOG) technique because the HOG working procedure mainly focuses on differentiating objects from the background [7].

II. RELATED WORK

Nguyen (2013) detected humans using contour-based local motion features. This research is composed of two parts. The first is template generation: the training data are used to generate the templates, including a whole human body template, and the key points of each template are found by weighting each point. The second part is a testing part that uses a sliding window to extract candidate regions and applies Canny edge detection to find edges in the candidate regions. Canny edge detection has four steps: noise reduction using a Gaussian filter, finding the intensity gradient of the image, non-maximum suppression, and hysteresis thresholding. The Gaussian filter is applied to blur the image using a kernel. The classifier in this paper is a binary Support Vector Machine (SVM) [8].

Zhou (2016) presented the four steps of spatio-temporal matching (STM) to track humans performing different activities in video: first, extract trajectories from short video segments by densely sampling feature points in the first frame and tracking them with optical flow; then, select a set of 3D motion-capture sequences at random and cluster the motion-capture segments into four temporal clusters [9].

Dalal (2006) developed a detector for standing and moving people in videos with possibly moving cameras and backgrounds. This paper combined a motion descriptor with an appearance descriptor.


The technique used to obtain the motion descriptor is Histograms of Flow (HOF), while the Histogram of Oriented Gradients (HOG) is used to obtain the appearance descriptor. HOF consists of two steps: first, compute the optical flow and flow magnitude of the images at times t and t+1; second, compute the gradient magnitude of the flow. HOG has three steps: in preprocessing, the HOG feature descriptor used for human detection is calculated on a 64x128 patch of an image; then the gradient magnitude is calculated; lastly, the histogram of gradients is calculated in 8x8 cells. The classifier in this paper is a Support Vector Machine (SVM). An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an SVM is the one with the largest margin between the two classes, where the margin is the maximal width of the slab parallel to the hyperplane that contains no interior data points. The support vectors are the data points closest to the separating hyperplane; these points lie on the boundary of the slab [10].

Daisuke (2009) proposed the combination of Ubiquitous Stereo Vision (USV) and HOG to recognize wheelchairs, and tested the work in both real-world and laboratory environments. The study first identifies wheelchair users in order to build an Intelligent Transport System (ITS). The researchers divided their method into two main steps: detecting pedestrians and obtaining a masked disparity image using USV, and wheelchair recognition using HOG features and an SVM. The research showed that recognition performance is nearly 100% under the same photographic environment (laboratory) for both pedestrians and wheelchairs. For the results from the real environment, they ran the experiment at Yokohama Human & Techno-land 2007 and obtained frame-unit detection rates of 77.57% for wheelchair-user scenes and 83.67% for pedestrian scenes [11].

Guzman (2015) presents research whose main purpose is to detect cars in an outdoor environment, using HOG and SVM to evaluate the results. Additionally, the researchers generated a new database for car recognition under multiple lighting, weather, and intrusion conditions. For the experiment, the system was tested using a proprietary car database built with three cameras at 640x480 resolution. The research showed that the methodology for car detection and counting achieves the expected classification results and that HOG features are a good descriptor for identifying a car in outdoor conditions [12].

III. HISTOGRAM OF ORIENTED GRADIENTS (HOG) AND LINEAR SUPPORT VECTOR MACHINE (SVM)

In 2005, the HOG descriptor was first proposed by Dalal and Triggs [13]. It achieved an almost perfect result in pedestrian detection against complex backgrounds. The HOG descriptor has key benefits over other descriptors: since it operates on local cells, it is invariant to geometric and photometric transformations, except for object orientation. Recently, the HOG descriptor with an SVM classifier has been commonly used in various applications for robust object detection and recognition.
A. Histogram of Oriented Gradients (HOG)

HOG is used in this project to make the program familiar with the object that we want to detect. Following the explanation in [11], HOG is divided into three compulsory steps based on vector theory. Initially, the image of interest is analyzed roughly by separating the object from the background, as shown in Fig. 1 (left). This process looks for magnitude differences in the image; therefore, at this stage we consider only the magnitude of the gradient vector, without its direction. The magnitude in the image is obtained by equation (1):

m(u, v) = \sqrt{f_u(u, v)^2 + f_v(u, v)^2}    (1)

According to equation (1), we get the magnitude m of a feature vector at the point (u, v). m(u, v) is composed of f_u(u, v), the component in the u-direction, and f_v(u, v), the component in the v-direction. After the program has estimated the object position in the image, we train it to determine the object more precisely. This second step takes the direction of the vector into account and is comparable to a fine adjustment:

\Theta(u, v) = \tan^{-1} \frac{f_v(u, v)}{f_u(u, v)}    (2)

From f_u(u, v) and f_v(u, v), we can use the arctangent to obtain an angle, as shown in equation (2). There are many vectors at the same point, as can be seen in Fig. 1 (middle). With the help of the vector direction described by equation (2), the object is displayed as a continuous variation of gradient depending on the vector direction. The result shown as a gradient is difficult to interpret, so we divide the image into a large number of cells. Once the cells are defined, the overall result is revealed as a histogram, as shown in Fig. 1 (right). It demonstrates that pixels from the background give a much lower accumulated result than pixels from the object. In summary, by evaluating the result from the histogram, the program can eventually discriminate between object and background correctly and precisely.

Fig. 1. Computation of the HOG feature: (left) magnitude analysis, (middle) directional analysis, and (right) histogram result. Adapted from [11]. Copyright 2009, IEEE.
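As an illustration of equations (1) and (2), the sketch below computes the gradient magnitude and orientation of a grayscale image and accumulates a 9-bin orientation histogram for each 8x8 cell. It is a minimal sketch assuming NumPy and OpenCV are available; the cell size, bin count, function name, and image path are our own illustrative choices, not taken from the paper.

import cv2
import numpy as np

def cell_orientation_histograms(gray, cell=8, bins=9):
    """Gradient magnitude/orientation (eqs. 1-2) and per-cell orientation histograms."""
    gray = np.float32(gray)
    # f_u and f_v: horizontal and vertical derivatives of the image
    fu = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=1)
    fv = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=1)
    # Equation (1): gradient magnitude
    mag = np.sqrt(fu ** 2 + fv ** 2)
    # Equation (2): orientation angle; arctan2 is used instead of a plain
    # arctan(fv/fu) so that fu == 0 does not cause a division by zero.
    ang = np.degrees(np.arctan2(fv, fu)) % 180.0   # unsigned gradients, [0, 180)
    h, w = gray.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins), dtype=np.float32)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for y in range(ch * cell):
        for x in range(cw * cell):
            # accumulate this pixel's magnitude into its cell's orientation bin
            hist[y // cell, x // cell, bin_idx[y, x]] += mag[y, x]
    return mag, ang, hist

# Example usage (path is hypothetical):
# gray = cv2.imread("person.png", cv2.IMREAD_GRAYSCALE)
# mag, ang, hist = cell_orientation_histograms(gray)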
B. Linear Support Vector Machine (SVM)

The linear Support Vector Machine (SVM) is the main supporting algorithm that, as program developers across the world have found, makes HOG work more efficiently; in particular, human detection programs that combine HOG and SVM are now widely accepted [14]. The goal of using an SVM is to design a hyperplane that classifies the training vectors into different classes. A hyperplane can be represented by the function in equation (3):

f(x) = w^T x_i + b    (3)

where w is the weight vector and x_i is a data vector obtained from HOG. The primary rule for determining a hyperplane is to make sure that all training vectors with the same label are grouped into the same class. The distance between a data point and the hyperplane can be expressed as in equation (4):

\left\| \frac{f(x)}{w^T w} \, w \right\| = \left\| \frac{f(x)}{\|w\|^2} \, w \right\| = \frac{|f(x)|}{\|w\|}    (4)

Maximizing the margin of the data with respect to a linear boundary means solving the optimization problem shown in equation (5):

\max_{w,b} \; \min_{1 \le i \le n} \frac{|f(x_i)|}{\|w\|}    (5)

After determining the hyperplane, we obtain a plane that segregates the features into several different groups with the maximum margin between them.
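To make the HOG+SVM pairing concrete, the sketch below trains a linear SVM on HOG feature vectors. It is a minimal sketch under our own assumptions: it uses scikit-learn's LinearSVC (the paper does not name a specific SVM implementation) together with OpenCV's HOGDescriptor and the 64x128 detection window mentioned in Section II; the positive/negative image lists and the regularization constant are placeholders.

import cv2
import numpy as np
from sklearn.svm import LinearSVC   # assumption: scikit-learn is available

hog = cv2.HOGDescriptor()   # default 64x128 window, 8x8 cells, 9 bins

def hog_feature(path):
    """Compute one HOG feature vector for a 64x128 crop."""
    img = cv2.imread(path)
    img = cv2.resize(img, (64, 128))
    return hog.compute(img).flatten()

# Placeholder file lists: positives contain a person, negatives do not.
pos_paths = ["pos_0.png", "pos_1.png"]
neg_paths = ["neg_0.png", "neg_1.png"]

X = np.array([hog_feature(p) for p in pos_paths + neg_paths])
y = np.array([1] * len(pos_paths) + [0] * len(neg_paths))

# Train the linear SVM; equations (3)-(5) describe the hyperplane it finds.
clf = LinearSVC(C=0.01)
clf.fit(X, y)

# Classify a new 64x128 crop (hypothetical file name).
print(clf.predict([hog_feature("candidate.png")]))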


IV. APPLICATION DEVELOPMENT

As mentioned in the introduction, we created a program for detecting humans in video using Python with OpenCV. The program consists of one main file and four supporting feature files (object files). The Project.py file is the main program. It was written using the object-oriented programming concept: Project.py contains only the principal commands and leaves the detailed work to the object files. When the program is running, Project.py calls the other object files to perform each feature. The mechanisms of the object programs used in this project are described below, and a short sketch of the overall pipeline follows the list.

• Movie2bmp.py: The purpose of this program is to generate a bitmap file from each frame of the video. This process is performed at high speed using the OpenCV library.
• Detect.py: This object file performs the human detection. Humans are detected in each bitmap file using the HOG algorithm. In order to speed up this process, we resized the horizontal width to 400 pixels while maintaining the image aspect ratio. In addition, we modified the program to draw a green frame around each detected human to indicate a successful result. Our program also displays the total number of humans detected in the image. This program has to run for many cycles, depending on the number of bitmap files generated by Movie2bmp.py. This normally consumes a lot of time; however, the step can be accelerated by running four threads in parallel with Python's multiprocessing library.
• Bmp2movie.py: In this part of the program, all the bitmap files that have been used for detection are tied together into a single movie file. As with Movie2bmp.py, the running process of this program is accelerated using the OpenCV library.
• Peoplecount.py: The final object program is Peoplecount.py. After Detect.py (described above) is executed, it produces a source data list called countlist and passes this information to Peoplecount.py. This program eventually presents a people-counting diagram, a graph showing how many people have appeared in the image over the detection period. We concluded this part of the work with the aim of displaying a graph showing the number of people in the video. As an outlook, this program could be combined with statistical software to extract more information from the plot; so far, we have tried to couple our program with a simple moving average (SMA) to analyze our data sources.
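The sketch below illustrates how the four object files could fit together in a single process: frames are read from the video, each frame is resized to a 400-pixel width, people are detected with OpenCV's built-in HOG person detector, green frames and a red count are drawn, the annotated frames are written back to a video, and the per-frame counts are collected. The file names, detector parameters, and output codec are our own assumptions, not taken from the paper.

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("input.mp4")          # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS) or 25
writer = None
countlist = []                               # per-frame people counts

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Resize to a 400-pixel width, keeping the aspect ratio (as in Detect.py).
    h, w = frame.shape[:2]
    frame = cv2.resize(frame, (400, int(h * 400 / w)))
    # Detect people with the HOG + linear SVM detector.
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, rw, rh) in rects:
        # Green frame around each detected person
        cv2.rectangle(frame, (int(x), int(y)), (int(x + rw), int(y + rh)), (0, 255, 0), 2)
    # Red number showing the count of people in this frame
    cv2.putText(frame, str(len(rects)), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    countlist.append(len(rects))
    if writer is None:
        writer = cv2.VideoWriter("output.avi", cv2.VideoWriter_fourcc(*"MJPG"),
                                 fps, (frame.shape[1], frame.shape[0]))
    writer.write(frame)

cap.release()
if writer is not None:
    writer.release()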
As mentioned before, the application consists of one main file (Project.py) and four supporting feature files (Movie2bmp.py, Detect.py, Bmp2movie.py and Peoplecount.py). During debugging we found that the program had a total execution time of about 5-10 minutes per cycle. To solve this problem, we applied four-thread parallel execution using the Python multiprocessing library, which reduced the processing time by 45.96%. The flowchart of our program can be seen in Fig. 2.

Fig. 2. Block diagram showing the running process of the program at each individual step, including the file used in each step.
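A minimal sketch of how the detection step could be spread over four worker processes with Python's multiprocessing library, as described above. The detect_in_bitmap helper and the bitmap directory are hypothetical stand-ins for the work done by Detect.py on the files produced by Movie2bmp.py.

import glob
from multiprocessing import Pool

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_in_bitmap(path):
    """Count people in one bitmap frame (stand-in for Detect.py's work)."""
    img = cv2.imread(path)
    rects, _ = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
    return len(rects)

if __name__ == "__main__":
    frames = sorted(glob.glob("frames/*.bmp"))   # bitmaps from Movie2bmp.py
    with Pool(processes=4) as pool:              # four parallel workers
        countlist = pool.map(detect_in_bitmap, frames)
    print(countlist)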
This project used the INRIA Person Dataset, which contains 64 x 128 pixel cropped images of humans from various sets of portraits. There are many types of human posture in this dataset, but most of the subjects are standing in many orientations against various backgrounds, including crowded ones. The dataset is available for research at http://pascal.inrialpes.fr/data/human/ (retrieved on: 27th Jan, 2018).

V. RESULTS

We developed the application with Python and OpenCV to detect the people in the video. The results can be divided into two parts according to the features of our application: human detection and histogram generation.

A. Results

1) Human Detection: Fig. 3 shows the result of people detected in the video. Every detected person is surrounded by a green frame. The red number located beside the green frame indicates the number of people in the frame at that moment. The green frames and the red number are complementary: while the green frames show where people appear, the red number counts the total number of people in each individual frame. According to the experiments, errors may come from the fact that the video files we used do not have enough resolution. When a person in the frame is very small, the program recognizes them as background and cannot detect them, as shown in Fig. 4. Moreover, our program, Detect.py, uses the HOG algorithm, and we reduced the horizontal resolution to 400 pixels for faster analysis. The combination of these two factors yields missed people in the video. The detection ability could be increased by enlarging the resolution of the video file as well as extending the detection resolution in the HOG analysis; however, this leads to more time being consumed in every running cycle.
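A minimal sketch of the speed/accuracy trade-off discussed above: the same frame is processed at two working widths, and the detection count and elapsed time are compared. The widths, detector parameters, and frame path are our own illustrative choices.

import time

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_at_width(frame, width):
    """Run the HOG person detector on a frame resized to the given width."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (width, int(h * width / w)))
    t0 = time.time()
    rects, _ = hog.detectMultiScale(small, winStride=(8, 8), scale=1.05)
    return len(rects), time.time() - t0

frame = cv2.imread("frame.bmp")              # hypothetical frame
for width in (400, 800):
    count, secs = detect_at_width(frame, width)
    print(f"width={width}: {count} people in {secs:.2f}s")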


Fig. 3. Successful result of human detection. Three people are framed, and the number three is shown in red.

Fig. 4. Unsuccessful result because the people are too small. No frame is shown, and the number shown is zero.

2) Histogram Generation: The histogram generated by our program is shown in Fig. 5. It shows the number of detected people versus the playing period of the video. Moreover, this histogram also elucidates the efficiency of our detection program. According to Fig. 5, the histogram does not have a Gaussian distribution profile, which would represent the probability density function of a normally distributed random variable. This occurrence suggests that our HOG program does not work perfectly. This obstacle was found early on; to mitigate it, it is suggested to run a simple moving average (SMA) in parallel with our program.

Fig. 5. Histogram of the human detection via our program (number of people versus period of time).
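A minimal sketch of how the per-frame counts could be plotted and smoothed with a simple moving average (SMA), as suggested above. It assumes matplotlib and NumPy are available and that countlist holds the per-frame counts produced by the detection step; the window size and the random placeholder data are arbitrary choices for illustration.

import numpy as np
import matplotlib.pyplot as plt

def simple_moving_average(values, window=25):
    """Smooth a sequence of per-frame counts with a simple moving average."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")

# countlist: number of detected people in each frame (placeholder data here).
countlist = np.random.randint(0, 4, size=1800)

plt.bar(range(len(countlist)), countlist, width=1.0, label="detected people")
plt.plot(simple_moving_average(countlist), color="red", label="SMA (window=25)")
plt.xlabel("period of time (frame index)")
plt.ylabel("number of people")
plt.legend()
plt.savefig("people_histogram.png")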
B. Discussion

In the experiment we used 10 videos for human detection with our program. The accuracy of the detection results is calculated by comparing the number obtained from the program with the actual number of people in each frame. The accuracy of all 10 video files is shown as an accuracy percentage, the percentage of frames in which the detection is correct. The results show that the highest accuracy is 100 percent while the lowest is 60.50 percent. The average accuracy of the human detection results is 81.23 percent with a standard deviation (SD) of 10.95 percent, as shown in Fig. 6 and Table I. The expected results are obtained, including the detection of people in the video and the generation of a histogram showing the appearance of the detected humans in the video file. However, there are some limitations to this study. Firstly, some results are not good enough owing to the quality of the videos we used; for example, the video with the lowest detection accuracy, video file no. 7, has people overlapping at the same place, and it was filmed with camera wobble, which led to low-resolution frames. Another limitation is our lack of experience in Python and OpenCV software development; a feasible solution for this second limitation is presented in the future work section.
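A minimal sketch of the accuracy calculation described above: the per-frame counts produced by the program are compared with manually labelled ground-truth counts, and the percentage of frames with a correct count is reported. The ground-truth list is a placeholder; the paper does not specify how its labels were stored.

def detection_accuracy(detected_counts, true_counts):
    """Percentage of frames in which the detected count equals the true count."""
    assert len(detected_counts) == len(true_counts)
    correct = sum(d == t for d, t in zip(detected_counts, true_counts))
    return 100.0 * correct / len(true_counts)

# Placeholder per-frame counts for one video.
detected = [0, 1, 1, 2, 2, 3, 3, 2]   # from the program
truth    = [0, 1, 2, 2, 2, 3, 3, 3]   # manually labelled ground truth

print(f"accuracy: {detection_accuracy(detected, truth):.2f}%")   # 75.00%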
Fig. 6. Accuracy percentage of each video file. The average and SD are also included.

TABLE I. NUMERIC VALUE OF ACCURACY PERCENTAGE OF 10 VIDEO FILES

Video file no.   Accuracy percentage
0                83.00
1                73.33
2                91.33
3                73.33
4                82.17
5                87.13
6                100.00
7                60.50
8                76.83
9                84.65

Min 60.50   Max 100.00   Average 81.23   SD 10.95

VI. CONCLUSION

We developed an application to detect people in video. The application consists of three main features: human detection, human counting, and histogram generation. For human detection, we can detect humans and show the detection result by drawing a green frame and displaying the number of people in each video frame. According to our experiment, the average accuracy of the human detection results over 10 videos is 81.23 percent with an SD of 10.95 percent. Our histogram is also very helpful for showing how many people appear in each frame over each video playing period. However, the development of our program is still incomplete, since its detection efficiency could be improved by using higher-resolution detection as well as by fitting it with statistical techniques.


In terms of recommendations for future study, we strongly recommend that students and researchers who are interested in human detection using HOG techniques carry out further study focusing on quality assessment over a larger number of videos. We also recommend that further research use high-performance devices, because more capable hardware makes it possible to compute and detect humans in higher-resolution videos and provides much better detection results. Finally, this program should be developed in parallel with statistical software to obtain more convincing results.

REFERENCES
[1] Y. Pang, K. Zhang, Y. Yuan, and K. Wang, “Distributed object
detection with linear svms,” IEEE Transactions on Cybernetics,
vol. 44, no. 11, pp. 2122–2133, 2014. [Online]. Available:
http://www.ncbi.nlm.nih.gov/pubmed/26020454
[2] H. Ren and Z. N. Li, “Object detection using edge histogram of oriented
gradient,” in 2014 IEEE International Conference on Image Processing
(ICIP), Oct 2014, pp. 4057–4061.
[3] C. Sun and P. Vallotton, “Fast linear feature detection using multiple di-
rectional non-maximum suppression,” in 18th International Conference
on Pattern Recognition (ICPR’06), vol. 2, 2006, pp. 288–291.
[4] M. R. T. Hossai, M. A. Shahjalal, and N. F. Nuri, “Design of an iot
based autonomous vehicle with the aid of computer vision,” in 2017
International Conference on Electrical, Computer and Communication
Engineering (ECCE), Feb 2017, pp. 752–756.
[5] Y. Lu, C. Lu, and C. K. Tang, “Online video object detection using
association lstm,” in 2017 IEEE International Conference on Computer
Vision (ICCV), Oct 2017, pp. 2363–2371.
[6] D. Sangeetha and P. Deepa, “Efficient scale invariant human detection
using histogram of oriented gradients for iot services,” in 2017 30th
International Conference on VLSI Design and 2017 16th International
Conference on Embedded Systems (VLSID), Jan 2017, pp. 61–66.
[7] A. Satpathy, X. Jiang, and H. L. Eng, “Human detection by quadratic
classification on subspace of extended histogram of gradients,” IEEE
Transactions on Image Processing, vol. 23, no. 1, pp. 287–297, 2014.
[Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/26020454
[8] D. T. Nguyen, W. Li, and P. O. Ogunbona, “Human detection
from images and videos: A survey,” Pattern Recognition, vol. 51,
no. Supplement C, pp. 148 – 175, 2016. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0031320315003179
[9] F. Zhou and F. D. l. Torre, “Spatio-temporal matching for human
pose estimation in video,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 38, no. 8, pp. 1492–1504, 2016. [Online].
Available: http://www.ncbi.nlm.nih.gov/pubmed/26020454
[10] N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented
Histograms of Flow and Appearance. Berlin, Heidelberg: Springer
Berlin Heidelberg, 2006, pp. 428–441.
[11] D. Hosotani, I. Yoda, and K. Sakaue, “Wheelchair recognition by
using stereo vision and histogram of oriented gradients (hog) in real
environments,” in 2009 Workshop on Applications of Computer Vision
(WACV), Dec 2009, pp. 1–6.
[12] S. Guzman, A. Gomez, G. Diez, and D. S. Fernandez, “Car detection
methodology in outdoor environment based on histogram of oriented
gradient (hog) and support vector machine (svm),” in 6th Latin-American
Conference on Networked and Electronic Media (LACNEM 2015), Sept
2015, pp. 1–4.
[13] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
detection,” in 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR’05), vol. 1, June 2005, pp. 886–
893.
[14] H. Bristow and S. Lucey, “Why do linear SVMs trained on HOG features
perform so well?” ArXiv e-prints, Jun. 2014.

