C13 Research Paper 3-2

HUMAN DETECTION AND COUNTING
Gopal Krishna Pasumarty
Dept. of Information Technology
GRIET, (JNTUH)
Hyderabad, INDIA
email: dean@griet.ac.in

Akarsh Reddy Avula
Dept. of Information Technology
GRIET, (JNTUH)
Hyderabad, INDIA
email: aareddy.2002@gmail.com

Dileep Marupudi
Dept. of Information Technology
GRIET, (JNTUH)
Hyderabad, INDIA
email: dileepmarupudi28@gmail.com

Akash Mysa
GRIET, (JNTUH)
Hyderabad, INDIA
email: mysaakash125@gmail.com

Lalith Sagar Pudi
GRIET, (JNTUH)
Hyderabad, INDIA
email: lalithsagar4515@gmail.com

Abstract: Accurate detection of human beings is important for diverse application areas, including human characterization, congestion analysis, pedestrian detection, and person identification and tracking. The major motivation and scope of this project are to detect and track the moving people among a cluster of moving objects and to count the number of people detected in the scene. The techniques used for this include optical flow, background modeling, subtraction, and filtering. Once a moving object is detected, it can be classified based on the orientation, texture, shape, and motion of the object or person. For the classification, Haar feature classifiers are used together with features based on oriented gradients. Thus, humans are detected and their count is recorded by the created model.

Keywords: Haar classification, Histograms of oriented gradient features.

I. INTRODUCTION
Recognizing humans in video from surveillance systems has attracted considerable attention in recent years because of its wide range of applications. The detection of humans in surveillance video and in still images is becoming crucial for abnormal event detection, people counting and identifying a target person in a dense crowd, pedestrian detection and classification, human gait analysis, person identification and tracking, gender classification, and fall detection for elderly people.

Computer vision is the field of study concerned with the methods required for acquiring, processing, analyzing, and understanding real-world images and, more generally, high-dimensional data in order to obtain numerical or symbolic information, for example in the form of decisions.

The classification and comprehension of images can be regarded as the extraction of symbolic information from image data using models built with the aid of geometry, physics, statistics, and learning theory. By employing the latest computer vision research approaches, accuracy can be enhanced, as well as customization, scalability, and ease of integration.

II. LITERATURE SURVEY
[1] Allan D. Jepson et al. proposed a framework for developing reliable, flexible appearance models for motion-based tracking of complex natural objects. The strategy is implemented for an appearance model based on the filter responses of a steerable pyramid. This model is used in a motion-based tracking algorithm to give resilience against image outliers, such as those caused by occlusions, while adapting to real-world changes in appearance, such as those caused by changes in 3D pose or facial expression.

[2] Amit Adam et al. presented a novel algorithm, named "FragTrack", for tracking an object in visual surveillance. The approach uses the integral histogram data structure as a crucial tool for its implementation. Their technique resolves a number of issues that conventional histogram-based algorithms are unable to manage: (i) it handles changes in pose and partial occlusions; (ii) it takes into account the spatial distribution of the pixel intensities, information that is lost in traditional algorithms; (iii) its computational cost is low for both large and small targets.

[3] X. Liu et al. developed a system for robust people counting that consists of four components: an autocalibration module, a standardized low-level foreground estimation algorithm, crowd segmentation, and a template-based tracker. Their results indicate that using the site geometry brings a significant benefit in constraining the people detection problem and in obtaining relevant scene information. The system is able to segment individuals out of larger groups of people and monitor them over time. By effectively utilizing spatial context, this model-based methodology also enables the system to recognize specific occurrences or events automatically.

[4] Boris Babenko et al. introduce the MILTrack tracking system, which employs an online multiple-instance learning (MIL) algorithm. Even though it is uncertain which image patch most accurately portrays the object of interest, the MIL framework makes it possible to update the appearance model with a collection of image patches. They state that their algorithm is easy to implement and can run at real-time speeds.

[5] David S. Bolme et al. introduce the average of synthetic exact filters (ASEF), a technique for building correlation filters for object detection. As demonstrated by eye localization experiments on the FERET database, ASEF filters outperform two other commonly used eye detection algorithms as well as previously proposed synthetic correlation filters. They mention that ASEF filters differ from previous filters in two ways: (i) ASEF generates an exact filter for each training image that recreates the entire desired correlation surface. Since the background can have interference patterns, this improves the resolution of the filter. (ii) The exact filters for all of the training images are averaged to obtain the final filter. By emphasizing traits that are common to multiple images, this averaging prevents overfitting. They showed that this filter also performs well at face detection and at locating pupils in images from iris sensors.

[6] The work was extended to present a very simple human detection technique based on filtering and on correlation with the edge-magnitude image. The ASEF-based detector obtains a 94.5% detection rate with fewer than one false detection per frame for sparse crowds and can process images at over 25 frames per second. Training is fast: it takes 12 seconds to train the detector on 32 manually annotated images. The dataset used for evaluation is PETS 2009, and the results are compared with the cascade classifier of OpenCV and with a state-of-the-art person detector based on deformable parts.

[7] In a low-resolution image featuring complex scenarios, this study proposes to develop an effective approach for estimating the number of individuals and
locating each individual. The contribution by Ya-Li Hou et al. is threefold: (i) in a complex scene containing only a small number of moving people, the estimation is done by performing postprocessing steps on the results of background subtraction; (ii) for locating individuals in a low-resolution scene, an EM (Expectation Maximization) method was developed that represents each person in the scene with a new cluster model, and they state that a very accurate foreground contour is not required for their method; (iii) the people count is used as a priori information for locating individuals on the basis of feature points. They validated the methods on a 4-hour video in which the number of people ranges from 36 to 222; the best people-count estimate has an average error of 10% over 51 test cases. They used a neural network for the estimation of people, and the best estimation results are obtained when the network learns from both the foreground pixels and the closed foreground pixels.

The necessity for a precise foreground contour has been alleviated by clustering the Kanade-Lucas-Tomasi (KLT) feature points in a foreground mask. They have shown that their model is more accurate than the Gaussian model.

[8] Subra Mukherjee et al. developed a novel and robust model, named "The Omega Model", for identifying and counting the people present in a scene. The most distinguishing aspect of their work is the use of four descriptors to identify four significant and invariant characteristics of the human head-shoulder area. They studied the descriptor parameters in detail and concluded that no individual descriptor can reliably detect humans, so they employed a weight-based decision system for better performance. The effectiveness of their approach was validated with experiments performed on several images.

[9] P. O. Otasowie et al. built a system using real-life still images with the Viola-Jones detection algorithm as its major part, implemented in MATLAB release 2012. To reduce processing time, the evaluation was performed on a set of only 35 different images instead of video, under a few conditions such as image size, a profiler test, and an algorithmic test. The system has an accuracy of 84% under poor visual conditions and stays above 90% under optimal conditions.

[10] Xinjian Zhang et al. proposed a fast and novel approach for estimating the people count in crowded surveillance scenes. They show that their approach is robust to changes in the background and illumination and is able to count in real time. A cascade of boosted classifiers with combined rectangular features is used to train a multi-scale head-shoulder detector, which is applied to detect humans in every frame. Human tracking is then used to track the detections and remove duplicates in subsequent frames. The training set consists of 985 positive examples derived from 423 images, and 336 are images from the well-known INRIA set. They tested the approach on real subway station video and achieved better results at a frame rate of 10 fps and a camera orientation of 30 degrees. They followed the PASCAL VOC protocol for evaluation on cropped images: a detection is considered correct (a true positive) if the detected bounding box overlaps a ground truth box by more than 50%; otherwise it is a false positive. They report that the real-time performance of their system is satisfactory.

[11] This paper deals with the need for crowd management, targeting the COVID-19 situation. Advanced deep learning computer vision algorithms are chosen specifically: YOLOv3 for human object detection and classification, the DeepSORT tracking algorithm to track each human object observed, and intrusion line judgement for counting. In addition, the pre-trained YOLOv3 is converted into TensorFlow for better and quicker real-time computation using a GPU (Graphics Processing Unit) instead of a CPU (Central Processing Unit). With test videos taken from the internet to replicate the entrance to a mall, the experimental findings demonstrate that this combination is 91.07% accurate and real-time capable.

III. TECHNIQUES USED
Background Segmentation
Segmenting the background, or subtracting the background of an image, is a routine method for detecting an object as foreground in a scene captured by a surveillance camera. The camera can be fixed at a location, purely translating, or remotely operated. Background segmentation attempts to detect moving objects pixel by pixel or block by block from the change observed between the current frame and a frame taken as reference. Reference frames are commonly called "background images", "background models", or "environmental models". A good background model must be able to adapt to dynamic scene changes. This can be done by updating the background information at regular intervals, but it can also be done without updating the background information.

Motion Estimation
Motion estimation is also referred to as "optical flow". It is a vector-based method for estimating motion in video by matching object locations across frames. Optical flow describes the smooth, coherent motion of points or features between frames. To recognize moving regions in an image stream, motion segmentation based on optical flow uses the characteristics of the flow vectors of moving objects over time. The main advantage of optical flow is that it is robust to multiple simultaneous camera and object motions, which makes it well suited to crowd analysis and high-density motion conditions. Optical flow-based methods can detect independently moving objects even when the camera itself moves, for example a camera attached to a drone or a car. Apart from their sensitivity to image noise, color, and non-uniform lighting, most flow computation methods have significant computational requirements and are sensitive to motion discontinuities. Real-time implementations of optical flow often require dedicated hardware because of the complexity of the algorithms and the need for reasonably high frame rates for accurate measurements.
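
As a minimal illustration of estimating motion by matching locations across frames, the sketch below finds the displacement of one block between two tiny grayscale frames by exhaustive block matching with a sum-of-absolute-differences cost. Block matching is used here as a simple stand-in for the optical flow methods discussed above and is not one of them; the function names, frame sizes, and search radius are assumptions made for this example.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in `prev` and the
    same-sized block shifted by (dy, dx) in `curr`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c] - curr[top + dy + r][left + dx + c])
    return total


def block_motion(prev, curr, top, left, size, radius):
    """Return the (dy, dx) shift within `radius` that best matches the block."""
    height, width = len(curr), len(curr[0])
    best_shift, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip shifts that would move the block outside the frame.
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift


def make_frame(height, width, top, left, size, value=255):
    """Hypothetical test frame: black background with one bright square."""
    frame = [[0] * width for _ in range(height)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = value
    return frame


prev_frame = make_frame(8, 8, top=2, left=1, size=2)
# The same square, moved down by 1 pixel and right by 2 pixels.
curr_frame = make_frame(8, 8, top=3, left=3, size=2)
motion_vector = block_motion(prev_frame, curr_frame, top=2, left=1, size=2, radius=2)
print(motion_vector)
```

The exhaustive search makes the cost of this approach grow with the square of the search radius, which is one reason practical systems rely on optimized library routines (for example, OpenCV's `cv2.calcOpticalFlowFarneback` for dense flow) rather than a plain loop like this one.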
Spatial-Temporal Filter

In motion detection based on spatiotemporal analysis, the action or motion is characterized over the 3D spatiotemporal data volume spanned by the moving person in the image sequence. These methods generally consider the motion as a whole and characterize its spatiotemporal distribution. The image sequence is processed with a spatial Gaussian and a Gaussian derivative along the time axis. Because of the derivative operation along the time axis, the filter shows a high response in regions where motion occurs. These responses are then thresholded to obtain a binary motion mask, which is aggregated into spatial histogram bins. Such features compactly encode the motion and the corresponding spatial information, which is most useful for far-field and mid-field surveillance videos. The approaches are mostly based on simple convolution operations, so they are fast and easy to implement. They are among the few approaches that remain useful in low-resolution or poor-quality video, where it can be very difficult to extract other features such as optical flow and silhouettes. Spatiotemporal motion-based methods are better able to capture both the spatial and the temporal information of gait. Their advantages are low complexity and easy implementation. However, they are more susceptible to noise and to variations in the timing of movements.

Shape-based Model
Shape-based methods start by describing the shape of the moving regions, for example as collections of points, boxes, and blobs. The task is then usually treated as a standard pattern recognition problem. Wang et al. examined how the deformations of human silhouettes (or shapes) during articulated motion can be employed as discriminating features that implicitly capture the dynamics of the motion, and they used the discrete wavelet transform and the DFT to do so.

Motion-based Model
The premise behind this classification method is that the motion features and patterns of objects are distinct enough to tell the objects apart. To distinguish humans from other moving things, motion-based techniques directly use the periodic characteristics of the captured images. Bobick and Davis established a view-based technique for the recognition of human movements by building a vector image template consisting of two temporal projection operators: the binary motion-energy image and the motion history image.

Texture-based Method
The most popular texture-based method for quantifying the intensity patterns in the immediate neighborhood of a pixel is LBP (Local Binary Pattern). The multi-block local binary pattern (MB-LBP) was introduced by Zhang et al. for encoding the intensities of rectangular regions using LBP. The HOG feature descriptor is a texture-based technique that employs high-dimensional features based on edges, followed by an SVM, to detect the regions of the human body. This technique counts the occurrences of gradient orientations in localized sections of an image and uses overlapping local contrast normalization to improve accuracy.

IV. METHODOLOGY
The proposed system begins with the modules required for detecting and analyzing the input provided; these modules contain pre-trained models. The libraries needed to build the project are OpenCV, Imutils, Numpy, and Argparse. The descriptor used to detect the objects is the "HOG (Histogram of Oriented Gradients) Descriptor".

Histogram of Oriented Gradient Descriptor
For detecting objects, a descriptor is used that represents the features of the input, as is common in computer vision and image processing. An efficient implementation that combines the HOG descriptor algorithm with an SVM (Support Vector Machine) is already provided by OpenCV.

Haar Cascade Classifier
To make the system work better and faster in a real-time environment, Haar cascades are used as classifiers; they are pre-trained and included in the repository of the OpenCV library. They are popularly used for human face detection and eye detection. We use them in our model for better results and quicker operation. They detect human faces, track them, and remove duplicates in subsequent frames. Implementing the Haar cascade classifier involves downloading the pre-trained XML file from the OpenCV repository and loading it with the cv2.CascadeClassifier function.
For human detection the full body cascade is used, and to track a person across subsequent frames the face detection cascade is used.
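
The pre-trained cascades mentioned above evaluate Haar-like rectangle features, which are differences between the pixel sums of adjacent rectangles and are computed quickly from an integral image (summed-area table). The sketch below shows one such two-rectangle feature in plain Python as background on how these classifiers work; it is not the OpenCV implementation, and the function names and the 4x4 test patch are assumptions made for this example.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column:
    ii[r][c] is the sum of all pixels above and to the left of (r, c)."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature:
    sum of the right half minus sum of the left half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


# Hypothetical 4x4 patch: dark left half, bright right half (a vertical edge).
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
table = integral_image(patch)
edge_response = two_rect_feature(table, top=0, left=0, height=4, width=4)
print(edge_response)
```

A large response indicates a strong left-to-right intensity change inside the window, while a uniform patch gives a response of zero. A cascade combines many such features, evaluated at several positions and scales, into successive stages that reject non-matching windows early.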
Workflow and Methods

The default people detector function is used to obtain OpenCV's pre-trained model for human detection, which is then fed to the SVM.
A video is a moving picture created by combining a series of images. We refer to these images as frames.
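
A minimal sketch of how the default people detector might be applied to a single frame, assuming OpenCV's Python bindings (`cv2`) are installed. The function names `keep_confident` and `count_people`, the score threshold of 0.5, and the `detectMultiScale` parameter values are assumptions chosen for illustration and are not taken from the authors' code.

```python
def keep_confident(boxes, weights, min_weight=0.5):
    """Keep (x, y, w, h) boxes whose SVM score is at least `min_weight`.

    Scores may arrive as plain numbers or as one-element sequences,
    depending on the OpenCV version, so both forms are accepted.
    """
    kept = []
    for box, weight in zip(boxes, weights):
        score = weight[0] if hasattr(weight, "__len__") else weight
        if float(score) >= min_weight:
            kept.append(tuple(int(v) for v in box))
    return kept


def count_people(frame, min_weight=0.5):
    """Detect people in a BGR frame; return (count, boxes) and draw the boxes."""
    # OpenCV is imported here so the helper above stays usable without it.
    import cv2

    hog = cv2.HOGDescriptor()
    # Load the coefficients of OpenCV's pre-trained linear SVM people detector.
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    # Assumed parameter values; they trade detection speed against recall.
    boxes, weights = hog.detectMultiScale(
        frame, winStride=(4, 4), padding=(8, 8), scale=1.05
    )
    kept = keep_confident(boxes, weights, min_weight)
    for x, y, w, h in kept:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return len(kept), kept
```

For a video, `count_people` would be called once per frame read from the capture device or file. Overlapping boxes for the same person are not merged in this sketch, so the returned count can overestimate the number of people unless a step such as non-maximum suppression is added.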
1. Import the libraries.
2. Create the model for detecting humans. For this we use the HOGDescriptor with SVM already implemented in OpenCV, together with the Haar cascade.
3. The detect method takes a frame and detects the persons in it, draws a box around each person, shows the frame, and returns it. The detectMultiScale method is used to detect objects of different sizes in an image.
4. The human detector method takes the image and the video as arguments; they are captured using the image utilities with OpenCV.
5. ArgParse is used to parse the arguments passed to the script from the command prompt and returns them as a dictionary.

V. RESULTS AND ANALYSIS
We apply different methods to make this detection model work. The input is provided by the user in the form of arguments to the model. If the arguments are images or videos, the model detects the people in the scene and gives their count. If no arguments are passed, it checks for a built-in webcam or a camera installed on the computer and takes the video from that camera. The output of the model is shown in separate windows for images, videos, and the input accessed from the camera.

VI. CONCLUSION
In this project, we detect humans and count the number of persons in an image, a video, or a live camera feed. This is one of the most successful topics in vision research because of its wide range of applications. It is performed in two steps: object detection and classification. Object classification is done with various techniques, such as those based on motion, shape, and texture. With the help of these algorithms, the major applications of human detection and counting in a picture, a video, or a live camera feed are reviewed.

VII. FUTURE ENHANCEMENTS
This work is a significant step forward in the development of artificial intelligence systems that can correctly classify and recognize items, which is useful in robotics and cybernetics.

VIII. REFERENCES
[1] A. D. Jepson, D. J. Fleet and T. F. El-Maraghi, "Robust online appearance models for visual tracking," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1296-1311, Oct. 2003, doi: 10.1109/TPAMI.2003.1233903.

[2] A. Adam, E. Rivlin and I. Shimshoni, "Robust
Fragments-based Tracking using the Integral Histogram,"
2006 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR'06), New York, NY,
USA, 2006, pp. 798-805, doi: 10.1109/CVPR.2006.256.
[3] X. Liu, P. H. Tu, J. Rittscher, A. Perera and N. Krahnstoever, "Detecting and counting people in surveillance applications," IEEE Conference on Advanced Video and Signal Based Surveillance, 2005., Como, Italy, 2005, pp. 306-311, doi: 10.1109/AVSS.2005.1577286.

[4] B. Babenko, M.-H. Yang and S. Belongie, "Visual tracking with online Multiple Instance Learning," 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 983-990, doi: 10.1109/CVPR.2009.5206737.

[5] D. S. Bolme, B. A. Draper and J. R. Beveridge, "Average of Synthetic Exact Filters," 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 2105-2112, doi: 10.1109/CVPR.2009.5206701.

[6] D. S. Bolme, Y. M. Lui, B. A. Draper and J. R. Beveridge, "Simple real-time human detection using a single correlation filter," 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Snowbird, UT, USA, 2009, pp. 1-8, doi: 10.1109/PETS-WINTER.2009.5399555.

[7] Y.-L. Hou and G. K. H. Pang, "People Counting and Human Detection in a Challenging Situation," in IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 41, no. 1, pp. 24-33, Jan. 2011, doi: 10.1109/TSMCA.2010.2064299.

[8] S. Mukherjee and K. Das, "Omega model for human detection and counting for application in smart surveillance system," arXiv preprint arXiv:1303.0633, 2013.

[9] P. O. Otasowie and I. A. Edeoghon, "Design and Implementation of a Human Detector and Counting System using MATLAB," 2014.

[10] X. Zhang and L. Zhang, "Real Time Crowd Counting with Human Detection and Human Tracking," in C. K. Loo, K. S. Yap, K. W. Wong, A. T. Beng Jin and K. Huang (eds), Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8836. Springer, Cham, 2014. https://doi.org/10.1007/978-3-319-12643-2_1

[11] H. Mokayed, T. Z. Quan, L. Alkhaled and V. Sivakumar, "Real-Time Human Detection and Counting System Using Deep Learning Computer Vision Techniques," AIA, Oct. 2022.
