Human Pose Estimation

Using Part Affinity Fields and Confidence Maps


Abstract: Here we present an approach to 2D human pose estimation using part affinity fields and confidence maps. A part affinity field is a representation that the network learns in order to associate body parts with individuals in a given image. Using a greedy bottom-up parsing step, we maintain high accuracy while achieving real-time performance. We design an architecture that jointly learns part locations and their associations via two branches of the same sequential prediction process, irrespective of the number of people in the image.

I. INTRODUCTION
Human pose estimation is the problem of localizing the body parts of an individual. Finding the body parts of one individual in an image full of other people is a major challenge. We divide the problem into three categories. First, each image may contain an unknown number of people, who can appear at any position or scale. Second, interactions between people induce complex spatial interference due to contact, occlusion, and limb articulation, making the association of parts difficult. Third, run-time complexity tends to grow with the number of people in the image, making real-time performance a challenge.

One common approach is to use a body detector and perform single-person pose estimation on each detection. This is the top-down approach. It is only useful if there is a single person or, in the multi-person case, if the people are a certain distance apart; otherwise it makes errors when identifying the body parts of each individual. The run time of top-down approaches is also proportional to the number of people: for each detection, a single-person pose estimator is run, and the more people there are, the greater the computational cost.

A bottom-up approach, however, seems more suitable for estimating the poses of several people in a single image. We use it because it offers robustness to early commitment and has the potential to decouple run-time complexity from the number of people in the image.

In this project, we present the first bottom-up representation of association scores via Part Affinity Fields (PAFs), a set of 2D vector fields that encode the location and orientation of limbs over the image domain. We demonstrate that simultaneously inferring these bottom-up representations of detection and association encodes global context sufficiently well to allow a greedy parse to achieve high-quality results at a fraction of the computational cost. We use the code from the GitHub repository listed in the references.
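To make the idea of a part affinity field more concrete, the following is a minimal sketch of how a field for a single limb could be filled in. It assumes a simple rule that is not taken from the repository: every pixel within a fixed distance of the segment between two joints stores the unit vector pointing from the first joint to the second, and every other pixel stores the zero vector. The function name `limb_paf` and the parameter `limb_width` are hypothetical and used only for illustration.

```python
import math

def limb_paf(width, height, p1, p2, limb_width=1.0):
    """Return a height x width grid of (vx, vy) vectors for one limb.

    Assumed rule: pixels within `limb_width` of the segment p1 -> p2 hold
    the unit vector from p1 to p2; all other pixels hold (0.0, 0.0).
    """
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("p1 and p2 must be different points")
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - x1, y - y1
            along = dx * ux + dy * uy        # position along the limb axis
            across = abs(dx * uy - dy * ux)  # distance from the limb axis
            if 0.0 <= along <= length and across <= limb_width:
                field[y][x] = (ux, uy)
    return field

# Example: a horizontal limb from (1, 2) to (6, 2) on an 8 x 6 grid.
demo_field = limb_paf(8, 6, (1, 2), (6, 2))
```

In this sketch the field encodes both where the limb is (the non-zero pixels) and which way it points (the stored unit vector), which is the property the text attributes to PAFs.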
III. METHODS

A. Datasets
We used the COCO dataset, which supports object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation.

The system takes as input a color image of size w × h and produces as output the 2D locations of anatomical keypoints for each person in the image.

The above image shows the architecture of the two-branch multi-stage CNN. Each stage in the first branch predicts confidence maps S^t, and each stage in the second branch predicts PAFs L^t. After each stage, the predictions from the two branches, along with the image features, are concatenated for the next stage.

Graph matching. (a) Original image with part detections. (b) K-partite graph. (c) Tree structure. (d) A set of bipartite graphs.

We first obtain a set of body part detection candidates for multiple people. These part detection candidates still need to be associated with other parts from the same person; in other words, we need to find the pairs of part detections that are in fact connected limbs. We define a variable to indicate whether two detection candidates are connected, and the goal is to find the optimal assignment for the set of all possible connections. If we consider a single pair of parts j1 and j2 (e.g., neck and right hip) for the c-th limb, finding the optimal association reduces to a maximum weight bipartite graph matching problem.

The above flow diagram shows the whole process by which the system identifies the keypoints of a person. The input is an image of size w × h. It is first analyzed by a convolutional network (initialized with the first 10 layers of the VGG-19 model), which produces a set of feature maps. These feature maps are the input to the confidence map and part affinity field branches. A confidence map shows the location of a joint, whereas a part affinity field shows the orientation of the limb between two joints. Each joint has its own confidence map, and each limb has its own part affinity field. The Part Confidence Maps are a set of 2D confidence maps S for body part locations, one map per joint. The Part Affinity Fields (PAFs) are a set of 2D vector fields L that encode the degree of association between parts. Finally, the confidence maps and part affinity fields are processed by a greedy algorithm to obtain the pose of each person in the image.
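The bipartite matching for a single limb type described above can be sketched as follows. This is an illustrative example under stated assumptions, not the repository's implementation: the association score of a candidate pair is approximated by sampling the PAF at evenly spaced points along the segment between the two detections and averaging its dot product with the segment direction, and the maximum weight matching is found by exhaustive search, which is only practical for a handful of candidates. The names `association_score` and `best_matching`, the toy field `demo_paf`, and the candidate coordinates are all hypothetical.

```python
import math
from itertools import permutations

def association_score(paf, a, b, samples=10):
    """Average alignment of the PAF with the direction a -> b."""
    (ax, ay), (bx, by) = a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return 0.0
    ux, uy = (bx - ax) / length, (by - ay) / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        x = int(round(ax + t * (bx - ax)))
        y = int(round(ay + t * (by - ay)))
        vx, vy = paf[y][x]
        total += vx * ux + vy * uy
    return total / samples

def best_matching(paf, parts_a, parts_b):
    """Exhaustive maximum weight bipartite matching (small inputs only)."""
    scores = [[association_score(paf, a, b) for b in parts_b] for a in parts_a]
    n_a, n_b = len(parts_a), len(parts_b)
    if n_a <= n_b:
        options = (list(zip(range(n_a), perm))
                   for perm in permutations(range(n_b), n_a))
    else:
        options = (list(zip(perm, range(n_b)))
                   for perm in permutations(range(n_a), n_b))
    best_total, best_pairs = 0.0, []
    for pairs in options:
        # Pairs with a non-positive score are left unconnected.
        kept = [(i, j) for i, j in pairs if scores[i][j] > 0]
        total = sum(scores[i][j] for i, j in kept)
        if total > best_total:
            best_total, best_pairs = total, kept
    return best_pairs

# Toy field: two horizontal limbs, on rows 1 and 4, pointing in +x.
demo_paf = [[(0.0, 0.0)] * 7 for _ in range(6)]
for x in range(1, 6):
    demo_paf[1][x] = (1.0, 0.0)
    demo_paf[4][x] = (1.0, 0.0)

necks = [(1, 1), (1, 4)]   # candidates for part j1
hips = [(5, 4), (5, 1)]    # candidates for part j2, listed in swapped order
demo_pairs = best_matching(demo_paf, necks, hips)
```

In the toy data each neck is paired with the hip on its own row, because the field along the diagonal segments is mostly zero and those pairs score lower.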
B. The Architecture
The architecture of the network we used for training is depicted in Figure 4. Our model consists of only seven learned layers: five convolutional layers followed by two fully connected layers.

Figure 2. (Left) N-joint rigid kinematic skeleton model. (Right) Shape-based body model.

C. Implementation
Figure 2 depicts the design of the setup we employed for training. To build the deep neural network prototype, images are resized into 64 x 64 x 1 grayscale images. The network comprises seven learned layers: five convolutional and two fully connected.

The network classifier model is then used for validation.

V. CONCLUSION
The project has a high run time, and when people overlap heavily it becomes difficult to separate the joints of one individual from those of another. For instance, in a dataset of a couple dancing salsa, the overlap of body parts is at its greatest, which makes it difficult to compute the part affinity fields because one person's limbs are very close to the other's.

To reduce the run time, we use greedy bottom-up parsing of multiple people. To overcome the second issue, the heavy overlap problem, we can use several cameras to obtain accurate coordinates of the joints and limbs from all possible directions.

A greedy algorithm makes the locally optimal choice at each step as it attempts to find an overall solution to the entire problem. Here, after the part affinity fields are computed, the greedy algorithm connects the joints so that connected joints belong to the same person. The bottom-up parse first joins pairs of joints and at the end combines all the joined pairs to form a complete body.
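The greedy bottom-up parse described above can be illustrated with a short sketch. It assumes that candidate connections for each limb type have already been scored, and that limb types are processed in tree order starting from a root part, so the first part of every limb is already attached to a person when the limb is reached. The function names `greedy_match` and `assemble_people`, the part names, and the scores are hypothetical values chosen for illustration.

```python
def greedy_match(candidates):
    """Pick connections for one limb type, highest score first.

    candidates: list of (score, index_a, index_b) tuples.
    A candidate is accepted only if neither endpoint is already used.
    """
    used_a, used_b, accepted = set(), set(), []
    for score, ia, ib in sorted(candidates, reverse=True):
        if score > 0 and ia not in used_a and ib not in used_b:
            accepted.append((ia, ib))
            used_a.add(ia)
            used_b.add(ib)
    return accepted

def assemble_people(limbs):
    """Merge accepted connections into one skeleton per person.

    limbs: dict mapping (part_a, part_b) -> list of (index_a, index_b),
    given in tree order from the root part outward (an assumption).
    Returns a list of dicts mapping part name -> detection index.
    """
    people = []
    for (part_a, part_b), pairs in limbs.items():
        for ia, ib in pairs:
            for person in people:
                if person.get(part_a) == ia:
                    person[part_b] = ib   # extend an existing skeleton
                    break
            else:
                people.append({part_a: ia, part_b: ib})  # start a new one
    return people

# Hypothetical scored candidates for two limb types and two people.
neck_shoulder = greedy_match([(0.9, 0, 0), (0.8, 0, 1), (0.7, 1, 1), (0.2, 1, 0)])
shoulder_elbow = greedy_match([(0.6, 0, 1), (0.5, 1, 0), (0.4, 0, 0)])
demo_people = assemble_people({
    ("neck", "shoulder"): neck_shoulder,
    ("shoulder", "elbow"): shoulder_elbow,
})
```

Each limb type is resolved independently and the results are chained through shared joints, which is why the cost of this step does not require running a separate estimator for every person.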
REFERENCES
1. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. https://openaccess.thecvf.com/content_cvpr_2017/papers/Cao_Realtime_Multi-Person_2D_CVPR_2017_paper.pdf

2. http://image-net.org/challenges/talks/2016/Multi-person%20pose%20estimation-CMU.pdf

3. GitHub repository: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation

4. https://journals.sagepub.com/doi/abs/10.1177/1747954119879350

5. https://ieeexplore.ieee.org/abstract/document/6691503

6. https://sci-hub.tw/10.1007/978-3-030-17274-9

7. https://arxiv.org/abs/1611.08050

8. https://www.xyonix.com/blog/using-ai-to-improve-sports-performance-amp-achieve-better-running-efficiency

9. https://mc.ai/psg-sports-analytics-challenge-how-deep-learning-can-be-used-in-a-football-context/

10. https://sci-hub.tw/10.1109/ICAIIT.2019.8834602

11. https://link.springer.com/chapter/10.1007/978-3-319-46478-7_44

12. https://openaccess.thecvf.com/content_cvpr_2013/papers/Dantone_Human_Pose_Estimation_2013_CVPR_paper.pdf

13. https://sci-hub.tw/https://link.springer.com/chapter/10.1007/978-3-319-46475-6_16

14. http://arno.uvt.nl/show.cgi?fid=148438

15. https://sci-hub.tw/https://www.tandfonline.com/doi/abs/10.1080/02640414.2018.1521769

16. https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation.git

17. https://towardsdatascience.com/cvpr-2017-openpose-realtime-multi-person-2d-pose-estimation-using-part-affinity-fields-f2ce18d720e8
