
DEPARTMENT OF MECHANICAL ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY ROPAR

RUPNAGAR-140001, INDIA

CAPSTONE PROJECT (CP302) REPORT


For
Development of Quadruped Robot For Disaster Management
with Reinforcement Learning

Project Instructor - Dr. Ekta Singla

Submitted by
Kasif Ansari - 2020MEB1290
Lakshya Sharma - 2020MEB1292
Rahul Bansal - 2020MEB1305

Second Semester, 2022-2023


Report Submitted on: 06-03-2023
Introduction

Disaster management refers to preventing, preparing for, responding to, and aiding in recovery efforts to allay the effects of natural or man-made disasters such as earthquakes, storms, or industrial explosions. While much of disaster management is geared towards prevention and setting up appropriate safety measures, active response plays a vital role in reducing the damage incurred once a disaster comes to pass. Effective response plans require groundwork, subjecting disaster management teams to strenuous environments and often mortal danger. Much of the associated hazard can be averted by the use of robots.
Quadruped robots are animal-inspired robots with four legs that offer high mobility and stability on uneven terrain. A robot working in search and rescue should complement the capabilities of human rescuers, providing additional support and enhancing the overall response. It should be reliable, efficient, and effective, allowing for a more coordinated and successful rescue operation. In situations where human rescuers cannot reach the victims, robots can be very useful for providing medical support and delivering food supplies to the survivors.

In this project we aim to develop a four-legged robot that assists human rescuers during a disaster by providing survivors with medical assistance and food supplies. Quadruped robots can carry payloads that require a stable platform, an important requirement when supporting people stuck in hostile environments. They are highly mobile when navigating obstacles of comparable size, stairs, and debris, making them well suited to urban environments, and their wide base and low center of gravity keep them stable on uneven, sloping surfaces. They can adapt to a wide range of environments, and most quadruped robots in production can carry loads in the range of 10-20 kg, making them ideal for low-weight deliveries. On the other hand, the control algorithms required for quadruped robots are generally complex, high power consumption can be a challenge in remote areas, and they are usually expensive to develop and manufacture, which limits their use in search and rescue. We therefore need to address these complexities so that we can build robots that are more cost-effective and efficient than previous developments in the field.
Objectives

● To design and prototype a quadruped robot capable of moving over rough terrain during disasters to help in search and rescue operations.
● To implement a reinforcement learning model in the prototype to train the robot in various situations.
● To design an effective and simplified control system for the robot.
● To equip the robot with various detection systems so that it can help in search and rescue operations during disaster management.

Problem Description

Search and rescue operations in disaster management, while essential, pose a significant threat to those working on the team. The associated hazards can be significantly reduced by using quadruped robots for operations in hostile terrain. While quadruped robots offer a range of features suited to search and rescue in disaster management, certain challenges act as a bottleneck to their wide-scale adoption.
Hence, we need to design the robot to have the following attributes:

1. Mobility: It should be able to move over rough and uneven terrain, climb stairs, and
navigate through narrow spaces and debris. It should be able to move in low visibility
or unstable areas.
2. Durability: It should be able to handle impact, falls, and other hazards like radiation
and extreme temperatures without incurring significant damage.
3. Sensing and perception: It should be able to gather data on surroundings, detect
potential hazards and locate victims.
4. Manipulation and Interaction: It should be able to interact with the environment
during operation. The actions include providing medical assistance, and delivering
supplies.
5. Autonomy: It should be able to navigate and perform other essential operations like
information collection and relay without constant intervention.
6. Communication: The robot should be able to gather and transmit data collected
on-site back to the command center.
Overview

To achieve our objective we are developing a four-legged robot, integrating tools such as reinforcement learning and analyzing simulations carried out in software such as ROS and MATLAB. Our first step is to prepare the CAD model of the robot and carry out structural simulations on it. We will then work on the robot's control algorithms, whose simulations will be done in MATLAB. The control system must be designed to be self-learning so that it adjusts itself accordingly. Developing reinforcement learning algorithms that integrate with the controls is itself a difficult task. Our aim is to achieve maximum output from the robot with minimal control.

Existing Studies

Taking a look at existing developments in the field, we have the following types of robots
that can be used for the operation:

1. Rovers
2. Snake Robots
3. 8-Legged Robots
4. Quadruped Robots

Each option has advantages and disadvantages that make it more suitable than the others for a particular use case. We restrict our scope to quadruped robots and their best-suited use case: search and delivery in rugged environments that are not as narrow as a pipe, yet too hostile for humans and too uneven for rovers. Such environments are often found in urban desolation, post-disaster rubble, and industrial accidents.

The four-legged structure stands out as a balance between eight-legged robots and rovers. While eight-legged robots are generally more agile and easier to operate, and thus offer an excellent alternative for surveillance and reconnaissance, four-legged robots, though harder to control, provide better stability.
We retrieved 489 documents related to our objective from the Scopus database and used 112 of them for bibliometric analysis; clustering by co-citation link score produced a chart with four major clusters.

1. “Four-legged robot design and gait planning” - R. C. Liu, G. Y. Ma, Y. Chen, S. Han, and J. Gao

Description- Based on a five-bar mechanism, this research presents a novel form of hybrid
leg mechanism. There are three degrees of freedom in the single leg. The diagonal gait was
chosen for the gait planning. ADAMS is used to validate the centroid displacement and foot
displacement of each gait leg.

2. “Reinforcement Learning of Walking Behavior for a Four-Legged Robot” - Hajime Kimura, Tom Yamashita, Shigenobu Kobayashi

Description- This paper investigates reinforcement learning of walking behavior for a four-legged robot. It presents an action-selection technique for actor-critic algorithms in which the actor uses a normal distribution to select a continuous action from its bounded action space.

3. “Forward Kinematics Serial Link Manipulators” - Dr. Ing John Nassour


Description- This study shows kinematic modeling of serial robot manipulators (open-chain
multibody systems) with a focus on forward and inverse kinematic models. The forward
kinematic model is built on rigid body conventions and includes one of the most widely used
techniques in robot kinematics, the Denavit-Hartenberg convention.

4. “Gait Tracking Control of Quadruped Robot Using Differential Evolution Based Structure Specified Mixed Sensitivity H∞ Robust Control” - Putrus Sutyasadi and Manukid Parnichkun

Description- This research proposed a control technique for quadruped robots that ensures
gait tracking performance. The quadruped robot is unsteady during vigorous gait motions
such as trotting. In addition to parameter uncertainties and unmodeled dynamics, the
quadruped robot is constantly subjected to perturbations.

Methodology

Mechanical Design:

Each leg of the robot has 3 degrees of freedom (DOF); hence, the designed robot has a total of 12 DOFs to allow proper movement and balancing.

1. A 2 DOF joint that enables revolutions in two directions. The joint will be controlled by
Servo motors for high precision and accuracy.

2. A single DOF joint that enables the robot to get over the obstacles and absorb any shock
or sudden changes in weights. This joint will also be powered by a single Servo Motor.

3. The image shows the 2 DOF motion of the main joints. This gives enhanced dexterity to
the robot, enabling it to reach and traverse difficult terrains with ease.

4. The heart of the 2 DOF joint is a modified version of a universal joint that allows rotary motion in two directions.

5. Ball bearings support the joint assembly and facilitate its rotation.
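The 3-DOF leg geometry described above can be illustrated with a simple forward-kinematics sketch. The link lengths, axis conventions, and function name below are assumptions for illustration, not the actual design dimensions:

```python
import numpy as np

def leg_forward_kinematics(theta_hip_yaw, theta_hip_pitch, theta_knee,
                           l1=0.05, l2=0.12, l3=0.12):
    """Foot position of one 3-DOF leg relative to the hip.

    theta_hip_yaw   -- 2-DOF hip joint, rotation about the vertical axis (rad)
    theta_hip_pitch -- 2-DOF hip joint, rotation about the lateral axis (rad)
    theta_knee      -- single-DOF knee joint (rad)
    l1, l2, l3      -- assumed hip offset, upper-leg and lower-leg lengths (m)
    """
    # Foot position in the leg's sagittal plane (x forward, z downward)
    x = l2 * np.sin(theta_hip_pitch) + l3 * np.sin(theta_hip_pitch + theta_knee)
    z = l2 * np.cos(theta_hip_pitch) + l3 * np.cos(theta_hip_pitch + theta_knee)
    # Rotate the planar solution about the yaw axis and add the hip offset
    return np.array([x * np.cos(theta_hip_yaw),
                     x * np.sin(theta_hip_yaw) + l1,
                     -z])
```

For the neutral pose (all angles zero) this places the foot directly below the hip at a depth of l2 + l3, which is a quick sanity check on the geometry.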
Control Algorithm

We are utilizing reinforcement learning, a branch of computer science, as a technique to manage the physical system in order to accomplish our goal.

Reinforcement learning (RL) can be used efficiently to control physical systems such as robots, drones, or autonomous vehicles. The core idea of RL is to acquire an ideal control strategy through trial and error so that the system can achieve a certain objective, such as walking, balancing, or avoiding obstacles.

Markov state: a state for which the likelihood of transitioning to another state depends only on the present state and not on any previous states. In other words, given the current state, the future is independent of the past.
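The Markov property can be illustrated with a toy transition table; the state names and probabilities below are invented for illustration. The next-state distribution is keyed only by the current state, never by the history:

```python
import random

# Hypothetical transition table: the probability of each next state
# depends only on the current state (Markov property).
TRANSITIONS = {
    "standing": {"standing": 0.7, "walking": 0.3},
    "walking":  {"walking": 0.8, "standing": 0.1, "fallen": 0.1},
    "fallen":   {"standing": 1.0},
}

def step(state, rng=random):
    """Sample the next state given only the current state."""
    outcomes, probs = zip(*TRANSITIONS[state].items())
    return rng.choices(outcomes, weights=probs, k=1)[0]
```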
As we have 12 motors in our system, the environment state is

State = {Θ1, Θ2, Θ3, Θ4, Θ5, Θ6, Θ7, Θ8, Θ9, Θ10, Θ11, Θ12}

where Θ1, Θ2, ..., Θ12 are the angles of the motors with respect to the initial condition.

Having gone through previous works and research papers, we have concluded that an actor-critic algorithm will suit this task best.

As the state input, the vector (Θ1, Θ2, ..., Θ12) represents the angular positions of the 12 motors, normalized so that -1 ≤ Θi ≤ 1 for i = 1, 2, ..., 12. In the critic, the continuous state space is discretized into 2^8 = 256 hypercube cells, and the state is represented by a unit basis vector (x1, x2, ..., x256) of length 256, with the component corresponding to the current state set to 1 and all others set to 0.
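A minimal sketch of this state encoding follows; since the exact binning scheme is not specified here, the number of bins per joint and the angle range used for normalization are illustrative assumptions:

```python
import numpy as np

def encode_state(angles, angle_limit=np.pi, bins=2):
    """Normalize joint angles to [-1, 1] and one-hot encode the cell index.

    angles      -- motor angles (rad) relative to the initial pose
    angle_limit -- assumed mechanical range used for normalization
    bins        -- bins per joint; bins ** len(angles) cells in total
    """
    theta = np.clip(np.asarray(angles) / angle_limit, -1.0, 1.0)
    # Map each normalized angle in [-1, 1] to a bin index in {0, ..., bins-1}
    idx = np.minimum(((theta + 1.0) / 2.0 * bins).astype(int), bins - 1)
    # Flatten the per-joint bin indices into a single cell number
    cell = 0
    for i in idx:
        cell = cell * bins + int(i)
    one_hot = np.zeros(bins ** len(idx))
    one_hot[cell] = 1.0   # unit basis vector for the current state
    return theta, one_hot
```

For example, two bins per joint over eight joints give 2^8 = 256 cells, matching the basis-vector length used by the critic.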

The actor-critic algorithm is a prominent RL algorithm that can be utilized for this purpose. In this algorithm, the agent learns a policy that tells it which actions to take in order to maximize its reward, while the critic assesses the agent's actions and provides feedback on how effectively it is performing.
In the context of a walking four-legged robot, the agent is the machine learning algorithm in charge of controlling the robot's movements. The critic, on the other hand, analyzes the agent's actions and provides feedback by assigning a numerical value that reflects how good or poor the action taken was.

The state-value function in the critic is updated using the temporal-difference (TD) learning algorithm, which computes the difference between the estimated value of the current state and the reward plus the discounted estimated value of the next state,

δ = r + γ V(s') − V(s),

and updates the value function accordingly. The deep Q-network (DQN) algorithm, a variant of the Q-learning technique, is used to train the critic's neural network. This entails storing experience tuples in a replay buffer and training the critic's neural network with mini-batch gradient descent. The neural network is updated with a loss function that minimizes the difference between the estimated state-value function and the target value computed from the reward and the estimated value of the next state,

L(θ) = (r + γ V(s'; θ) − V(s; θ))².
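The TD update of the critic can be sketched for a tabular value function over the discretized state cells; the learning rate and discount factor below are assumed values:

```python
import numpy as np

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One temporal-difference update of a tabular state-value function.

    V holds one value per discretized state cell. The TD error is
    delta = r + gamma * V(s') - V(s), as used by the critic.
    """
    target = r + (0.0 if done else gamma * V[s_next])
    delta = target - V[s]          # TD error
    V[s] += alpha * delta          # move V(s) toward the TD target
    return delta
```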
The actor in the actor-critic algorithm is responsible for learning a policy that maps states to actions. This is often accomplished with a neural network whose output represents a probability distribution over the available actions for a given state. The actor's neural network is trained with the policy gradient algorithm, which entails computing the gradient of the expected cumulative reward with respect to the policy parameters and updating them via gradient ascent,

θ ← θ + α ∇_θ J(θ).

The critic's estimate of the state-value function is used to construct this gradient:

∇_θ J(θ) = E[∇_θ log π_θ(a|s) · A(s, a)].

That is, the gradient of the log-likelihood of the actor's action, log π_θ(a|s), is weighted by the estimated advantage of taking that action in the present state. The advantage is defined as the difference between the estimated value of taking the action in the present state and the estimated state-value function.

The actor's neural network is updated throughout training using stochastic gradient ascent based on the policy gradient. The update rule adds to the current policy parameters a small multiple of the gradient of the expected cumulative reward with respect to those parameters; a learning-rate hyperparameter governs the size of the update.
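The actor update can be sketched as follows, simplified to a linear softmax policy over a few discrete actions rather than the neural-network, continuous-action policy described in the text; the parameter shapes and learning rate are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                # numerical stability
    e = np.exp(z)
    return e / e.sum()

def actor_update(W, x, action, delta, lr=0.01):
    """One policy-gradient step for a softmax policy over discrete actions.

    W      -- policy parameters, shape (n_actions, n_features)
    x      -- feature (e.g. one-hot) encoding of the current state
    action -- index of the action that was taken
    delta  -- TD error from the critic, used as the advantage estimate
    lr     -- learning-rate hyperparameter governing the step size
    """
    probs = softmax(W @ x)
    # Gradient of log pi(action|s) for a softmax policy:
    # (indicator[a == action] - pi(a|s)) outer x
    grad_log_pi = -np.outer(probs, x)
    grad_log_pi[action] += x
    W += lr * delta * grad_log_pi  # gradient ascent on expected return
    return W
```

After a positive-advantage update, the probability of the taken action increases, which is exactly the behavior the policy-gradient rule is meant to produce.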

Expected Results

By the end of this semester we expect to develop a virtual replica of the robot that is able to balance itself and walk by analyzing its moves. We will finish the structural simulations, develop the control algorithms using reinforcement learning in MATLAB, and simulate the robot with the environment variables. Furthermore, we are looking to integrate computer vision algorithms for detecting the surroundings.
Future Scope
As the robot perfects the basic operations through reinforcement learning over successive iterations, its functionality can be extended to more complex tasks such as object retrieval, shallow digging, dodging incoming projectiles, and fully automated work cycles.

The search and rescue robot can further be augmented with different mechanisms to serve more versatile applications in fire-fighting, defense, etc. While the programming develops to extend its functionality, the design should evolve to be more modular to bring down manufacturing costs. Overall, the hope for a wide-scale transformation in search and rescue lies in making these robots safer, stronger, swifter, more versatile, and more affordable.

References

(PDF) Quadruped robot - Four Legged Robot (researchgate.net)

(PDF) Four-legged robot design and gait planning (researchgate.net)

Reinforcement learning of walking behavior for a four-legged robot | IEEE Conference Publication | IEEE Xplore

[PDF] Gait Tracking Control of Quadruped Robot Using Differential Evolution Based Structure Specified Mixed Sensitivity H∞ Robust Control | Semantic Scholar

Home - SpotMicroAI

https://in.mathworks.com/help/reinforcement-learning/ug/quadruped-robot-locomotion-using-ddpg-agent.html
