CP302 Report
IIT Ropar
Rupnagar-140001, India
Submitted by
Kasif Ansari - 2020MEB1290
Lakshya Sharma - 2020MEB1292
Rahul Bansal - 2020MEB1305
Disaster management refers to preventing, preparing for, responding to, and aiding in recovery efforts to mitigate the effects of natural or man-made disasters such as earthquakes, storms, or industrial explosions. While much of disaster management is geared towards prevention and setting up appropriate safety measures, active response plays a vital role in reducing the damage incurred once a disaster comes to pass. Effective response plans require groundwork that subjects disaster management teams to strenuous environments and often mortal danger. Much of this hazard can be averted by the use of robots.
Quadruped robots are a class of animal-inspired robots with four legs that offer high mobility and stability on uneven terrain. A robot working in search and rescue should complement the capabilities of human rescuers, providing additional support and enhancing the overall response. It should be reliable, efficient, and effective, allowing for a more coordinated and successful rescue operation. In situations where human rescuers cannot reach the affected people, robots can be very useful for providing medical support and delivering food supplies to the survivors.
In our project we aim to develop a four-legged robot that assists human rescuers during a disaster by providing survivors with medical assistance and food supplies. Quadruped robots can carry payloads that require a stable platform, an important requirement when supporting people stuck in hostile environments.
Quadruped robots are highly mobile when navigating obstacles of comparable size, stairs, and debris, making them well suited to urban environments. Their wide base and low center of gravity make them stable on uneven, sloping surfaces, and they can adapt to a wide range of environments. Most quadruped robots in production can carry loads in the range of 10-20 kg, making them ideal for low-weight deliveries. However, the control algorithms required for quadruped robots are generally complex, high power consumption can be a challenge in remote areas, and they are usually expensive to develop and manufacture, limiting their use in search and rescue. We therefore need to address these complexities so that we can build robots that are more cost-effective and more efficient than previous designs in the field.
Objectives
Problem Description
Search and rescue operations in disaster management, while essential, also pose a significant threat to the rescue team. The associated hazards can be significantly reduced by using quadruped robots for operations in hostile terrain. While quadruped robots offer a range of features suited to search and rescue, certain challenges act as a bottleneck to their wide-scale adoption.
Hence, we need to design the robot to have the following attributes:
1. Mobility: It should be able to move over rough and uneven terrain, climb stairs, and
navigate through narrow spaces and debris. It should be able to move in low visibility
or unstable areas.
2. Durability: It should be able to handle impact, falls, and other hazards like radiation
and extreme temperatures without incurring significant damage.
3. Sensing and perception: It should be able to gather data on surroundings, detect
potential hazards and locate victims.
4. Manipulation and Interaction: It should be able to interact with the environment
during operation. The actions include providing medical assistance, and delivering
supplies.
5. Autonomy: It should be able to navigate and perform other essential operations like
information collection and relay without constant intervention.
6. Communication: The robot should be able to gather and transmit data collected
on-site back to the command center.
Overview
To achieve our objective we are developing a four-legged robot whose control integrates reinforcement learning, with simulations analyzed in software such as ROS and MATLAB. Our first step is to prepare a CAD model of the robot and carry out structural simulations on it. We will then work on the robot's control algorithms, which will be simulated in MATLAB. The control system must be designed to be self-learning, so that it adjusts itself accordingly. Implementing the reinforcement learning algorithms alongside the controls is itself a difficult task. Our aim is to achieve maximum output from the robot with minimal control effort.
Existing Studies
Taking a look at existing developments in the field, we have the following types of robots
that can be used for the operation:
1. Rovers
2. Snake Robots
3. 8-Legged Robots
4. Quadruped Robots
Each option has advantages and disadvantages that make it more suitable than the others for a particular use case. We restrict our scope to quadruped robots for their best-suited use case: search and delivery in rugged environments that are not as narrow as a pipe but are still too hostile for humans and too uneven for rovers. Such environments are typically found in urban desolation, post-disaster rubble, and industrial accidents.
The four-legged structure strikes a balance between 8-legged robots and rovers. While 8-legged robots are generally more agile and easier to operate, making them an excellent alternative for surveillance and reconnaissance, 4-legged robots, though harder to control, provide better stability.
We retrieved 489 documents related to our objective from the Scopus database and used 112 of them for bibliometric analysis. Clustering on the basis of co-citation link score produced the chart above, with four major clusters.
Description: Based on a five-bar mechanism, this research presents a novel hybrid leg mechanism. Each leg has three degrees of freedom. The diagonal gait was chosen for gait planning, and ADAMS is used to validate the centroid displacement and foot displacement of each leg over the gait.
Description: This paper investigates reinforcement learning for a four-legged robot's walking behavior. It presents an action-selection technique for actor-critic algorithms in which the actor uses the normal distribution to select a continuous action from its bounded action space.
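The bounded Gaussian action selection described above can be sketched as follows. This is an illustrative Python snippet, not code from the paper; the function name, mean, standard deviation, and bounds are all hypothetical choices:

```python
import numpy as np

def sample_bounded_action(mu, sigma, low=-1.0, high=1.0, rng=None):
    """Draw a continuous action from N(mu, sigma^2) and clip it
    to the bounded action space [low, high]."""
    rng = rng or np.random.default_rng()
    action = rng.normal(mu, sigma)
    return float(np.clip(action, low, high))

# Example: the actor proposes mean 0.3 with exploration noise sigma 0.2.
a = sample_bounded_action(0.3, 0.2)
```

Clipping is only one way to respect the bounds; the paper's exact handling of the action-space boundary may differ.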
Description: This research proposes a control technique for quadruped robots that ensures gait-tracking performance. A quadruped robot is unsteady during vigorous gait motions such as trotting, and in addition to parameter uncertainties and unmodeled dynamics it is constantly subjected to perturbations.
Methodology
Mechanical Design:
Each leg of the robot has 3 degrees of freedom, so the designed robot has a total of 12 DOFs to allow proper movement and balancing.
1. A 2-DOF joint that enables revolution in two directions. The joint will be controlled by servo motors for high precision and accuracy.
2. A single-DOF joint that enables the robot to get over obstacles and absorb shocks or sudden changes in load. This joint will also be powered by a single servo motor.
3. The image shows the 2-DOF motion of the main joints. This gives the robot enhanced dexterity, enabling it to reach and traverse difficult terrain with ease.
4. The heart of the 2-DOF joint is a modified universal joint that allows rotary motion in two directions.
5. Ball bearings support the joint assembly and facilitate its rotation.
Control Algorithm
Reinforcement learning (RL) can be used efficiently to control physical systems like robots, drones, or autonomous vehicles. The core idea of RL is to acquire an optimal control strategy through trial and error so that the system can achieve a certain objective, such as walking, balancing, or avoiding obstacles.
Markov state: a state for which the likelihood of transitioning to another state depends only on the present state and not on any previous states. In other words, given the current state, the future is independent of the past.
Since our system has 12 motors, the environment state is
State = {Θ1, Θ2, Θ3, Θ4, Θ5, Θ6, Θ7, Θ8, Θ9, Θ10, Θ11, Θ12}
where Θ1, Θ2, ..., Θ12 are the motor angles measured with respect to the initial condition.
Having gone through previous works and research papers, we concluded that an actor-critic algorithm suits this task best.
As the state input, the vector (Θ1, Θ2, ..., Θ12) represents the angular positions of the 12 motors, normalized so that -1 ≤ Θi ≤ 1 for i = 1, 2, ..., 12. In the critic, the continuous state space is discretized into 2^8 = 256 hypercube cells, and the state is represented by a unit basis vector (x1, x2, ..., x256) of length 256, with the component corresponding to the current state equal to 1 and all others 0.
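A minimal Python sketch of this encoding, assuming the state space is discretized into 256 cells as the basis vector (x1, ..., x256) suggests. The binning rule used here (a sign bit per motor, folded into 256 cells) is a hypothetical illustration, not the report's exact scheme:

```python
import numpy as np

def one_hot_state(thetas, n_cells=256):
    """Map 12 normalized motor angles (each in [-1, 1]) to one cell of a
    discretized state space and return the unit basis vector.

    Binning here is illustrative: two bins per dimension (sign of the
    angle) gives 2^12 raw cells, folded into n_cells by a modulo."""
    thetas = np.asarray(thetas, dtype=float)
    assert np.all((-1.0 <= thetas) & (thetas <= 1.0)), "angles must be normalized"
    bits = (thetas >= 0.0).astype(int)              # one sign bit per motor
    raw_index = int("".join(map(str, bits)), 2)     # 0 .. 2^12 - 1
    index = raw_index % n_cells                     # fold into n_cells cells
    x = np.zeros(n_cells)
    x[index] = 1.0                                  # unit basis vector
    return x
```

Exactly one component of the returned vector is 1, identifying the current cell, which is the property the critic's tabular representation relies on.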
A prominent RL method of this kind is the actor-critic algorithm. The actor (agent) learns a policy that tells it which actions to take in order to maximize its reward, while the critic assesses the agent's actions and provides feedback on how well it is performing.
In the context of a walking four-legged robot, the agent is the machine learning algorithm in charge of controlling the robot's movements. The critic, on the other hand, analyzes the agent's actions and offers feedback on how well it is performing: it gives the agent a numerical value that reflects how good or poor the action taken was.
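This division of labor can be sketched as a minimal Python interface. The class and method names, the softmax policy, and the running-baseline critic are all simplifying assumptions for illustration, not the report's implementation:

```python
import numpy as np

class Actor:
    """Maps its policy parameters to a stochastic action choice; the
    parameters are later nudged using the critic's feedback."""
    def __init__(self, n_actions, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.preferences = np.zeros(n_actions)

    def act(self):
        # Softmax over action preferences -> stochastic policy.
        p = np.exp(self.preferences - self.preferences.max())
        p /= p.sum()
        return int(self.rng.choice(len(p), p=p))

class Critic:
    """Scores the last action with a single scalar: here the reward
    minus a slowly-tracking baseline (an illustrative stand-in for a
    learned state-value function)."""
    def __init__(self):
        self.baseline = 0.0

    def evaluate(self, reward, lr=0.1):
        score = reward - self.baseline
        self.baseline += lr * score
        return score

# One interaction: the actor acts, the environment returns a reward,
# the critic scores it, and the actor reinforces the chosen action.
actor, critic = Actor(n_actions=3), Critic()
a = actor.act()
reward = 1.0  # hypothetical reward from the environment
actor.preferences[a] += 0.1 * critic.evaluate(reward)
```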
In the critic part of the actor-critic algorithm, the state-value function is updated using the temporal-difference (TD) learning algorithm, which computes the difference between the estimated value of the current state and the reward plus the discounted estimated value of the next state (the TD error). The critic's estimate of the state-value function is then used to construct the gradient for the actor's policy update.
The actor's neural network is modified during training using stochastic gradient ascent based on the policy gradient. The update rule adds to the current policy parameters a small multiple of the gradient of the expected cumulative reward with respect to those parameters; a learning-rate hyperparameter governs the size of the update.
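One actor-critic update in the standard TD(0) form, delta = r + gamma * V(s') - V(s), can be sketched in tabular Python. The tabular softmax policy, learning rates, and discount factor are hypothetical simplifications of the neural-network version described above:

```python
import numpy as np

def td_actor_critic_step(V, theta, s, a, r, s_next,
                         gamma=0.99, alpha_v=0.1, alpha_pi=0.01):
    """One actor-critic update.
    V     : state-value table (the critic)
    theta : policy parameter table theta[s, a] (the actor)
    Returns the TD error delta = r + gamma * V[s'] - V[s]."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha_v * delta                  # critic: TD(0) value update
    # Actor: stochastic gradient ascent on log pi(a|s), scaled by delta.
    probs = np.exp(theta[s] - theta[s].max())
    probs /= probs.sum()
    grad_log = -probs                        # d/dtheta of log-softmax ...
    grad_log[a] += 1.0                       # ... for the taken action a
    theta[s] += alpha_pi * delta * grad_log
    return delta
```

A positive delta (the action led somewhere better than expected) raises both the value estimate of s and the probability of repeating action a in s; a negative delta does the opposite.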
Expected Results
By the end of this semester we expect to have developed a virtual replica of the robot that can balance itself and walk by analyzing its moves. We will finish the structural simulations, develop the control algorithms using reinforcement learning in MATLAB, and simulate the robot together with the environment variables. Furthermore, we are looking to integrate computer vision algorithms for detecting the surroundings.
Future Scope
As the robot perfects the basic operations through reinforcement learning over successive iterations, its functionality can be extended to more complex tasks such as object retrieval, shallow digging, dodging incoming projectiles, and fully automated work cycles.
The search and rescue robot can be further augmented with different mechanisms to serve more versatile applications such as fire-fighting and defense. While the programming develops to extend its functionality, the design should evolve to be more modular to bring down manufacturing costs. Overall, the hope for a wide-scale transformation in search and rescue lies in making the robots safer, stronger, swifter, more versatile, and more affordable.
References
1. Gait Tracking Control of Quadruped Robot Using Differential Evolution Based Structure Specified Mixed Sensitivity H∞ Robust Control, Semantic Scholar.
2. SpotMicroAI project homepage.
3. MathWorks, Quadruped Robot Locomotion Using DDPG Agent, https://in.mathworks.com/help/reinforcement-learning/ug/quadruped-robot-locomotion-using-ddpg-agent.html