I. INTRODUCTION
One important class of objects that a personal robot would
need to learn about is deformable semi-solid materials such
as food. It is important for the robot to model and estimate
their physical properties to be able to manipulate them.
Approaches based on vision (e.g., [1]) are limited because
even objects of the same food type (e.g., two different
bananas), which have similar visual properties, can have
vastly different material properties depending on the conditions
they are kept in (e.g., temperature or humidity),
their age (e.g., raw or ripe), or how they were processed (e.g.,
cooked). In many cases, these small differences change
the correct way to manipulate these objects. In this paper, we
focus on using haptic data from the robot's internal sensors
to perceive and manipulate food objects.
Although physical properties of objects are important
for manipulation planning, mathematically modeling these
properties in an explicit manner is very challenging for
many object categories and requires a lot of design effort
[2]. A feasible alternative is to learn representations from
haptic data, where the agent maps the sensory inputs from its
actuators to the physical properties of objects.
In this work, we propose a learning algorithm (see Fig. 2)
that takes as input the haptic sensory signals a robot obtains
while performing actions on food objects. We then
extract useful features from these haptic signals and map
them to physical properties of the objects through supervised
learning. Using this mapping, we then perform unsupervised
learning and clustering based on Dirichlet Processes (DP)
to represent the physical properties compactly. Through this
representation, we integrate multiple observations about an
object's properties, obtained at different times, into a compact
belief representation.
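For concreteness, the following is a minimal sketch of this pipeline, not the authors' implementation: the data are synthetic, ridge regression stands in for the supervised feature-to-property mapping, and scikit-learn's truncated Dirichlet Process mixture approximates the DP clustering. The dimensions (918 features, 7 properties) follow the counts used later in this paper, but every value here is illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 918))  # haptic feature vectors (e.g., C+E+D set)
Y = rng.uniform(size=(120, 7))   # expert property labels in [0, 1]

# Supervised step: map haptic features to physical property estimates.
prop_model = Ridge(alpha=1.0).fit(X, Y)
Y_hat = prop_model.predict(X)

# Unsupervised step: cluster the property estimates into haptic categories
# with a truncated Dirichlet Process Gaussian mixture.
dp = BayesianGaussianMixture(
    n_components=15,  # truncation level; effective categories are fewer
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(Y_hat)
print("effective haptic categories:", int(np.sum(dp.weights_ > 1e-2)))
```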
Fig. 1: We use the haptic sensors of a personal robot to learn a data-driven representation of the physical properties of food objects. We
use learning techniques to create manipulation strategies involving
actions that use household tools in order to prepare meals.
Fig. 4: Actions designed to gather information about the physical properties of objects. Arrows indicate the direction of the applied
forces in the first and second motions of the actions. For each motion, we indicate the most important physical properties that the action
is designed to extract. For example, in action (b), we desire to measure the tensile strength of an object as we apply an outward force (1)
after piercing it. Similarly, while removing a knife (2) from the object in action (d), we desire to determine whether the object is
adhesive or not.
TABLE I: Haptic Inputs and Feature Sets. The # Inputs column lists the number of inputs per entry: set C has 7 inputs in total (entries of 3 and 4), set E has 15 (entries E1-E3 of 7, 7, and 1), and set D has 29 (entries of 3, 3, and 23).
Fig. 5: Feature Extraction. On the left, the deviation in joint efforts (Table I, entry E3) is shown as the Lateral Split with Forks action is
executed on different object categories (middle). This haptic input has higher low-frequency content for the bagel category (top) and
higher high-frequency content for the tomato category (middle) than for the cream cheese category (bottom). On the right, we show
our binning strategy, where each bin collects the power content by integrating over its respective frequency interval.
Once we obtain the spectrum of a haptic input, we create
the corresponding features by computing a 5-bin histogram
integrated over different frequency intervals (uniform in
log-scale between 0 and 16 Hz). Additionally, we compute another
4-bin histogram whose bins accumulate the power content
up to the frequency they represent (see the right side of Fig. 5).
Cumulative binning helps capture various nonlinear
functions of the original power content, as illustrated in the sketch below.
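The following minimal sketch, not the authors' code, illustrates one way to compute these nine features for a single haptic input. The sampling rate, the small nonzero lower edge for the log-spaced bins (log spacing cannot start at 0), and the exact cumulative cutoffs are assumptions.

```python
import numpy as np

def spectral_features(signal, fs=100.0, f_max=16.0):
    """Return 9 features: 5 log-spaced power bins + 4 cumulative bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    # 6 edges, uniform in log-scale, giving 5 bins up to f_max;
    # a small positive floor (0.1 Hz) replaces the 0 Hz edge.
    edges = np.logspace(np.log10(0.1), np.log10(f_max), num=6)
    bins = [power[(freqs >= lo) & (freqs < hi)].sum()
            for lo, hi in zip(edges[:-1], edges[1:])]

    # 4 cumulative bins: total power accumulated below increasing cutoffs.
    cumulative = [power[freqs < hi].sum() for hi in edges[1:5]]
    return np.array(bins + cumulative)

# e.g., a 2-second, 3 Hz sinusoid sampled at 100 Hz
features = spectral_features(np.sin(2 * np.pi * 3.0 * np.arange(0, 2, 0.01)))
```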
For the rest of this paper, we represent this feature
vector as $a_i \in \mathbb{R}^{9h}$, where $a$ is the action executed to
obtain haptic inputs from instance $i$, and $h$ is the number
of inputs used in learning, which depends on the input sets
chosen (first column of Table I). For example, if we use all
the haptic inputs (C+E+D), then $h = 2\,(7 + 15 + 29) = 102$
for actions that use both grippers, and $a_i$ has $9h = 918$ features.
The high dimensionality of the feature space for some sets of inputs
motivates us to use learning algorithms that are able to cope
with such data. We discuss our approach for mapping the
computed features to physical properties of objects in the
next section.
VI. LEARNING
A. Dataset
For our supervised learning task, we used three human experts (including one of the authors) to label the properties of
the food categories involved in training (all categories from
Fig. 3 except tofu). Although it is hard for a human
to reliably estimate the absolute physical properties of food
categories (e.g., the average elasticity modulus of cream cheese
in N/m$^2$), ordering these food categories according to their
physical properties is comparatively easy, and disagreements on
the labelings are arguably far less common.
In order to label the food categories, the human experts are
asked to order them for each physical property independently.
In this ordering, they are also allowed to mark two categories
as equivalent for an individual physical property. Once the
ordering is decided for a property, it is projected linearly onto
the [0, 1] interval, and the labels for that property are
determined, as illustrated below.
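As a hypothetical illustration of this scheme (the category names and their ordering below are made up for the example, not the experts' actual labels):

```python
def labels_from_ordering(ordered_groups):
    """ordered_groups: list of tie-groups, from lowest to highest rank."""
    n = len(ordered_groups)
    labels = {}
    for rank, group in enumerate(ordered_groups):
        value = rank / (n - 1) if n > 1 else 0.0  # linear projection to [0, 1]
        for category in group:                     # ties share one label
            labels[category] = value
    return labels

# e.g., for hardness: cream cheese softest; banana and tomato tied; bagel hardest
print(labels_from_ordering([["cream cheese"], ["banana", "tomato"], ["bagel"]]))
# {'cream cheese': 0.0, 'banana': 0.5, 'tomato': 0.5, 'bagel': 1.0}
```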
Per-property results (lower is better) for each information-gathering action, with the haptic input set used for each action; the Best Action column reports the best value achieved for each property across actions:

Property | Vert. Piercing (C+E+D) | Lat. Splitting (C+D) | Circ. Bending (C+D) | Vert. Cut w/ Knife (C+E+D) | Best Action
1        | .17 | .14 | .15 | .19 | .14
2        | .14 | .13 | .13 | .18 | .13
3        | .11 | .11 | .07 | .09 | .07
4        | .11 | .10 | .09 | .14 | .09
5        | .09 | .05 | .06 | .09 | .05
6        | .09 | .08 | .09 | .11 | .08
7        | .06 | .06 | .09 | .09 | .06
Fig. 7: Haptic categories learned (bottom) from estimated properties of training samples (top) through DP for the Lateral Split
action. Best viewed in color.
$$
P(z_i = z \mid \mathbf{z}_{-i}) \propto
\begin{cases}
\dfrac{n_z}{N - 1 + \alpha}\, P(y_i;\, \theta_z) & \text{if } z \text{ already in } Z \\[6pt]
\dfrac{\alpha}{N - 1 + \alpha}\, P(y_i;\, \theta_z) & \text{otherwise}
\end{cases}
\tag{1}
$$
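To make Eq. (1) concrete, here is a small illustrative sketch, not the authors' code: an existing haptic category $z$ is chosen in proportion to its occupancy $n_z$ times the likelihood of the observation, and a new category is opened in proportion to $\alpha$. The likelihood values below are placeholders.

```python
import numpy as np

def assignment_probs(counts, likelihoods, new_likelihood, alpha):
    """counts[z] = n_z; likelihoods[z] = P(y_i; theta_z) for existing z."""
    N = sum(counts) + 1                       # total instances, including i
    scores = [n * p / (N - 1 + alpha) for n, p in zip(counts, likelihoods)]
    scores.append(alpha * new_likelihood / (N - 1 + alpha))  # open new category
    scores = np.array(scores)
    return scores / scores.sum()              # normalize the proportionality

# e.g., three existing categories with 4, 2, and 3 members
print(assignment_probs([4, 2, 3], [0.1, 0.5, 0.05], 0.2, alpha=1.0))
```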
$$
w_z^{+} = (w_z)^{\frac{\gamma m}{m + 1}} \left( \frac{1}{\sqrt{|\Sigma_z|}}\; e^{-\frac{1}{2}\,[y - \mu_z]^{T} \Sigma_z^{-1}\, [y - \mu_z]} \right)^{\frac{1}{m + 1}}
\tag{2}
$$
where $m$ is the number of observations used for determining
the previous belief $b$, $\gamma$ is the decay parameter for previous
observations (chosen to be 0.9), and $w^+$ is the new mixture-weight vector after the update, which we normalize to
maintain $\mathbf{1}^T w^+ = 1$.
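A minimal sketch of this update, not the authors' code, following Eq. (2) as rendered above; the per-category means and covariances are placeholders for the learned DP mixture parameters.

```python
import numpy as np

def update_belief(w, y, mus, sigmas, m, gamma=0.9):
    """One belief update over the DP mixture weights, following Eq. (2)."""
    w_new = np.empty_like(w, dtype=float)
    for z in range(len(w)):
        diff = y - mus[z]
        # Gaussian likelihood of the observed property estimate y under category z
        lik = np.exp(-0.5 * diff @ np.linalg.solve(sigmas[z], diff))
        lik /= np.sqrt(np.linalg.det(sigmas[z]))
        # geometric blend of the decayed previous belief and the new evidence
        w_new[z] = w[z] ** (gamma * m / (m + 1)) * lik ** (1.0 / (m + 1))
    return w_new / w_new.sum()  # renormalize so that 1^T w = 1
```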
We show the belief update in Fig. 8 as the actions Vertical
Pierce and Vertical Cut are performed consecutively on a tofu
object. The initial belief (column 1) has a uniform
weight for each class obtained by the DP; this corresponds to
a prior probability obtained from previously seen examples.
Therefore, the belief update starts from this prior rather than from
a naive uniform distribution over the whole property space.
As observations are obtained (second and third columns,
in order), the belief shifts towards an object with very low
hardness, high brittleness, and high plasticity, which is correct
for tofu. Once the robot reaches the belief in column 3, it
can use this information to choose the correct way to
manipulate the object. For example, it can move the object
with spatulas instead of its grippers, since an object with high
plasticity and brittleness may be deformed when grasped or
may damage the grippers.
In the example above, the tofu category is not seen during
the training stage and therefore Z cannot include a haptic
category directly representing the tofu category. However,
because there are similar categories of food objects seen
during training such as cream cheese and peeled banana, it
is possible to obtain a good representation even for tofu as a
combination of haptic categories representing the properties
of these other food categories.
Subgoal   | Trials | Success | Failure
Flipping  |   9    |    8    |    1
Moving    |   7    |    6    |    1
Splitting |  15    |   15    |    0
Cutting   |  29    |   27    |    2
All       |  60    |   56    |    4
Fig. 12: Results showing PR2 making a salad. (Left) Unprepared food objects, (Middle) PR2 performing manipulation actions after
learning about objects, and (Right) prepared salad.
VIII. CONCLUSIONS
In this work, we have presented a learning algorithm for
manipulating food objects with complex physical properties. Our robot models the properties through haptic inputs,
computes the features, learns haptic categories and executes
correct manipulation actions to reach useful goals. Using
the learned belief representation and manipulation strategies,
our robot attained an overall success rate of 93% in robotic
experiments.
ACKNOWLEDGEMENTS
This work was supported by ARO award W911NF-12-1-0267, and by a Microsoft Faculty Fellowship and an NSF CAREER
Award (to Saxena).
REFERENCES
[1] S. Yang, M. Chen, D. Pomerleau, and R. Sukthankar, "Food recognition using statistics of pairwise local features," in CVPR, 2010.
[2] L. Kunze, M. E. Dolha, E. Guzman, and M. Beetz, "Simulation-based temporal projection of everyday robot object manipulation," in AAMAS, 2011.
[3] A. Saxena, J. Driemeyer, and A. Ng, "Robotic grasping of novel objects using vision," IJRR, vol. 27, no. 2, p. 157, 2008.
[4] K. Hsiao, P. Nangeroni, M. Huber, A. Saxena, and A. Ng, "Reactive grasping using optical proximity sensors," in ICRA, 2009.
[5] D. Katz, Y. Pyuro, and O. Brock, "Learning to manipulate articulated objects in unstructured environments using a grounded relational representation," in RSS, 2008.
[6] A. Jain and C. Kemp, "Improving robot manipulation with data-driven object-centric models of everyday forces," Autonomous Robots, vol. 35, no. 2-3, pp. 143-159, 2013.
[7] J. Maitin-Shepard, M. Cusumano-Towner, J. Lei, and P. Abbeel, "Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding," in ICRA, 2010.
[8] K. Hsiao, L. Kaelbling, and T. Lozano-Perez, "Task-driven tactile exploration," in RSS, 2010.
[9] A. Jain, M. D. Killpack, A. Edsinger, and C. C. Kemp, "Reaching in clutter with whole-arm tactile sensing," IJRR, vol. 32, no. 4, 2013.
[10] K. Hawkins, C. King, T. Chen, and C. Kemp, "Informing assistive robots with models of contact forces from able-bodied face wiping and shaving," in RO-MAN, 2012.
[11] H. Yussof, M. Ohka, J. Takata, Y. Nasu, and M. Yamano, "Low force control scheme for object hardness distinction in robot manipulation based on tactile sensing," in ICRA, 2008.