3 Task Representation

This chapter reviews approaches for task representation in robot programming by demonstration (PbD). Based on the level of task abstraction, the methods are categorized into high‐level task representation at the symbolic level of abstraction and low‐level task representation at the trajectory level of abstraction. Techniques for data preprocessing related to trajectory scaling and alignment are also discussed in the chapter.

The PbD framework aims at learning from multiple demonstrations of a skill performed under similar conditions. For a set of $M$ demonstrations, the perception data are denoted by $\{\mathbf{X}_m\}_{m=1}^{M}$, with $\mathbf{X}_m = \{\mathbf{x}_{m,t}\}_{t=1}^{T_m}$, where $m$ is used for indexing the demonstrations, $t$ is used for indexing the measurements within each demonstration, and $T_m$ denotes the total number of measurements of the demonstration sequence $\mathbf{X}_m$. Each measurement represents a $D$‐dimensional vector $\mathbf{x}_{m,t} \in \mathbb{R}^{D}$. The form of the measurements depends on the data acquisition system(s) employed for perception of the demonstrations, and it can encompass the following (a data‐structure sketch is given after the list):

  1. Cartesian poses (positions and orientations) and/or velocities/accelerations of manipulated objects, tools, or the demonstrator’s hand.
  2. Joint angle positions and/or velocities/accelerations of the demonstrator’s arm.
  3. Forces exerted on the environment by the demonstrated actions.
  4. Sequence of images of the scene in the case of vision‐based perception.
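To make the notation concrete, the following is a minimal Python sketch of how such a dataset might be stored, with NumPy arrays standing in for the measurement vectors; the variable names, sizes, and synthetic values are illustrative only.

```python
import numpy as np

# A set of M demonstrations: each X_m is an array of T_m measurements,
# where every measurement is a D-dimensional vector. The lengths T_m
# generally differ across demonstrations.
M, D = 5, 3                                # e.g., five demonstrations in 3D
rng = np.random.default_rng(0)             # synthetic data for illustration

demos = [rng.standard_normal((T_m, D))     # X_m with shape (T_m, D)
         for T_m in rng.integers(500, 900, size=M)]

for m, X_m in enumerate(demos):
    print(f"demonstration {m}: T_m = {X_m.shape[0]}, D = {X_m.shape[1]}")
```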

The data structures for task perception with optical sensors and vision cameras are presented in Sections 2.1 and 2.2.

The objective of PbD is to map the observed demonstrations into a sequence of low‐level command signals that allow a robot learner to reproduce the demonstrated task. As outlined in Chapter 2, this mapping is often highly nonlinear and therefore represents a challenging problem. It usually consists of several steps. The task modeling and task analysis phases encode the recorded data into a compact and flexible representation of the demonstrated motions and extract the task features relevant for achieving the required robot performance. In the task planning step, a trajectory for task reproduction is derived by generalization over the dataset of demonstrated task examples. The generalized trajectory for task reproduction is denoted by $\mathbf{X}_{\mathrm{gen}} = \{\mathbf{x}_{\mathrm{gen},t}\}_{t=1}^{T_{\mathrm{gen}}}$, where $T_{\mathrm{gen}}$ is the number of time measurements of the generalized trajectory. Afterward, the generalized trajectory is translated into robot commands and loaded onto the robot platform for task execution in the real environment.

3.1 Level of Abstraction

Representation of observed tasks in robot PbD is often categorized, based on the level of task abstraction, into high‐level and low‐level representation (Schaal et al., 2003). As elaborated in Section 1.4.2, high‐level task representation refers to encoding tasks as a hierarchical sequence of high‐level behaviors (Dillmann, 2004; Saunders et al., 2006). This representation category is also known as symbolic task representation. In general, the elementary behaviors are predefined, and the observed tasks are initially segmented into sequences of such behaviors. The task representation then involves defining rules and conditions on the state of the world required for each elementary action to occur, as well as rules and conditions required for each elementary action to end and the successor behavior to begin. The principal drawbacks of this representation approach are its reliance on a library of predefined elementary behaviors for encoding tasks, and its limited applicability for representing tasks that require continuously high accuracy along a three‐dimensional (3D) path or a velocity profile.

Low‐level task representation is also referred to as representation at the trajectory level, where the task demonstrations are encoded as continuous 3D trajectories in the Cartesian space or as continuous joint‐angle trajectories in the joint space (Section 1.4.2.2). The PbD approaches presented in this book employ this type of task abstraction, owing to its ability to represent arbitrary motions and to define the spatial constraints of the demonstrations (Aleotti and Caselli, 2005; Calinon, 2009). Additionally, low‐level task representation makes it possible to specify the velocities and accelerations at different phases of the demonstrated task, and allows encoding of tasks with high‐accuracy demands.

3.2 Probabilistic Learning

Statistical methods have been widely used in robotics for representing uncertain information about the state of the environment. On the one hand, processing the robot’s sensory data involves handling the uncertainties that originate from sensor noise, sensor limitations, and unpredictable changes in the environment. On the other hand, processing the control actions involves tackling the uncertainties regarding how decisions are made to achieve the desired level of motor performance. Statistical frameworks represent the uncertainties of the robot’s perception and action via probability distributions, instead of using a single best guess about the state of the world. As a result, the use of statistical models has contributed to improved robustness and performance in many robotic applications, such as localization and navigation of mobile robots, planning, and map generation from sensory data (Thrun, 2000).

Regarding robot learning from observation of human demonstrations, statistical modeling has been exploited for representing the uncertainties of the acquired perceptual data (Calinon, 2009). Indeed, one must bear in mind that a fundamental property of human movements is their random nature. Humans are not capable of drawing perfect straight lines or repeating identical movements, which is assumed to be due to the inherent stochastic nature of the neural processing of the required actions (Clamann, 1969). Statistical algorithms provide a means to encapsulate the random variations in the observed demonstrations by deriving the probability distributions of the outcomes from several repeated measurements. Consequently, a model of a human skill is built from several examples of the same skill demonstrated in a similar fashion and under similar conditions. The underlying variability across the repeated demonstrations is utilized to probabilistically represent the different components of the task and, subsequently, to retrieve a generalized version of the demonstrated trajectories.

Within the published literature, the hidden Markov model (HMM) has been established as one of the most popular statistical methods for modeling human motions. Other approaches that have been used for probabilistic skill encoding include the following: Gaussian mixture models (GMMs) (Calinon, 2009), support vector machines (Zollner et al., 2002; Martinez and Kragic, 2008), and Bayesian belief networks (Coates et al., 2008; Grimes and Rao, 2008).

3.3 Data Scaling and Aligning

Generalization from multiple task examples imposes the need to address the problem of temporal variability in the demonstrated data. Often, it is required to initially scale the demonstrated data to sequences of equal length before proceeding with the data analysis. The approaches of linear scaling and dynamic time warping (DTW) scaling have been employed in many works for this purpose (Pomplun and Mataric, 2000; Gribovskaya and Billard, 2008; Ijspeert et al., 2012).

3.3.1 Linear Scaling

Linear scaling refers to changing the number of measurements of a sequence through interpolation between the sequence data. For a set of collected demonstration trajectories with different numbers of time frames, scaling to a new set of sequences with an equal number of time frames is achieved by linearly scaling the time vector of each individual trajectory. This procedure is equivalent to extending or shortening the sequences to the required number of time measurements. Among the different interpolation techniques for time‐series data, polynomial interpolation is used most often. More specifically, low‐order polynomials are fitted locally to create a smooth and continuous function, referred to as a spline.
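As an illustration, the following is a minimal Python sketch of the procedure, fitting a cubic spline over a normalized time vector and resampling it at the target number of measurements; the function name and the synthetic trajectory are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def linear_time_scale(X, T_new):
    """Rescale a (T, D) trajectory to T_new measurements by fitting a
    cubic spline over a normalized time vector and resampling uniformly."""
    t_old = np.linspace(0.0, 1.0, X.shape[0])   # original time vector
    t_new = np.linspace(0.0, 1.0, T_new)        # scaled time vector
    return CubicSpline(t_old, X, axis=0)(t_new)

# Scale an 800-point test trajectory to a 600-point reference length,
# mirroring the example of Figure 3.1a and b (synthetic data).
t = np.linspace(0.0, 3.0, 800)
test = np.column_stack([np.sin(t), np.cos(t)])
scaled = linear_time_scale(test, 600)
print(scaled.shape)                             # (600, 2)
```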

For illustration, Figure 3.1a shows two sample trajectories with different numbers of measurements, whereas Figure 3.1b displays their counterparts after linear time scaling. Accordingly, the length of the test trajectory is scaled to the number of measurements of the reference trajectory. For sequences that differ significantly in length, this method might not be very effective, since the temporal variations in the demonstrations can result in spatial misalignments across the set.


Figure 3.1 (a) Two sequences with different number of measurements: a reference sequence of 600 measurement data points and a test sequence of 800 measurement data points; (b) the test sequence is linearly scaled to the same number of measurements as the reference sequence; and (c) the test sequence is aligned with the reference sequence using DTW.

3.3.2 Dynamic Time Warping (DTW)

DTW is an algorithm for comparing and aligning time series, in which an optimal alignment of the sequences is found through nonlinear time warping using a dynamic programming technique. The DTW alignment of the two sequences from Figure 3.1a is illustrated in Figure 3.1c. The main advantage of DTW scaling over linear time scaling is thus the efficient alignment of the signals, which handles the spatial distortions.

The DTW sequence alignment is based on forming a matrix of distances between two time series, and finding an optimal path through the matrix that locally minimizes the distance between the sequences. For a given reference sequence $\mathbf{X} = \{\mathbf{x}_\chi\}_{\chi=1}^{T_\chi}$ of length $T_\chi$ and a test sequence $\mathbf{Y} = \{\mathbf{y}_\gamma\}_{\gamma=1}^{T_\gamma}$ of length $T_\gamma$, the distance matrix $H$ is formed as follows:

(3.1) $H(\chi, \gamma) = \left\| \mathbf{x}_\chi - \mathbf{y}_\gamma \right\|$, for $\chi = 1, 2, \ldots, T_\chi$ and $\gamma = 1, 2, \ldots, T_\gamma$

In (3.1), the notation $\| \cdot \|$ is used for denoting the Euclidean $l_2$‐norm. Note that other norms can also be used as a distance measure, such as the $l_1$‐norm (Holt et al., 2007). The optimal alignment path $\mathbf{w} = \{w_1, w_2, \ldots, w_K\}$ is a sequence of elements of the matrix $H$, with $w_k = (\chi_k, \gamma_k)$ denoting the cell indices at step $k$. The path is calculated by minimizing the cumulative sum of the distances $H(\chi, \gamma)$ and the minimum cumulative distance of the neighboring cells, i.e.,

(3.2) $g(\chi, \gamma) = H(\chi, \gamma) + \min \left\{ g(\chi - 1, \gamma),\; g(\chi - 1, \gamma - 1),\; g(\chi, \gamma - 1) \right\}$

The procedure for implementing DTW includes the following steps (a code sketch of the steps is given after the list):

  1. Form the distance matrix of dimensions $T_\chi \times T_\gamma$ for the given reference and test sequences using (3.1).
  2. Initialize the cumulative distance $g(1, 1) = H(1, 1)$.
  3. For $\chi = 2, 3, \ldots, T_\chi$ and $\gamma = 2, 3, \ldots, T_\gamma$, calculate the cumulative distances $g(\chi, \gamma)$ according to (3.2).
  4. Backtrack the path $\mathbf{w}$ to find the indices of the warped sequence.
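The following Python sketch is a direct, unoptimized realization of steps 1–4, using (3.1) and (3.2); the function name is illustrative, and zero‐based array indices stand in for the one‐based indices of the text.

```python
import numpy as np

def dtw_align(ref, test):
    """Align a test sequence of shape (T_gamma, D) with a reference
    sequence of shape (T_chi, D), following steps 1-4 of the DTW
    procedure; returns the total distance and the warping path."""
    T_chi, T_gamma = len(ref), len(test)
    # Step 1: distance matrix H of pairwise Euclidean (l2) distances (3.1).
    H = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=-1)
    # Step 2: initialize the cumulative distance g(1, 1) = H(1, 1).
    g = np.full((T_chi, T_gamma), np.inf)
    g[0, 0] = H[0, 0]
    # Step 3: fill the cumulative distances according to (3.2); the first
    # row and column only have a subset of the three neighboring cells.
    for chi in range(T_chi):
        for gamma in range(T_gamma):
            if chi == 0 and gamma == 0:
                continue
            g[chi, gamma] = H[chi, gamma] + min(
                g[chi - 1, gamma] if chi > 0 else np.inf,
                g[chi - 1, gamma - 1] if chi > 0 and gamma > 0 else np.inf,
                g[chi, gamma - 1] if gamma > 0 else np.inf)
    # Step 4: backtrack from the end cell to (1, 1); the candidate moves
    # enforce the boundary, continuity, and monotonicity conditions
    # listed below.
    path = [(T_chi - 1, T_gamma - 1)]
    while path[-1] != (0, 0):
        chi, gamma = path[-1]
        steps = [(chi - 1, gamma), (chi - 1, gamma - 1), (chi, gamma - 1)]
        steps = [(i, j) for i, j in steps if i >= 0 and j >= 0]
        path.append(min(steps, key=lambda ij: g[ij]))
    return g[-1, -1], path[::-1]
```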

The following constraints are enforced for the warping path:

  • Boundary conditions: $w_1 = (1, 1)$ and $w_K = (T_\chi, T_\gamma)$.
  • Continuity conditions: if $w_k = (\chi_k, \gamma_k)$, then $w_{k+1} = (\chi_{k+1}, \gamma_{k+1})$ with $\chi_{k+1} - \chi_k \leq 1$ and $\gamma_{k+1} - \gamma_k \leq 1$.
  • Monotonicity condition: if $w_k = (\chi_k, \gamma_k)$, then $w_{k+1} = (\chi_{k+1}, \gamma_{k+1})$ with $\chi_{k+1} - \chi_k \geq 0$ and $\gamma_{k+1} - \gamma_k \geq 0$.

The boundary conditions define the starting and ending points of the path, whereas the continuity and monotonicity conditions constrain the path to be continuous and monotonically spaced in time, respectively.

The DTW alignment can excessively distort the signals in the case of dissimilar sequences; therefore, shape‐preserving constraints are often imposed. Sakoe and Chiba (1978) proposed forcing the warped time vectors to stay within a specified time‐frame window around their normalized time vector, and imposing a constraint on the slope of the warping path.
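A minimal sketch of such a window constraint, assuming the dtw_align sketch given earlier, is shown below; the helper within_band and the radius r are illustrative, and the slope constraint is not shown.

```python
# Sakoe-Chiba-style band: a cell (chi, gamma) of the cumulative distance
# matrix is admissible only if it lies within r frames of the diagonal,
# scaled to the two sequence lengths. In step 3 of the dtw_align sketch,
# inadmissible cells would simply be skipped (left at infinity).
def within_band(chi, gamma, T_chi, T_gamma, r):
    diagonal = chi * (T_gamma - 1) / max(T_chi - 1, 1)
    return abs(gamma - diagonal) <= r
```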

One‐dimensional DTW refers to alignment of time series based on a single dimension of the data. When dealing with multidimensional data, this might result in suboptimal alignment across the remaining dimensions. A multidimensional DTW approach was proposed by Holt et al. (2007), in which the alignment of the sequences involves the coordinates from all dimensions of the data. Hence, the distance matrix includes the Euclidean distance over the dimensions of the sequences at each pair of time points, and has the following form:

(3.3) $H(\chi, \gamma) = \sqrt{\sum_{i=1}^{d} \left( x_{\chi,i} - y_{\gamma,i} \right)^{2}}$

where $d$ denotes the dimensionality of the sequences.
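Since the dtw_align sketch given earlier already takes the Euclidean norm over all coordinates, it computes (3.3) directly when given $d$‐dimensional sequences; equivalently, the distance matrix can be formed with SciPy's pairwise‐distance routine, as in the following sketch with synthetic data.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
ref = rng.standard_normal((600, 3))      # reference sequence, T_chi x d
test = rng.standard_normal((800, 3))     # test sequence, T_gamma x d

# Distance matrix (3.3): Euclidean distance over all d coordinates at
# each pair of time points.
H = cdist(ref, test, metric='euclidean')
print(H.shape)                           # (600, 800)
```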

3.4 Summary

Chapter 3 begins with the formulation of the problem of learning trajectories in a PbD setting. Next, methods for task representation at a high level and a low level of abstraction are covered. A brief overview of statistical methods for task representation is presented; their application in robot learning from multiple demonstrations is intuitive and natural, given the stochastic character of human motions. The chapter concludes with a discussion of two standard approaches for tackling the temporal variations of human‐demonstrated trajectories.

References

  1. Aleotti, J., and Caselli, S., (2005). Trajectory clustering and stochastic approximation for robot programming by demonstration. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Canada, pp. 1029–1034.
  2. Calinon, S., (2009). Robot Programming by Demonstration: A Probabilistic Approach. Boca Raton, USA: EPFL/CRC Press.
  3. Clamann, H.P., (1969). Statistical analysis of motor unit firing patterns in a human skeletal muscle. Biophysical Journal, vol. 9, no. 10, pp. 1223–1251.
  4. Coates, A., Abbeel, P., and Ng, A.Y., (2008). Learning for control from multiple demonstrations. Proceedings of International Conference on Machine Learning, Helsinki, Finland, pp. 144–151.
  5. Dillmann, R., (2004). Teaching and learning of robot tasks via observation of human performance. Robotics and Autonomous Systems, vol. 47, no. 2–3, pp. 109–116.
  6. Gribovskaya, E., and Billard, A., (2008). Combining dynamical systems control and programming by demonstration for teaching discrete bimanual coordination tasks to a humanoid robot. Proceedings of ACM/IEEE International Conference on Human‐Robot Interaction, Amsterdam, the Netherlands, pp. 1–8.
  7. Grimes, D.B., and Rao, R.P.N., (2008). Learning nonparametric policies by imitation. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, pp. 2022–2028.
  8. Holt, G.A.T., Reinders, M.J.T., and Hendriks, E.A., (2007). Multi‐dimensional dynamic time warping for gesture recognition. Proceedings of 13th Annual Conference of the Advanced School for Computing and Imaging, Heijen, the Netherlands, pp. 1–8.
  9. Ijspeert, A.J., Nakanishi, J., Hoffmann, H., Pastor, P., and Schaal, S., (2012). Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, vol. 25, no. 2, pp. 328–373.
  10. Martinez, D., and Kragic, D., (2008). Modeling and recognition of actions through motor primitives. Proceedings of IEEE International Conference on Robotics and Automation, Pasadena, USA, pp. 1704–1709.
  11. Pomplun, M., and Mataric, M.J., (2000). Evaluation metrics and results of human arm movement imitation. Proceedings of First IEEE‐RAS International Conference on Humanoid Robots, Cambridge, USA, pp. 1–8.
  12. Sakoe, H., and Chiba, S., (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43–49.
  13. Saunders, J., Nehaniv, C.L., and Dautenhahn, K., (2006). Teaching robots by moulding behavior and scaffolding the environment. Proceedings of ACM/IEEE International Conference on Human‐Robot Interaction, Salt Lake City, USA, pp. 118–125.
  14. Schaal, S., Ijspeert, A., and Billard, A., (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London: Biological Sciences, vol. 358, no. 1431, pp. 537–547.
  15. Thrun, S., (2000). Probabilistic algorithms in robotics. AI Magazine, vol. 21, no. 4, pp. 93–109.
  16. Zollner, R., Rogalla, O., Dillmann, R., and Zollner, M., (2002). Understanding users intentions: programming fine manipulation tasks by demonstration. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, pp. 1114–1119.