G. Konidaris, S. Kuindersma, R. Grupen, and A. Barto, “
Robot learning from demonstration by constructing skill trees,”
The International Journal of Robotics Research, vol. 31, no. 3, pp. 360–375, 2012.
Abstract: We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
S. Kuindersma, R. Grupen, and A. Barto, “
Variable Risk Dynamic Mobile Manipulation,” in
RSS 2012 Workshop on Mobile Manipulation, Sydney, Australia, 2012.
Abstract: The ability to operate effectively in a variety of contexts will be a critical attribute of deployed mobile manipulators. In general, a variety of properties, such as battery charge, workspace constraints, and the presence of dangerous obstacles, will determine the suitability of particular control policies. Some context changes will cause shifts in risk sensitivity, or tendency to seek or avoid policies with high performance variation. We describe a policy search algorithm designed to address the problem of variable risk control. We generalize the simple stochastic gradient descent update to the risk-sensitive case, and show that, under certain conditions, it leads to an unbiased estimate of the gradient of the risk-sensitive objective. We show that the local critic structure used in the update can be exploited to interweave offline and online search to select local greedy policies or quickly change risk sensitivity. We evaluate the algorithm in experiments with a dynamically stable mobile manipulator lifting a heavy liquid-filled bottle while balancing.
S. Kuindersma, R. Grupen, and A. Barto, “
Variational Bayesian Optimization for Runtime Risk-Sensitive Control,” in
Robotics: Science and Systems VIII (RSS), Sydney, Australia, 2012, pp. 201–206.
Abstract: We present a new Bayesian policy search algorithm suitable for problems with policy-dependent cost variance, a property present in many robot control tasks. We extend recent work on variational heteroscedastic Gaussian processes to the optimization case to achieve efficient minimization of very noisy cost signals. In contrast to most policy search algorithms, our method explicitly models the cost variance in regions of low expected cost and permits runtime adjustment of risk sensitivity without relearning. Our experiments with artificial systems and a real mobile manipulator demonstrate that flexible risk-sensitive policies can be learned in very few trials.