Planning locomotion trajectories for legged microrobots is challenging. This is because of their complex morphology, high-frequency passive dynamics, and discontinuous contact interactions with their environment. Consequently, such research is often driven by time-consuming experimental methods. As an alternative, we present a framework for systematically modeling, planning, and controlling legged microrobots. We develop a three-dimensional dynamic model of a 1.5 g quadrupedal microrobot with complexity (e.g., number of degrees of freedom) similar to larger-scale legged robots. We then adapt a recently developed variational contact-implicit trajectory optimization method to generate feasible whole-body locomotion plans for this microrobot, and demonstrate that these plans can be tracked with simple joint-space controllers. We plan and execute periodic gaits at multiple stride frequencies and on various surfaces. These gaits achieve high per-cycle velocities, including a maximum of 10.87 mm/cycle, which is 15% faster than previously measured for this microrobot. Furthermore, we plan and execute a vertical jump of 9.96 mm, which is 78% of the microrobot’s center-of-mass height. To the best of our knowledge, this is the first end-to-end demonstration of planning and tracking whole-body dynamic locomotion on a millimeter-scale legged microrobot.

%B Robotics: Science and Systems (RSS) %G eng %0 Journal Article %J Science Robotics %D 2018 %T Human-in-the-loop optimization of hip assistance with a soft exosuit during walking %A Ye Ding %A Kim, Myunghee %A Scott Kuindersma %A Walsh, Conor J. %X Wearable robotic devices have been shown to substantially reduce the energy expenditure of human walking. However, response variance between participants for fixed control strategies can be high, leading to the hypothesis that individualized controllers could further improve walking economy. Recent studies on human-in-the-loop (HIL) control optimization have elucidated several practical challenges, such as long experimental protocols and low signal-to-noise ratios. Here, we used Bayesian optimization—an algorithm well suited to optimizing noisy performance signals with very limited data—to identify the peak and offset timing of hip extension assistance that minimizes the energy expenditure of walking with a textile-based wearable device. Optimal peak and offset timing were found over an average of 21.4 ± 1.0 min and reduced metabolic cost by 17.4 ± 3.2% compared with walking without the device (mean ± SEM), which represents an improvement of more than 60% on metabolic reduction compared with state-of-the-art devices that only assist hip extension. In addition, our results provide evidence for participant-specific metabolic distributions with respect to peak and offset timing and metabolic landscapes, lending support to the hypothesis that individualized control strategies can offer substantial benefits over fixed control strategies. These results also suggest that this method could have practical impact on improving the performance of wearable robotic devices. 
%B Science Robotics %V 3 %P eaar5438 %G eng %U http://robotics.sciencemag.org/content/3/15/eaar5438 %N 15 %0 Journal Article %J Autonomous Robots %D 2018 %T Robust Direct Trajectory Optimization Using Approximate Invariant Funnels %A Zachary Manchester %A Scott Kuindersma %X Many critical robotics applications require robustness to disturbances arising from unplanned forces, state uncertainty, and model errors. Motion planning algorithms that explicitly reason about robustness require a coupling of trajectory optimization and feedback design, where the system's closed-loop response to disturbances is optimized. Due to the often-heavy computational demands of solving such problems, the practical application of robust trajectory optimization in robotics has so far been limited. Motivated by recent work on sums-of-squares verification methods for nonlinear systems, we derive a scalable robust trajectory optimization algorithm that optimizes approximate invariant funnels along the trajectory while planning. For the case of ellipsoidal disturbance sets and LQR feedback controllers, the state and input deviations along a nominal trajectory can be computed locally in closed form, permitting fast evaluation of robust cost and constraint functions and their derivatives. The resulting algorithm is a scalable extension of classical direct transcription that demonstrably improves tracking performance over non-robust formulations while incurring only a modest increase in computational cost. We evaluate the algorithm in several simulated robot control tasks. 
%B Autonomous Robots %G eng %U http://em.rdcu.be/wf/click?upn=lMZy1lernSJ7apc5DgYM8e24xrYhSFK5X6cE3-2BZib5M-3D_Me-2F-2BXTdGod-2FpQrhDtHpJFnHIppVTOp0teMLWFtYuVbLH9VkVs6ZHPeOVCZtrPQefHbFmWSDdMZ4k7EiE9-2F20-2FmhTyIJEHaAGPPECVx6ytpZ6rtS4Uhd9AUtjPie4uLbSHsoE185-2B9jap0vKqAicYw33lti714dqsrYx %0 Journal Article %J PLoS ONE %D 2017 %T Human-in-the-loop Bayesian optimization of wearable device parameters %A Kim, Myunghee %A Ye Ding %A Philippe Malcolm %A Jozefien Speeckaert %A Siviy, Christopher J. %A Walsh, Conor J. %A Scott Kuindersma %X The increasing capabilities of exoskeletons and powered prosthetics for walking assistance have paved the way for more sophisticated and individualized control strategies. In response to this opportunity, recent work on *human-in-the-loop* optimization has considered the problem of automatically tuning control parameters based on real-time physiological measurements. However, the common use of metabolic cost as a performance metric creates significant experimental challenges due to its long measurement times and low signal-to-noise ratio. We evaluate the use of Bayesian optimization (a family of sample-efficient, noise-tolerant, and global optimization methods) for quickly identifying near-optimal control parameters. To manage experimental complexity and provide comparisons against related work, we consider the task of minimizing metabolic cost by optimizing walking step frequencies in unaided human subjects. Compared to an existing approach based on gradient descent, Bayesian optimization identified a near-optimal step frequency with a faster time to convergence (12 minutes, *p* < 0.01), smaller inter-subject variability in convergence time (± 2 minutes, *p* < 0.01), and lower overall energy expenditure (*p* < 0.01).

Many critical robotics applications require robustness to disturbances arising from unplanned forces, state uncertainty, and model errors. Motion planning algorithms that explicitly reason about robustness require a coupling of trajectory optimization and feedback design, where the system's closed-loop response to bounded disturbances is optimized. Due to the often-heavy computational demands of solving such problems, the practical application of robust trajectory optimization in robotics has so far been limited. We derive a tractable robust optimization algorithm that combines direct transcription with linear-quadratic control design to reason about closed-loop responses to disturbances. In the case of ellipsoidal disturbance sets, the state and input deviations along a nominal trajectory can be computed locally in closed form, thus allowing for fast evaluations of robust cost and constraint functions. The resulting algorithm, called DIRTREL, is an extension of classical direct transcription that demonstrably improves tracking performance over non-robust formulations while incurring only a modest increase in computational cost. We evaluate the algorithm in several simulated robot control tasks.

%B Robotics: Science and Systems (RSS) %G eng %0 Conference Paper %B 55th AIAA Aerospace Sciences Meeting, AIAA SciTech Forum %D 2017 %T A Variable Forward-Sweep Wing Design for Improved Perching in Micro Aerial Vehicles %A Zachary R. Manchester %A Jeffrey I. Lipton %A Wood, Robert J. %A Scott Kuindersma %X A micro aerial vehicle with a variable forward-sweep wing is proposed with the goal of enhancing performance and controllability during high-angle-of-attack perching maneuvers. Data is presented from a series of wind tunnel experiments to quantify the aerodynamic effects of forward sweep over a range of angles of attack from -25 degrees to +75 degrees. A nonlinear dynamics model is constructed using the wind tunnel data to gain further insight into aircraft flight dynamics and controllability. Simulated perching trajectories optimized with a direct collocation method indicate that the forward-swept wing configuration can achieve qualitatively different lower-cost perching maneuvers than the straight wing configuration.

%B 55th AIAA Aerospace Sciences Meeting, AIAA SciTech Forum %G eng %0 Conference Paper %B Proceedings of the 55th Conference on Decision and Control (CDC) %D 2016 %T Derivative-Free Trajectory Optimization with Unscented Dynamic Programming %A Zachary Manchester %A Scott Kuindersma %X Trajectory optimization algorithms are a core technology behind many modern nonlinear control applications. However, with increasing system complexity, the computation of dynamics derivatives during optimization creates a computational bottleneck, particularly in second-order methods. In this paper, we present a modification of the classical Differential Dynamic Programming (DDP) algorithm that eliminates the computation of dynamics derivatives while maintaining similar convergence properties. Rather than relying on naive finite difference calculations, we propose a deterministic sampling scheme inspired by the Unscented Kalman Filter that propagates a quadratic approximation of the cost-to-go function through the nonlinear dynamics at each time step. Our algorithm takes larger steps than Iterative LQR (a DDP variant that approximates the cost-to-go Hessian using only first derivatives) while maintaining the same computational cost. We present results demonstrating its numerical performance in simulated balancing and aerobatic flight experiments.

Code: https://github.com/HarvardAgileRoboticsLab/unscented-dynamic-programming

%B Proceedings of the 55th Conference on Decision and Control (CDC) %G eng %0 Journal Article %J Journal of Field Robotics %D 2016 %T Director: A User Interface Designed for Robot Operation with Shared Autonomy %A Marion, Pat %A Fallon, Maurice %A Deits, Robin %A Valenzuela, Andrés %A Perez-D'Arpino, Claudia %A Greg Izatt %A Lucas Manuelli %A Antone, Matthew %A Dai, Hongkai %A Koolen, Twan %A John Carter %A Scott Kuindersma %A Russ Tedrake %X Operating a high degree of freedom mobile manipulator, such as a humanoid, in a field scenario requires constant situational awareness, capable perception modules, and effective mechanisms for interactive motion planning and control. A well-designed operator interface presents the operator with enough context to quickly carry out a mission and the flexibility to handle unforeseen operating scenarios robustly. By contrast, an unintuitive user interface can increase the risk of catastrophic operator error by overwhelming the user with unnecessary information. With these principles in mind, we present the philosophy and design decisions behind Director, the open-source user interface developed by Team MIT to pilot the Atlas robot in the DARPA Robotics Challenge (DRC). At the heart of Director is an integrated task execution system that specifies sequences of actions needed to achieve a substantive task, such as drilling a wall or climbing a staircase. These task sequences, developed a priori, make online queries to automated perception and planning algorithms with outputs that can be reviewed by the operator and executed by our whole-body controller. Our use of Director at the DRC resulted in efficient high-level task operation while being fully competitive with approaches focusing on teleoperation by highly-trained operators. We discuss the primary interface elements that comprise the Director and provide analysis of its successful use at the DRC.

Publisher's link: http://onlinelibrary.wiley.com/doi/10.1002/rob.21681/full

The promise of legged robots over standard wheeled robots is to provide improved mobility over rough terrain. This promise builds on the decoupling between the environment and the main body of the robot that the presence of articulated legs allows, with two consequences. First, the motion of the main body of the robot can be made largely independent from the roughness of the terrain, within the kinematic limits of the legs: legs provide an active suspension system. Indeed, one of the most advanced hexapod robots of the 1980s was aptly called the Adaptive Suspension Vehicle. Second, this decoupling allows legs to temporarily leave their contact with the ground: isolated footholds on a discontinuous terrain can be overcome, making it possible to reach places that would otherwise be inaccessible. Note that having feet firmly planted on the ground is not mandatory here: skating is an equally interesting option, although rarely approached so far in robotics.

Unfortunately, this promise comes at the cost of a hindering increase in complexity. It is only with the unveiling of the Honda P2 humanoid robot in 1996, and later of the Boston Dynamics BigDog quadruped robot in 2005 that legged robots finally began to deliver real-life capacities that are just beginning to match the long sought animal-like mobility over rough terrain. In fact, work in legged robotics has even contributed to the understanding of human and animal locomotion, as evidenced by the many fruitful collaborations between robotics and biomechanics researchers over legged locomotion.

%B Springer Handbook of Robotics, 2nd Ed %I Springer %G eng %0 Conference Paper %B Proceedings of the International Conference on Robotics and Automation (ICRA) %D 2016 %T Optimization and stabilization of trajectories for constrained dynamical systems %A Posa, Michael %A Scott Kuindersma %A Russ Tedrake %X Contact constraints, such as those between a foot and the ground or a hand and an object, are inherent in many robotic tasks. These constraints define a manifold of feasible states; while well understood mathematically, they pose numerical challenges to many algorithms for planning and controlling whole-body dynamic motions. In this paper, we present an approach to the synthesis and stabilization of complex trajectories for both fully-actuated and underactuated robots subject to contact constraints. We introduce a trajectory optimization algorithm (DIRCON) that extends the direct collocation method, naturally incorporating manifold constraints to produce a nominal trajectory with third-order integration accuracy, a critical feature for achieving reliable tracking control. We adapt the classical time-varying linear quadratic regulator to produce a local cost-to-go in the manifold tangent plane. Finally, we descend the cost-to-go using a quadratic program that incorporates unilateral friction and torque constraints. This approach is demonstrated on three complex walking and climbing locomotion examples in simulation.

%B Proceedings of the International Conference on Robotics and Automation (ICRA) %I IEEE %C Stockholm, Sweden %G eng %0 Journal Article %J Autonomous Robots %D 2016 %T Optimization-based locomotion planning, estimation, and control design for Atlas %A Scott Kuindersma %A Deits, Robin %A Fallon, Maurice %A Valenzuela, Andrés %A Dai, Hongkai %A Frank Permenter %A Koolen, Twan %A Marion, Pat %A Russ Tedrake %X This paper describes a collection of optimization algorithms for achieving dynamic planning, control, and state estimation for a bipedal robot designed to operate reliably in complex environments. To make challenging locomotion tasks tractable, we describe several novel applications of convex, mixed-integer, and sparse nonlinear optimization to problems ranging from footstep placement to whole-body planning and control. We also present a state estimator formulation that, when combined with our walking controller, permits highly precise execution of extended walking plans over non-flat terrain. We describe our complete system integration and experiments carried out on Atlas, a full-size hydraulic humanoid robot built by Boston Dynamics, Inc.

%B Autonomous Robots %V 40 %P 429–455 %G eng %N 3 %R 10.1007/s10514-015-9479-3 %0 Journal Article %J Journal of Field Robotics %D 2015 %T An Architecture for Online Affordance-based Perception and Whole-body Planning %A Fallon, Maurice %A Scott Kuindersma %A Karumanchi, Sisir %A Antone, Matthew %A Schneider, Toby %A Dai, Hongkai %A Pérez D'Arpino, Claudia %A Deits, Robin %A DiCicco, Matt %A Fourie, Dehann %A Koolen, Twan %A Marion, Pat %A Posa, Michael %A Valenzuela, Andrés %A Yu, Kuan-Ting %A Shah, Julie %A Iagnemma, Karl %A Russ Tedrake %A Teller, Seth %X The DARPA Robotics Challenge Trials held in December 2013 provided a landmark demonstration of dexterous mobile robots executing a variety of tasks aided by a remote human operator using only data from the robot's sensor suite transmitted over a constrained, field-realistic communications link. We describe the design considerations, architecture, implementation, and performance of the software that Team MIT developed to command and control an Atlas humanoid robot. Our design emphasized human interaction with an efficient motion planner, where operators expressed desired robot actions in terms of affordances fit using perception and manipulated in a custom user interface. We highlight several important lessons we learned while developing our system on a highly compressed schedule.

%B Journal of Field Robotics %V 32 %P 229–254 %G eng %N 2 %R 10.1002/rob.21546 %0 Conference Paper %B IEEE-RAS International Conference on Humanoid Robots %D 2015 %T A closed-form solution for real-time ZMP gait generation and feedback stabilization %A Russ Tedrake %A Scott Kuindersma %A Deits, Robin %A Kanako Miura %X Here we present a closed-form solution to the continuous time-varying linear quadratic regulator (LQR) problem for the zero-moment point (ZMP) tracking controller. This generalizes previous analytical solutions for gait generation by allowing "soft" tracking (with a quadratic cost) of the desired ZMP, and by providing the feedback gains for the resulting time-varying optimal controller. This enables extremely fast computation, with the number of operations linear in the number of spline segments representing the desired ZMP. Results are presented using the Atlas humanoid robot where dynamic walking is achieved by recomputing the optimal controller online.

%B IEEE-RAS International Conference on Humanoid Robots %C Seoul, Korea %G eng %0 Conference Paper %B Proceedings of the International Conference on Robotics and Automation (ICRA) %D 2014 %T An Efficiently Solvable Quadratic Program for Stabilizing Dynamic Locomotion %A Scott Kuindersma %A Frank Permenter %A Russ Tedrake %X We describe a whole-body dynamic walking controller implemented as a convex quadratic program. The controller solves an optimal control problem using an approximate value function derived from a simple walking model while respecting the dynamic, input, and contact constraints of the full robot dynamics. By exploiting sparsity and temporal structure in the optimization with a custom active-set algorithm, we surpass the performance of the best available off-the-shelf solvers and achieve 1 kHz control rates for a 34-DOF humanoid. We describe applications to balancing and walking tasks using the simulated Atlas robot in the DARPA Virtual Robotics Challenge.

%B Proceedings of the International Conference on Robotics and Automation (ICRA) %I IEEE %C Hong Kong, China %P 2589–2594 %@ 978-1-4799-3685-4 %G eng %R 10.1109/ICRA.2014.6907230 %0 Conference Paper %B Robotics and Automation (ICRA), 2014 IEEE International Conference on %D 2014 %T A summary of team MIT's approach to the virtual robotics challenge %A Russ Tedrake %A Fallon, Maurice %A Karumanchi, Sisir %A Scott Kuindersma %A Antone, Matthew %A Schneider, Toby %A Howard, Tom %A Walter, M. %A Dai, H. %A Deits, R. %A Fleder, M. %A Fourie, D. %A Hammoud, R. %A Hemachandra, S. %A Ilardi, P. %A Perez-D'Arpino, C. %A Pillai, S. %A Valenzuela, A. %A Cantu, C. %A Dolan, C. %A Evans, I. %A Jorgensen, S. %A Kristeller, J. %A Shah, J.A. %A Iagnemma, K. %A Teller, S. %X The paper describes the system developed by researchers from MIT for the Defense Advanced Research Projects Agency's (DARPA) Virtual Robotics Challenge (VRC), held in June 2013. The VRC was the first competition in the DARPA Robotics Challenge (DRC), a program that aims to "develop ground robotic capabilities to execute complex tasks in dangerous, degraded, human-engineered environments". The VRC required teams to guide a model of Boston Dynamics' humanoid robot, Atlas, through driving, walking, and manipulation tasks in simulation. Team MIT's user interface, the Viewer, provided the operator with a unified representation of all available information. A 3D rendering of the robot depicted its most recently estimated body state with respect to the surrounding environment, represented by point clouds and texture-mapped meshes as sensed by on-board LIDAR and fused over time.

%B Robotics and Automation (ICRA), 2014 IEEE International Conference on %I IEEE %P 2087–2087 %@ 978-1-4799-3685-4 %G eng %R 10.1109/ICRA.2014.6907140 %0 Conference Paper %B Proceedings of the Sixteenth Yale Workshop on Adaptive and Learning Systems %D 2013 %T Robot Learning: Some Recent Examples %A George Konidaris %A Scott Kuindersma %A Scott Niekum %A Roderic Grupen %A Andrew Barto %X This paper provides a brief overview of three recent contributions to robot learning developed by researchers at the University of Massachusetts Amherst. The first is the use of policy search algorithms that exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. Experiments demonstrate dynamic stabilization of a mobile manipulator through learning flexible, risk-sensitive policies in very few trials. The second contribution is a novel method for robot learning from unstructured demonstrations that permits intelligent sequencing of primitives to create novel, adaptive behavior. This is demonstrated on a furniture assembly task using the PR2 mobile manipulator. The third contribution is a robot system that autonomously acquires skills through interaction with its environment.

%B Proceedings of the Sixteenth Yale Workshop on Adaptive and Learning Systems %P 71-76 %G eng %0 Journal Article %J International Journal of Robotics Research %D 2013 %T Variable Risk Control via Stochastic Optimization %A Scott Kuindersma %A Roderic Grupen %A Andrew Barto %X We present new global and local policy search algorithms suitable for problems with policy-dependent cost variance (or risk), a property present in many robot control tasks. These algorithms exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, the learned cost model can be used as a critic for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection permits variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without relearning. We describe experiments in dynamic stabilization and manipulation with a mobile manipulator that demonstrate learning of flexible, risk-sensitive policies in very few trials.

%B International Journal of Robotics Research %V 32 %P 806–825 %G eng %N 7 %0 Journal Article %J The International Journal of Robotics Research %D 2012 %T Robot learning from demonstration by constructing skill trees %A George Konidaris %A Scott Kuindersma %A Roderic Grupen %A Andrew Barto %X We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.

%B The International Journal of Robotics Research %V 31 %P 360–375 %G eng %N 3 %0 Conference Paper %B RSS 2012 Workshop on Mobile Manipulation %D 2012 %T Variable Risk Dynamic Mobile Manipulation %A Scott Kuindersma %A Roderic Grupen %A Andrew Barto %X The ability to operate effectively in a variety of contexts will be a critical attribute of deployed mobile manipulators. In general, a variety of properties, such as battery charge, workspace constraints, and the presence of dangerous obstacles, will determine the suitability of particular control policies. Some context changes will cause shifts in risk sensitivity, or tendency to seek or avoid policies with high performance variation. We describe a policy search algorithm designed to address the problem of variable risk control. We generalize the simple stochastic gradient descent update to the risk-sensitive case, and show that, under certain conditions, it leads to an unbiased estimate of the gradient of the risk-sensitive objective. We show that the local critic structure used in the update can be exploited to interweave offline and online search to select local greedy policies or quickly change risk sensitivity. We evaluate the algorithm in experiments with a dynamically stable mobile manipulator lifting a heavy liquid-filled bottle while balancing.

%B RSS 2012 Workshop on Mobile Manipulation %C Sydney, Australia %G eng %0 Conference Paper %B Robotics: Science and Systems VIII (RSS) %D 2012 %T Variational Bayesian Optimization for Runtime Risk-Sensitive Control %A Scott Kuindersma %A Roderic Grupen %A Andrew Barto %X We present a new Bayesian policy search algorithm suitable for problems with policy-dependent cost variance, a property present in many robot control tasks. We extend recent work on variational heteroscedastic Gaussian processes to the optimization case to achieve efficient minimization of very noisy cost signals. In contrast to most policy search algorithms, our method explicitly models the cost variance in regions of low expected cost and permits runtime adjustment of risk sensitivity without relearning. Our experiments with artificial systems and a real mobile manipulator demonstrate that flexible risk-sensitive policies can be learned in very few trials.

%B Robotics: Science and Systems VIII (RSS) %C Sydney, Australia %P 201–206 %G eng %0 Conference Paper %B RSS 2011 Workshop on Mobile Manipulation: Learning to Manipulate %D 2011 %T Acquiring Transferrable Mobile Manipulation Skills %A Konidaris, George D %A Kuindersma, Scott R %A Grupen, Roderic A %A Barto, Andrew G %X This abstract summarizes recent research on the autonomous acquisition of transferrable manipulation skills. We describe a robot system that learns to sequence a set of innate controllers to solve a task, and then extracts transferrable manipulation skills from the resulting solution. Using the extracted skills, the robot is able to significantly reduce the time required to discover the solution to a second task.

%B RSS 2011 Workshop on Mobile Manipulation: Learning to Manipulate %C Los Angeles, CA %G eng %0 Conference Paper %B Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11) %D 2011 %T Autonomous Skill Acquisition on a Mobile Manipulator %A George Konidaris %A Scott Kuindersma %A Roderic Grupen %A Andrew Barto %X We describe a robot system that autonomously acquires skills through interaction with its environment. The robot learns to sequence the execution of a set of innate controllers to solve a task, extracts and retains components of that solution as portable skills, and then transfers those skills to reduce the time required to learn to solve a second task.

%B Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11) %C San Francisco, CA %P 1468–1473 %G eng %0 Conference Paper %B Proceedings of the ICML Workshop on New Developments in Imitation Learning %D 2011 %T CST: Constructing Skill Trees by Demonstration %A Konidaris, George D %A Kuindersma, Scott R %A Grupen, Roderic A %A Barto, Andrew G %X We describe recent work on CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We describe applications of CST to acquiring skills from human demonstration in a dynamic continuous domain and from both expert demonstration and learned control sequences on a mobile manipulator.

%B Proceedings of the ICML Workshop on New Developments in Imitation Learning %C Bellevue, WA %G eng %0 Conference Paper %B Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots %D 2011 %T Learning Dynamic Arm Motions for Postural Recovery %A Scott Kuindersma %A Roderic Grupen %A Andrew Barto %X The biomechanics community has recently made progress toward understanding the role of rapid arm movements in human stability recovery. However, comparatively little work has been done exploring this type of control in humanoid robots. We provide a summary of recent insights into the functional contributions of arm recovery motions in humans and experimentally demonstrate advantages of this behavior on a dynamically stable mobile manipulator. Using Bayesian optimization, the robot efficiently discovers policies that reduce total energy expenditure and recovery footprint, and increase ability to stabilize after large impacts.

%B Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots %C Bled, Slovenia %P 7–12 %G eng %0 Conference Paper %B Proceedings of the AAAI Conference on Artificial Intelligence %D 2010 %T Control Model Learning for Whole-Body Mobile Manipulation %A Scott Kuindersma %B Proceedings of the AAAI Conference on Artificial Intelligence %G eng %0 Conference Paper %B Advances in Neural Information Processing Systems 23 %D 2010 %T Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories %A George Konidaris %A Scott Kuindersma %A Andrew Barto %A Roderic Grupen %X We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.

%B Advances in Neural Information Processing Systems 23 %P 1162–1170 %G eng %0 Conference Paper %B NIPS Workshop on Learning and Planning from Batch Time Series Data %D 2010 %T Learning from a Single Demonstration: Motion Planning with Skill Segmentation %A Scott Kuindersma %A George Konidaris %A Roderic Grupen %A Andrew Barto %X We propose an approach to control learning from demonstration that first segments demonstration trajectories to identify subgoals, then uses model-based control methods to sequentially reach these subgoals to solve the overall task. Using this approach, we show that a mobile robot is able to solve a combined navigation and manipulation task robustly after observing only a single successful trajectory.

%B NIPS Workshop on Learning and Planning from Batch Time Series Data %C Vancouver, BC %G eng