The paper describes the system developed by researchers from MIT for the Defense Advanced Research Projects Agency's (DARPA) Virtual Robotics Challenge (VRC), held in June 2013. The VRC was the first competition in the DARPA Robotics Challenge (DRC), a program that aims to "develop ground robotic capabilities to execute complex tasks in dangerous, degraded, human-engineered environments". The VRC required teams to guide a model of Boston Dynamics' humanoid robot, Atlas, through driving, walking, and manipulation tasks in simulation. Team MIT's user interface, the Viewer, provided the operator with a unified representation of all available information. A 3D rendering of the robot depicted its most recently estimated body state with respect to the surrounding environment, represented by point clouds and texture-mapped meshes as sensed by on-board LIDAR and fused over time.
This paper provides a brief overview of three recent contributions to robot learning developed by researchers at the University of Massachusetts Amherst. The first is the use of policy search algorithms that exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. Experiments demonstrate dynamic stabilization of a mobile manipulator through learning flexible, risk-sensitive policies in very few trials. The second contribution is a novel method for robot learning from unstructured demonstrations that permits intelligent sequencing of primitives to create novel, adaptive behavior. This is demonstrated on a furniture assembly task using the PR2 mobile manipulator. The third contribution is a robot system that autonomously acquires skills through interaction with its environment.
We present new global and local policy search algorithms suitable for problems with policy-dependent cost variance (or risk), a property present in many robot control tasks. These algorithms exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, the learned cost model can be used as a critic for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection permits variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without relearning. We describe experiments in dynamic stabilization and manipulation with a mobile manipulator that demonstrate learning of flexible, risk-sensitive policies in very few trials.
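The separation of cost modeling from policy selection described above can be illustrated with a toy sketch: repeated rollouts estimate the mean and variance of cost at each candidate policy, and a risk-sensitivity weight is then applied at selection time without relearning. All names, the 1-D policy parameterization, and the synthetic cost function below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D policy parameter grid (illustrative only).
thetas = np.linspace(0.0, 1.0, 21)

def rollout_cost(theta, n=200):
    """Toy cost: mean is quadratic in theta, noise (risk) grows with theta."""
    return (theta - 0.3) ** 2 + (0.05 + 0.5 * theta) * rng.standard_normal(n)

# Directly model the policy-dependent cost distribution from repeated rollouts.
samples = np.array([rollout_cost(t) for t in thetas])
mean_cost = samples.mean(axis=1)
std_cost = samples.std(axis=1)

def select_policy(risk_aversion):
    """Greedy risk-sensitive selection: minimize mean + lambda * std."""
    return thetas[np.argmin(mean_cost + risk_aversion * std_cost)]

theta_neutral = select_policy(0.0)  # risk-neutral choice
theta_averse = select_policy(2.0)   # risk-averse: trades mean for low variance
```

Because the learned model captures the full cost distribution, changing the `risk_aversion` weight at runtime reselects a policy instantly, which is the "variable risk control" property the abstract highlights.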
We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
The ability to operate effectively in a variety of contexts will be a critical attribute of deployed mobile manipulators. In general, a variety of properties, such as battery charge, workspace constraints, and the presence of dangerous obstacles, will determine the suitability of particular control policies. Some context changes will cause shifts in risk sensitivity, or the tendency to seek or avoid policies with high performance variation. We describe a policy search algorithm designed to address the problem of variable risk control. We generalize the simple stochastic gradient descent update to the risk-sensitive case, and show that, under certain conditions, it leads to an unbiased estimate of the gradient of the risk-sensitive objective. We show that the local critic structure used in the update can be exploited to interweave offline and online search to select local greedy policies or quickly change risk sensitivity. We evaluate the algorithm in experiments with a dynamically stable mobile manipulator lifting a heavy liquid-filled bottle while balancing.
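A minimal sketch of a risk-sensitive stochastic gradient update of the kind the abstract describes, using likelihood-ratio estimates of the gradient of E[J] + kappa * Var[J] for a 1-D Gaussian policy. The cost function, policy class, and all constants are toy assumptions for illustration, not the paper's experimental system.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.1    # fixed exploration noise of the Gaussian policy N(theta, sigma^2)
kappa = 1.0    # risk-aversion weight on cost variance
alpha = 0.05   # step size
theta = 1.0

def cost(a):
    # Noisy toy cost with action-dependent variance.
    return a ** 2 + 0.2 * a * rng.standard_normal()

for _ in range(300):
    actions = theta + sigma * rng.standard_normal(64)
    J = np.array([cost(a) for a in actions])
    score = (actions - theta) / sigma ** 2   # d/dtheta log N(a; theta, sigma^2)
    # Likelihood-ratio estimates: dE[J]/dtheta and, via
    # Var[J] = E[J^2] - (E[J])^2, the gradient of the variance term.
    grad_mean = np.mean(score * J)
    grad_var = np.mean(score * J ** 2) - 2.0 * J.mean() * grad_mean
    theta -= alpha * (grad_mean + kappa * grad_var)
```

With `kappa = 0` the update reduces to the ordinary risk-neutral policy gradient, which is the sense in which the paper's update "generalizes the simple stochastic gradient descent update."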
We present a new Bayesian policy search algorithm suitable for problems with policy-dependent cost variance, a property present in many robot control tasks. We extend recent work on variational heteroscedastic Gaussian processes to the optimization case to achieve efficient minimization of very noisy cost signals. In contrast to most policy search algorithms, our method explicitly models the cost variance in regions of low expected cost and permits runtime adjustment of risk sensitivity without relearning. Our experiments with artificial systems and a real mobile manipulator demonstrate that flexible risk-sensitive policies can be learned in very few trials.
This abstract summarizes recent research on the autonomous acquisition of transferable manipulation skills. We describe a robot system that learns to sequence a set of innate controllers to solve a task, and then extracts transferable manipulation skills from the resulting solution. Using the extracted skills, the robot is able to significantly reduce the time required to discover the solution to a second task.
We describe a robot system that autonomously acquires skills through interaction with its environment. The robot learns to sequence the execution of a set of innate controllers to solve a task, extracts and retains components of that solution as portable skills, and then transfers those skills to reduce the time required to learn to solve a second task.
We describe recent work on CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We describe applications of CST to acquiring skills from human demonstration in a dynamic continuous domain and from both expert demonstration and learned control sequences on a mobile manipulator.
The biomechanics community has recently made progress toward understanding the role of rapid arm movements in human stability recovery. However, comparatively little work has been done exploring this type of control in humanoid robots. We provide a summary of recent insights into the functional contributions of arm recovery motions in humans and experimentally demonstrate advantages of this behavior on a dynamically stable mobile manipulator. Using Bayesian optimization, the robot efficiently discovers policies that reduce total energy expenditure and recovery footprint, and increase ability to stabilize after large impacts.
We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
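The segmentation step at the heart of CST can be illustrated with a much simpler stand-in: rather than CST's Bayesian changepoint detection over abstractions, the toy below finds the single split of a noisy 1-D "demonstration" that minimizes the total fit error of two per-segment linear models. The data and model class are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D "demonstration": two segments with different linear dynamics,
# with the true changepoint at t = 60.
t = np.arange(100, dtype=float)
y = np.where(t < 60, 0.5 * t, 30.0 - 0.8 * (t - 60)) + 0.3 * rng.standard_normal(100)

def sse(ts, ys):
    """Residual sum of squares of a least-squares line fit to one segment."""
    A = np.column_stack([ts, np.ones_like(ts)])
    resid = ys - A @ np.linalg.lstsq(A, ys, rcond=None)[0]
    return float(resid @ resid)

# Choose the split that minimizes the combined fit error of the two skills.
splits = range(5, 95)
changepoint = min(splits, key=lambda k: sse(t[:k], y[:k]) + sse(t[k:], y[k:]))
```

CST generalizes this idea in two directions the toy omits: it runs online over candidate abstractions rather than a single model class, and it merges the resulting chains from multiple trajectories into a tree.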
We propose an approach to control learning from demonstration that first segments demonstration trajectories to identify subgoals, then uses model-based con- trol methods to sequentially reach these subgoals to solve the overall task. Using this approach, we show that a mobile robot is able to solve a combined navigation and manipulation task robustly after observing only a single successful trajectory.
Contact constraints arise naturally in many robot planning problems. In recent years, a variety of contact-implicit trajectory optimization algorithms have been developed that avoid the pitfalls of mode pre-specification by simultaneously optimizing state, input, and contact force trajectories. However, their reliance on first-order integrators leads to a linear tradeoff between optimization problem size and plan accuracy. To address this limitation, we propose a new family of trajectory optimization algorithms that leverage ideas from discrete variational mechanics to derive higher-order generalizations of the classic time-stepping method of Stewart and Trinkle. By using these dynamics formulations as constraints in direct trajectory optimization algorithms, it is possible to perform contact-implicit trajectory optimization with significantly higher accuracy. For concreteness, we derive a second-order method and evaluate it using several simulated rigid body systems, including an underactuated biped and a quadruped. In addition, we use this second-order method to plan locomotion trajectories for a complex quadrupedal microrobot. The planned trajectories are evaluated on the physical platform and result in a number of performance improvements.
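For context, the first-order scheme that the proposed methods generalize can be sketched schematically as follows (normal contact only; tangential friction is handled with additional complementarity conditions). Here $h$ is the time step, $M$ the mass matrix, $J$ the contact Jacobian, $\phi$ the signed gap function, and $\lambda_k$ the contact impulse; the bias term $c(q,v)$ collects gravity, Coriolis, and damping effects.

```latex
% Classic first-order time-stepping (Stewart--Trinkle): contact impulses
% act through the contact Jacobian, and contact is enforced with a
% complementarity condition on the next-step gap.
\begin{aligned}
q_{k+1} &= q_k + h\, v_{k+1},\\
M\,(v_{k+1} - v_k) &= h\,\bigl(B u_k - c(q_k, v_k)\bigr) + J(q_k)^{\mathsf T} \lambda_k,\\
0 \le \lambda_k \ &\perp\ \phi(q_{k+1}) \ge 0 .
\end{aligned}
```

The paper's contribution is to replace the first line's first-order integration with higher-order variational discretizations while retaining complementarity constraints of the third line's form, so that the same direct transcription machinery applies at higher accuracy.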
We present HelioLinC, a novel approach to the minor planet linking problem. Our heliocentric transformation-and-propagation algorithm clusters tracklets at common epochs, allowing for the efficient identification of tracklets that represent the same minor planet. This algorithm scales far more favorably with the number of tracklets N than standard methods, overcoming one of the primary computational bottlenecks faced by current and future asteroid surveys. We apply our algorithm to the Minor Planet Center's Isolated Tracklet File, establishing orbits for more than 200,000 new minor planets. A detailed analysis of the influence of false detections on the efficiency of our approach, along with an examination of detection biases, will be presented in future work.
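The transform-and-propagate idea can be sketched with a toy in which each "tracklet" carries a position and velocity: propagating every tracklet's state to a common epoch makes tracklets of the same object coincide, so linking reduces to clustering (here, hashing onto a coarse grid). This is a deliberate simplification: the real algorithm must hypothesize heliocentric distances to recover full states from angular observations, and uses linear motion only as an approximation.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)

# Synthetic objects moving linearly in a 2-D toy "sky"; three tracklets each.
t_common = 0.0
n_objects, tracklets = 50, []
for obj in range(n_objects):
    x0, v = rng.uniform(-1, 1, 2), rng.uniform(-0.1, 0.1, 2)
    for t_obs in rng.uniform(0, 10, 3):
        pos = x0 + v * t_obs + 1e-6 * rng.standard_normal(2)  # small noise
        tracklets.append((obj, t_obs, pos, v))

# Propagate every tracklet's state back to the common epoch, then cluster
# by binning: same-object tracklets land in the same bin.
clusters = defaultdict(list)
for obj, t_obs, pos, v in tracklets:
    state = pos + v * (t_common - t_obs)
    key = tuple(np.round(state / 1e-3).astype(int))
    clusters[key].append(obj)

# Each bin should contain tracklets of exactly one object.
pure = all(len(set(objs)) == 1 for objs in clusters.values())
```

The favorable scaling follows because clustering by hashing avoids testing tracklet pairs against each other, which is the combinatorial step in standard linking.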
Parallelism can be used to significantly increase the throughput of computationally expensive algorithms. With the widespread adoption of parallel computing platforms such as GPUs, it is natural to consider whether these architectures can benefit robotics researchers interested in solving trajectory optimization problems online. Differential Dynamic Programming (DDP) algorithms have been shown to achieve some of the best timing performance in robotics tasks by making use of optimized dynamics methods and CPU multi-threading. This paper aims to analyze the benefits and tradeoffs of higher degrees of parallelization using a multiple-shooting variant of DDP implemented on a GPU. We describe our implementation strategy and present results demonstrating its performance compared to an equivalent multi-threaded CPU implementation using several benchmark control tasks. Our results suggest that GPU-based solvers can offer faster per-iteration computation and faster convergence in some cases, but in general tradeoffs exist between convergence behavior and degree of parallelism.
Contact interactions are central to robot manipulation and locomotion behaviors. State estimation techniques that explicitly capture the dynamics of contact offer the potential to reduce estimation errors from unplanned contact events and improve closed-loop control performance. This is particularly true in highly dynamic situations where common simplifications like no-slip or quasi-static sliding are violated. Incorporating contact constraints requires care to address the numerical challenges associated with discontinuous dynamics, which make straightforward application of derivative-based techniques such as the Extended Kalman Filter impossible. In this paper, we derive an approximate maximum a posteriori estimator that can handle rigid body contact by explicitly imposing contact constraints in the observation update. We compare the performance of this estimator to an existing state-of-the-art Unscented Kalman Filter designed for estimation through contact and demonstrate the scalability of the approach by estimating the state of a 20-DOF bipedal robot in realtime.
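The contact-constrained observation update can be illustrated in miniature: fuse a prior and a measurement as in a standard Kalman/MAP update, then impose a nonpenetration constraint on the estimate by projecting in the posterior metric. The 2-D state, the numbers, and the single height constraint are toy assumptions standing in for the QP solved by the paper's estimator at each update.

```python
import numpy as np

P = np.diag([0.04, 0.04])   # prior covariance over [height, velocity]
R = np.array([[0.01]])      # measurement noise
H = np.array([[1.0, 0.0]])  # we measure foot height only

x_prior = np.array([-0.03, -0.2])  # prior places the foot slightly below ground
z = np.array([-0.01])

# Unconstrained (Kalman/MAP) observation update.
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)
x_map = x_prior + (K @ (z - H @ x_prior)).ravel()
P_post = (np.eye(2) - K @ H) @ P

# Impose the contact constraint phi(x) = height >= 0 by projecting the
# estimate onto the constraint surface in the posterior metric.
if x_map[0] < 0.0:
    g = np.array([1.0, 0.0])              # gradient of phi
    lam = -x_map[0] / (g @ P_post @ g)    # equality-constrained correction
    x_map = x_map + P_post @ g * lam

height, velocity = x_map
```

The projection is exactly the active-set version of an equality-constrained least-squares step, which is why the constrained estimate satisfies the contact condition without the numerical trouble that discontinuous dynamics cause for derivative-based filters.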
Planning locomotion trajectories for legged microrobots is challenging. This is because of their complex morphology, high-frequency passive dynamics, and discontinuous contact interactions with their environment. Consequently, such research is often driven by time-consuming experimental methods. As an alternative, we present a framework for systematically modeling, planning, and controlling legged microrobots. We develop a three-dimensional dynamic model of a 1.5 g quadrupedal microrobot with complexity (e.g., number of degrees of freedom) similar to larger-scale legged robots. We then adapt a recently developed variational contact-implicit trajectory optimization method to generate feasible whole-body locomotion plans for this microrobot, and demonstrate that these plans can be tracked with simple joint-space controllers. We plan and execute periodic gaits at multiple stride frequencies and on various surfaces. These gaits achieve high per-cycle velocities, including a maximum of 10.87 mm/cycle, which is 15% faster than previously measured for this microrobot. Furthermore, we plan and execute a vertical jump of 9.96 mm, which is 78% of the microrobot's center-of-mass height. To the best of our knowledge, this is the first end-to-end demonstration of planning and tracking whole-body dynamic locomotion on a millimeter-scale legged microrobot.
Wearable robotic devices have been shown to substantially reduce the energy expenditure of human walking. However, response variance between participants for fixed control strategies can be high, leading to the hypothesis that individualized controllers could further improve walking economy. Recent studies on human-in-the-loop (HIL) control optimization have elucidated several practical challenges, such as long experimental protocols and low signal-to-noise ratios. Here, we used Bayesian optimization—an algorithm well suited to optimizing noisy performance signals with very limited data—to identify the peak and offset timing of hip extension assistance that minimizes the energy expenditure of walking with a textile-based wearable device. Optimal peak and offset timing were found over an average of 21.4 ± 1.0 min and reduced metabolic cost by 17.4 ± 3.2% compared with walking without the device (mean ± SEM), which represents an improvement of more than 60% on metabolic reduction compared with state-of-the-art devices that only assist hip extension. In addition, our results provide evidence for participant-specific metabolic landscapes with respect to peak and offset timing, lending support to the hypothesis that individualized control strategies can offer substantial benefits over fixed control strategies. These results also suggest that this method could have practical impact on improving the performance of wearable robotic devices.
Many critical robotics applications require robustness to disturbances arising from unplanned forces, state uncertainty, and model errors. Motion planning algorithms that explicitly reason about robustness require a coupling of trajectory optimization and feedback design, where the system's closed-loop response to disturbances is optimized. Due to the often-heavy computational demands of solving such problems, the practical application of robust trajectory optimization in robotics has so far been limited. Motivated by recent work on sums-of-squares verification methods for nonlinear systems, we derive a scalable robust trajectory optimization algorithm that optimizes approximate invariant funnels along the trajectory while planning. For the case of ellipsoidal disturbance sets and LQR feedback controllers, the state and input deviations along a nominal trajectory can be computed locally in closed form, permitting fast evaluation of robust cost and constraint functions and their derivatives. The resulting algorithm is a scalable extension of classical direct transcription that demonstrably improves tracking performance over non-robust formulations while incurring only a modest increase in computational cost. We evaluate the algorithm in several simulated robot control tasks.
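The closed-form disturbance propagation that makes the robust cost cheap to evaluate can be sketched on a toy double integrator: under an LQR feedback gain, a bound on state deviations evolves through the closed-loop dynamics by a simple matrix recursion. The recursion below is a covariance-style stand-in for the paper's ellipsoidal bound, and all system matrices and weights are illustrative assumptions.

```python
import numpy as np

# Discrete double integrator with disturbances entering like the input.
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.0], [h]])
D = B
W = np.array([[0.04]])  # disturbance set size

# LQR gain via Riccati fixed-point iteration.
Q, R = np.eye(2), np.array([[0.1]])
P = Q.copy()
for _ in range(200):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
Acl = A - B @ K  # closed-loop dynamics

# Propagate the deviation bound along the nominal trajectory: one matrix
# recursion per knot point, so robust costs and their derivatives stay cheap.
E = np.zeros((2, 2))
for _ in range(50):
    E = Acl @ E @ Acl.T + D @ W @ D.T

max_pos_dev = np.sqrt(E[0, 0])  # bound-style position deviation
```

Embedding `E` into the cost and constraints of a direct transcription yields the robust formulation the abstract describes; the key point is that each propagation step is closed form, so no inner verification problem needs to be solved during optimization.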
The increasing capabilities of exoskeletons and powered prosthetics for walking assistance have paved the way for more sophisticated and individualized control strategies. In response to this opportunity, recent work on human-in-the-loop optimization has considered the problem of automatically tuning control parameters based on realtime physiological measurements. However, the common use of metabolic cost as a performance metric creates significant experimental challenges due to its long measurement times and low signal-to-noise ratio. We evaluate the use of Bayesian optimization—a family of sample-efficient, noise-tolerant, and global optimization methods—for quickly identifying near-optimal control parameters. To manage experimental complexity and provide comparisons against related work, we consider the task of minimizing metabolic cost by optimizing walking step frequencies in unaided human subjects. Compared to an existing approach based on gradient descent, Bayesian optimization identified a near-optimal step frequency with a faster time to convergence (12 minutes, p < 0.01), smaller inter-subject variability in convergence time (± 2 minutes, p < 0.01), and lower overall energy expenditure (p < 0.01).
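The sample-efficiency argument for Bayesian optimization can be illustrated with a minimal numpy sketch: a Gaussian-process surrogate with an RBF kernel fits noisy evaluations of a synthetic 1-D "metabolic cost" landscape, and a lower-confidence-bound acquisition picks where to measure next. The objective, kernel length scale, and noise level are toy assumptions; the real objective is measured on human subjects.

```python
import numpy as np

rng = np.random.default_rng(4)

def objective(x):
    # Synthetic noisy cost with its minimum at x = 0.6.
    return (x - 0.6) ** 2 + 0.05 * rng.standard_normal()

def k(a, b, ell=0.2):
    # RBF kernel between two 1-D point sets.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

grid = np.linspace(0.0, 1.0, 101)
X = list(rng.uniform(0, 1, 3))      # a few seed measurements
Y = [objective(x) for x in X]

for _ in range(15):
    Xa, Ya = np.array(X), np.array(Y)
    Kxx = k(Xa, Xa) + 0.05 ** 2 * np.eye(len(Xa))  # noise on the diagonal
    Ks = k(grid, Xa)
    mu = Ks @ np.linalg.solve(Kxx, Ya)             # GP posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(Kxx, Ks.T).T, axis=1)
    lcb = mu - 2.0 * np.sqrt(np.maximum(var, 1e-12))
    x_next = grid[np.argmin(lcb)]   # measure where cost could plausibly be low
    X.append(x_next)
    Y.append(objective(x_next))

x_best = grid[np.argmin(mu)]        # posterior-mean minimizer
```

With under twenty noisy evaluations the surrogate localizes the minimum, which mirrors why the abstract reports faster convergence than gradient descent on the same class of noisy physiological signals.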
Contact constraints arise naturally in many robot planning problems. In recent years, a variety of contact-implicit trajectory optimization algorithms have been developed that avoid the pitfalls of mode pre-specification by simultaneously optimizing state, input, and contact force trajectories. However, their reliance on first-order integrators leads to a linear tradeoff between optimization problem size and plan accuracy. To address this limitation, we propose a new family of trajectory optimization algorithms that leverage ideas from discrete variational mechanics to derive higher-order generalizations of the classic time-stepping method of Stewart and Trinkle. By using these dynamics formulations as constraints in direct trajectory optimization algorithms, it is possible to perform contact-implicit trajectory optimization with significantly higher accuracy. For concreteness, we derive a second-order method and evaluate it using several simulated rigid body systems including an underactuated biped and a quadruped.
Differential Dynamic Programming (DDP) has become a popular approach to performing trajectory optimization for complex, underactuated robots. However, DDP presents two practical challenges. First, the evaluation of dynamics derivatives during optimization creates a computational bottleneck, particularly in implementations that capture second-order dynamic effects. Second, constraints on the states (e.g., boundary conditions, collision constraints, etc.) require additional care since the state trajectory is implicitly defined from the inputs and dynamics. This paper addresses both of these problems by building on recent work on Unscented Dynamic Programming (UDP)---which eliminates dynamics derivative computations in DDP---to support general nonlinear state and input constraints using an augmented Lagrangian. The resulting algorithm has the same computational cost as first-order penalty-based DDP variants, but can achieve high-accuracy constraint satisfaction without the numerical ill-conditioning associated with penalty methods. We present results demonstrating its favorable performance on several simulated dynamical systems including a quadrotor and 7-DoF robot arm.
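The augmented-Lagrangian mechanism the paper layers on top of UDP can be shown on a tiny constrained problem: an inner unconstrained solver (plain gradient descent here, standing in for the DDP/UDP backward-forward sweep) minimizes L(x) = f(x) + lam*c(x) + 0.5*rho*c(x)^2, and an outer loop updates the multiplier. The problem and all constants are toy assumptions.

```python
import numpy as np

# Minimize f(x) subject to c(x) = 0; the analytic solution is x = (1, 0).
f = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2
c = lambda x: x[0] + x[1] - 1.0

grad_f = lambda x: np.array([2 * (x[0] - 2.0), 2 * (x[1] - 1.0)])
grad_c = np.array([1.0, 1.0])

x = np.zeros(2)
lam, rho = 0.0, 10.0
for _ in range(20):                 # outer augmented-Lagrangian iterations
    for _ in range(500):            # inner unconstrained minimization
        g = grad_f(x) + (lam + rho * c(x)) * grad_c
        x -= 0.05 * g
    lam += rho * c(x)               # first-order multiplier update

constraint_violation = abs(c(x))
```

Because the multiplier absorbs the constraint force, the penalty weight `rho` can stay moderate, which is how the approach achieves tight constraint satisfaction without the ill-conditioning that pure penalty methods incur as their weights grow.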