Variable Risk Control via Stochastic Optimization


S. Kuindersma, R. Grupen, and A. Barto, “Variable Risk Control via Stochastic Optimization,” International Journal of Robotics Research, vol. 32, no. 7, pp. 806–825, 2013.


We present new global and local policy search algorithms suitable for problems with policy-dependent cost variance (or risk), a property present in many robot control tasks. These algorithms exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, the learned cost model can be used as a critic for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection permits variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without relearning. We describe experiments in dynamic stabilization and manipulation with a mobile manipulator that demonstrate learning of flexible, risk-sensitive policies in very few trials.
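
To make the idea of runtime risk adjustment concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cost model that returns a mean and a standard deviation for each policy parameter, and it assumes a "mean plus kappa times standard deviation" selection criterion as one example of a risk-sensitive objective. The functions `cost_mean`, `cost_std`, and `select_policy`, and the parameter `kappa`, are hypothetical names introduced for illustration; in the paper the cost distribution is learned from trials via heteroscedastic regression, whereas here it is hand-written.

```python
import numpy as np

def cost_mean(theta):
    # Hypothetical stand-in for a learned model of expected cost.
    return (theta - 1.0) ** 2

def cost_std(theta):
    # Hypothetical stand-in for a learned, policy-dependent cost
    # standard deviation (larger |theta| means noisier outcomes).
    return 0.1 + theta ** 2

def select_policy(candidates, kappa):
    """Greedy selection over candidates: minimize mean + kappa * std.

    kappa = 0 is risk-neutral; kappa > 0 penalizes cost variability.
    """
    objective = cost_mean(candidates) + kappa * cost_std(candidates)
    return candidates[np.argmin(objective)]

# A grid of candidate policy parameters (assumed one-dimensional here).
candidates = np.linspace(-2.0, 2.0, 401)

# The same cost model is reused; only the risk sensitivity changes.
risk_neutral = select_policy(candidates, kappa=0.0)
risk_averse = select_policy(candidates, kappa=1.0)
```

Because the cost model is fixed and only `kappa` varies between the two calls, the sketch mirrors the separation of learning and policy selection described above: changing the risk sensitivity requires a new selection step but no new learning.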


Last updated on 07/14/2016