Qu represents the gradient at a DDP interval given an updated Hessian of the value function Vxx_p. If Vxx_p provides a good approximation of the problem, then we can regard Qu as the gradient of our OC problem.
In contrast, Qu.T * Quu^-1 * Qu represents the expected improvement of the optimization; in fact, this is the quantity used in the Armijo rule.
Generally speaking, we could use the norm of Qu as the stopping criterion. However, I have seen cases where this value is very small (around 10^-5) while there is still a significant expected improvement, i.e. Qu.T * Quu^-1 * Qu is around 10^-3.
We could use both values as stopping criteria. @nmansard, do you think this is a good idea, or would it just make our code more complex?
I included a simple test of the stopping criterion for the LQR case (!27 (merged)). Basically, I build and solve a KKT problem for an LQR problem, and then I check that the gradient of the Lagrangian equals zero.
Note that this test only works for linear systems with quadratic cost functions.
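The idea of the test can be sketched as follows. This is not the code from !27, just a toy scalar LQR (dimensions, gains, and costs are made up) showing the KKT construction and the check that the gradient of the Lagrangian vanishes:

```python
import numpy as np

# Toy scalar LQR, horizon T = 2 (all numbers are illustrative).
A, B = 0.9, 0.5          # dynamics x_{t+1} = A x_t + B u_t
Q, R = 1.0, 0.1          # cost 0.5*Q*x^2 + 0.5*R*u^2 at each node
x0ref = 1.0

# Decision variables z = (x0, u0, x1, u1, x2).
H = np.diag([Q, R, Q, R, Q])              # Hessian of the quadratic cost
# Constraints: x0 = x0ref, x1 - A x0 - B u0 = 0, x2 - A x1 - B u1 = 0.
J = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [-A,  -B,  1.0, 0.0, 0.0],
              [0.0, 0.0, -A,  -B,  1.0]])
b = np.array([x0ref, 0.0, 0.0])

# KKT system: [H J^T; J 0] [z; lam] = [0; b]
KKT = np.block([[H, J.T], [J, np.zeros((3, 3))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(5), b]))
z, lam = sol[:5], sol[5:]

# At the optimum the gradient of the Lagrangian must vanish
# and the constraints must hold.
grad_L = H @ z + J.T @ lam
assert np.allclose(grad_L, 0.0)
assert np.allclose(J @ z, b)
```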
The Jacobian of the constraints (with an explicit x0 variable and the constraint x0 = x0ref) is:

```
J = [  I   0                 ]
    [ -fx  I        -fu      ]
    [     -fx  I        -fu  ]
    [       ...          ... ]
```
The transpose multiplies one lambda per timestep from 0 to T, and produces one "equality" per x and one per u. Looking only at the part of J^T corresponding to x, we have a main diagonal of I and a superdiagonal of -fx^T. Hence:
lambda_T = - lx_T
and recursively
lambda_t - fx^T lambda_t+1 = - lx_t
This recurrence is the same as the recurrence on Vx, so lambda_t = Vx_t (up to the sign convention chosen for the multipliers).
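This identification can be checked numerically. The sketch below uses random illustrative values for lx_t and a single fixed fx (an assumption for brevity; in general fx is time-varying), and compares the multiplier recursion with the Vx recursion, where at an optimal trajectory (Qu = 0) the Vx recursion reduces to Vx_t = lx_t + fx^T Vx_{t+1}:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 3
fx = rng.standard_normal((2, 2))                 # fixed fx for simplicity
lx = [rng.standard_normal(2) for _ in range(T + 1)]

# Multipliers from the KKT stationarity conditions:
# lambda_T = -lx_T, then lambda_t = fx^T lambda_{t+1} - lx_t.
lam = [None] * (T + 1)
lam[T] = -lx[T]
for t in range(T - 1, -1, -1):
    lam[t] = fx.T @ lam[t + 1] - lx[t]

# Value-function gradient along an optimal trajectory (Qu = 0):
# Vx_T = lx_T, Vx_t = lx_t + fx^T Vx_{t+1}.
Vx = [None] * (T + 1)
Vx[T] = lx[T]
for t in range(T - 1, -1, -1):
    Vx[t] = lx[t] + fx.T @ Vx[t + 1]

# The two sequences coincide up to the sign convention of the multipliers.
assert all(np.allclose(lam[t], -Vx[t]) for t in range(T + 1))
```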