Next: Robust observations Up: The Vision Package Previous: Representing 3D Euclidean transformations   Contents

# Levenberg-Marquardt minimisation

      #include <gandalf/vision/lev_marq.h>

The Levenberg-Marquardt algorithm [#!Marquardt:JSIAM63!#,#!Bjorck:96!#] is a general non-linear downhill minimisation algorithm for the case when derivatives of the objective function are known. It dynamically mixes Gauss-Newton and gradient-descent iterations. We shall develop the L-M algorithm for a simple case in our notation, which is derived from Kalman filtering theory [#!BarShalom:Fortmann:88!#]. The Gandalf implementation of Levenberg-Marquardt will then be presented. Let the unknown parameters be represented by the vector , and let noisy measurements of be made:
 (5.7)

where is a measurement function and is zero-mean noise with covariance . Since we are describing an iterative minimization algorithm, we shall assume that we have already obtained an estimate of . Then the maximum likelihood solution for a new estimate minimizes
 (5.8)

We form a quadratic approximation to around , and minimize this approximation to to obtain a new estimate . In general we can write such a quadratic approximation as

for scalar , vectors , and matrices , . Note that here and in equation (5.8) the signs and factors of two are chosen WLOG to simplify the resulting expressions for the solution. Differentiating, we obtain

At the minimum point we have which means that
 (5.9)

Thus we need to obtain and to compute the update. We now consider the form of in (5.8). Writing the Jacobian of as

we have
 (5.10) (5.11)

In the last formula for , the terms involving the second derivatives of have been omitted. This is done because these terms are generally much smaller and can in practice be omitted, as well as because the second derivatives are more difficult and complex to compute than the first derivatives.

Now we solve the above equations for and given the values of function and the Jacobian evaluated at the previous estimate . We have immediately

We now write the innovation vectors as

Then we have
 (5.12)

Combining equations (5.9) and (5.12) we obtain the linear system
 (5.13)

to be solved for the adjustment . The covariance of the state is

The update (5.13) may be repeated, substituting the new as , and improving the estimate until convergence is achieved according to some criterion. Levenberg-Marquardt modifies this updating procedure by adding a value to the diagonal elements of the linear system matrix before inverting it to obtain the update. is reduced if the last iteration gave an improved estimate, i.e. if was reduced, and increased if increased, in which case the estimate of is reset to the estimate before the last iteration. It is this that allows the algorithm to smoothly switch between Gauss-Newton (small ) and gradient descent (large ).

This version is a generalization of Levenberg-Marquardt as it is normally presented (e.g. [#!Press:etal:88!#]) in that we incorporate vector measurements with covariances , rather than scalar measurements with variances. The full algorithm is as follows:

1. Start with a prior estimate of . Initialize to some starting value, e.g. 0.001.
2. Compute the updated estimate by solving the linear system (5.13) for the adjustment, having first added to each diagonal element of . Note that the Lagrange multiplier diagonal block should remain at zero.
3. Compute the least-squares residuals and from (5.8).
1. If , reduce by a specified factor (say 10), set to , and return to step 2.
2. Otherwise, the update failed to reduce the residual, so increase by a factor (say 10), forget the updated and return to step 2.
The algorithm continues until some pre-specified termination criterion has been met, such as a small change to the state vector, or a limit on the number of iterations.

If the measurements are unbiased and normally distributed, the residual is a random variable, and testing the value of against a distribution is a good way of checking that the measurement noise model is reasonable. The number of degrees of freedom (DOF) of the distribution can be determined as the total size of the measurement vectors, minus the size of the state. If the function returns the dimension of its vector argument, then the degrees of freedom may be computed as

 (5.14)

Subsections

Next: Robust observations Up: The Vision Package Previous: Representing 3D Euclidean transformations   Contents
2006-03-17