Levenberg-Marquardt (LM) Algorithm Explained, with a Quadratic Curve-Fitting Walkthrough
1. Problem Definition: Nonlinear Least Squares
The goal is to minimize the sum of squares of a set of nonlinear residuals:

$$\min_{\mathbf{x}} \; \sum_{i=1}^{m} \| r_i(\mathbf{x}) \|^2 = \| \mathbf{r}(\mathbf{x}) \|^2$$

- $\mathbf{x} \in \mathbb{R}^n$: the parameter vector to be optimized
- $r_i(\mathbf{x})$: the $i$-th residual
- $\mathbf{r}(\mathbf{x})$: the residual vector (of length $m$)
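As a concrete instance (using the quadratic model fitted later in this article), fitting $y = a x^2 + b x + c$ to data points $(x_i, y_i)$ yields one residual per point:

$$r_i(a, b, c) = y_i - (a x_i^2 + b x_i + c), \qquad \mathbf{x} = (a, b, c)^\top, \quad n = 3, \quad m = \text{number of points}.$$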
2. The Core Idea of LM

LM interpolates between Gauss-Newton and gradient descent: on top of the linearized problem it adds a damping factor that adjusts the update strategy.

The LM update:

$$(\mathbf{J}^\top \mathbf{J} + \lambda \mathbf{I}) \, \delta = -\mathbf{J}^\top \mathbf{r}$$

$$\mathbf{x}_{\text{new}} = \mathbf{x} + \delta$$

where:

- $\mathbf{J}$: the Jacobian of the residuals with respect to the parameters ($m \times n$)
- $\mathbf{r}$: the current residual vector
- $\lambda$: the damping factor (controls the update behavior)
- $\delta$: the parameter increment for this iteration
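A single damped step maps directly onto a few lines of Eigen. The sketch below is only an illustration (`lmStep` is a hypothetical helper name; `J` and `r` are assumed to be already evaluated at the current parameters):

```cpp
#include <Eigen/Dense>

// One damped Gauss-Newton (LM) step, given the Jacobian J (m x n),
// the residual vector r (m), and the current damping factor lambda.
Eigen::VectorXd lmStep(const Eigen::MatrixXd& J,
                       const Eigen::VectorXd& r,
                       double lambda) {
    const int n = static_cast<int>(J.cols());
    Eigen::MatrixXd H = J.transpose() * J                          // Gauss-Newton Hessian approximation
                      + lambda * Eigen::MatrixXd::Identity(n, n);  // damping term
    Eigen::VectorXd g = J.transpose() * r;                         // gradient of 0.5 * ||r||^2
    return H.ldlt().solve(-g);                                     // delta solving (H + lambda I) delta = -J^T r
}
```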
3. Relationship to Other Methods

Method | Characteristics | Special case |
---|---|---|
Gauss-Newton | Fast convergence; approximates the Hessian by $\mathbf{J}^\top \mathbf{J}$ (first derivatives only) | $\lambda = 0$ |
Gradient descent | Conservative and stable, but slow | $\lambda \rightarrow \infty$ |
LM | Switches dynamically between the two; works well in most situations | $\lambda$ adjusted dynamically |
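The two limiting cases can be read directly off the update equation:

$$\lambda \to 0: \quad \delta \approx -(\mathbf{J}^\top \mathbf{J})^{-1} \mathbf{J}^\top \mathbf{r} \quad \text{(a Gauss-Newton step)}$$

$$\lambda \to \infty: \quad \delta \approx -\tfrac{1}{\lambda} \mathbf{J}^\top \mathbf{r} \quad \text{(a small step along the negative gradient)}$$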
4. Iteration Flow (steps of each iteration)

1. Compute the residuals: $\mathbf{r}(\mathbf{x})$
2. Build the Jacobian: $\mathbf{J} = \dfrac{\partial \mathbf{r}}{\partial \mathbf{x}}$
3. Form the increment equation:
   $$(\mathbf{J}^\top \mathbf{J} + \lambda \mathbf{I}) \, \delta = -\mathbf{J}^\top \mathbf{r}$$
4. Solve for the increment $\delta$
5. Tentatively update: $\mathbf{x}_{\text{new}} = \mathbf{x} + \delta$
6. Decide whether the update succeeded (a quantitative version of this test is shown right after this list):
   - If the error decreased → accept the update and decrease $\lambda$
   - If the error increased → reject the update and increase $\lambda$
7. Repeat until convergence
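The accept/reject decision in step 6 is often made quantitative with a gain ratio that compares the actual cost decrease to the decrease predicted by the linearized model; this is exactly the criterion used by the generic template in section 10:

$$\rho = \frac{\|\mathbf{r}(\mathbf{x})\|^2 - \|\mathbf{r}(\mathbf{x}+\delta)\|^2}{\delta^\top (\lambda \delta - \mathbf{J}^\top \mathbf{r})}$$

A step with $\rho > 0$ is accepted; the closer $\rho$ is to 1 (the better the linear model predicted the decrease), the more the damping is reduced.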
5. LM's Convergence Strategy: Adjusting the Damping Factor λ

The central control variable of LM is $\lambda$; it determines the character of each update:

λ value | Update behavior | Meaning |
---|---|---|
Small (→ 0) | Close to Gauss-Newton | Fast approach to the minimum (requires a good initial guess) |
Large | Close to gradient descent | Slow but stable descent |
Adjusted dynamically | Driven by the change in error | Fast convergence without easily diverging |
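The adjustment itself is just a couple of lines. The sketch below collects the two rules used in this article (the function names are hypothetical helpers chosen for illustration): the simple shrink/grow rule from the section-6 solver, and the gain-ratio (Nielsen-style) rule used by the generic template in section 10.

```cpp
#include <algorithm>
#include <cmath>

// Simple multiplicative rule: shrink lambda on success, grow it on failure.
double updateLambdaSimple(double lambda, bool step_accepted) {
    return step_accepted ? lambda * 0.8 : lambda * 2.0;
}

// Nielsen's rule: scale lambda by how well the linear model predicted the
// actual cost decrease (gain ratio rho); nu doubles on every rejection.
double updateLambdaNielsen(double lambda, double rho, double& nu) {
    if (rho > 0) {  // step accepted
        nu = 2.0;
        return lambda * std::max(1.0 / 3.0, 1.0 - std::pow(2.0 * rho - 1.0, 3.0));
    }
    lambda *= nu;   // step rejected
    nu *= 2.0;
    return lambda;
}
```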
6. A Simple Hands-On Example

Below is a small hand-written Levenberg-Marquardt (LM) solver for fitting a 2D curve (e.g. $y = ax^2 + bx + c$). It focuses on:

- residual computation
- Jacobian computation
- the LM core equation: $(\mathbf{J}^\top \mathbf{J} + \lambda \mathbf{I}) \delta = -\mathbf{J}^\top \mathbf{r}$
- the parameter-update logic (dynamic adjustment of the damping factor)

Example scenario: fit the curve $y = a x^2 + b x + c$

Given a set of points $(x_i, y_i)$, we estimate the optimal $a, b, c$.
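For this model, the per-point residual and its Jacobian row (used verbatim in the code below) are:

$$r_i = y_i - (a x_i^2 + b x_i + c), \qquad \frac{\partial r_i}{\partial (a, b, c)} = \begin{bmatrix} -x_i^2 & -x_i & -1 \end{bmatrix}$$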
Complete C++ implementation (depends only on Eigen)
```cpp
#include <iostream>
#include <vector>
#include <cstdlib>      // for rand()
#include <Eigen/Dense>

using namespace std;
using namespace Eigen;

// Curve model: y = a x^2 + b x + c
struct Observation {
    double x;
    double y;
};

class LMOptimizer {
public:
    LMOptimizer(const vector<Observation>& obs) : observations(obs) {}

    Vector3d optimize(int max_iterations = 100, double epsilon = 1e-6) {
        Vector3d x(0, 0, 0);   // initial guess [a, b, c]
        double lambda = 1e-3;

        for (int iter = 0; iter < max_iterations; ++iter) {
            VectorXd r;
            MatrixXd J;
            computeResidualAndJacobian(x, r, J);
            double cost = r.squaredNorm();

            Matrix3d H = J.transpose() * J;
            Vector3d g = -J.transpose() * r;

            // LM damping: H + lambda * I
            Matrix3d H_lm = H + lambda * Matrix3d::Identity();
            Vector3d dx = H_lm.ldlt().solve(g);

            if (dx.norm() < epsilon) {
                cout << "Converged at iter " << iter << endl;
                break;
            }

            // Tentative update
            Vector3d x_new = x + dx;
            VectorXd r_new;
            MatrixXd J_dummy;
            computeResidualAndJacobian(x_new, r_new, J_dummy);
            double new_cost = r_new.squaredNorm();

            if (new_cost < cost) {
                // Accept the update, decrease lambda
                x = x_new;
                lambda *= 0.8;
            } else {
                // Reject the update, increase lambda
                lambda *= 2.0;
            }

            cout << "Iter " << iter << ": cost = " << cost
                 << ", lambda = " << lambda << endl;
        }
        return x;
    }

private:
    const vector<Observation>& observations;

    void computeResidualAndJacobian(const Vector3d& x, VectorXd& r, MatrixXd& J) {
        size_t N = observations.size();
        r.resize(N);
        J.resize(N, 3);
        for (size_t i = 0; i < N; ++i) {
            double xi = observations[i].x;
            double yi = observations[i].y;
            double fi = x(0) * xi * xi + x(1) * xi + x(2);  // predicted value
            r(i) = yi - fi;
            // dr/da, dr/db, dr/dc
            J(i, 0) = -xi * xi;
            J(i, 1) = -xi;
            J(i, 2) = -1.0;
        }
    }
};
```
Example usage
```cpp
int main() {
    // Generate noisy test data from y = 2x^2 + 3x + 1
    vector<Observation> obs;
    for (double x = -5; x <= 5; x += 1.0) {
        double noise = ((rand() % 100) / 100.0 - 0.5) * 1.0;
        double y = 2 * x * x + 3 * x + 1 + noise;
        obs.push_back({x, y});
    }

    LMOptimizer solver(obs);
    Vector3d result = solver.optimize();

    cout << "Estimated parameters: a = " << result(0)
         << ", b = " << result(1)
         << ", c = " << result(2) << endl;
    return 0;
}
```
Sample output
```
Iter 0: cost = 351.2, lambda = 0.0008
Iter 1: cost = 97.5, lambda = 0.00064
...
Converged at iter 10
Estimated parameters: a = 2.01, b = 2.98, c = 1.05
```
Build instructions

Make sure Eigen is installed, then compile with:

```bash
g++ lm_solver.cpp -std=c++11 -I /path/to/eigen -O2 -o lm_solver
```
Feature summary

Feature | Implemented |
---|---|
Residual computation | √ |
Hand-derived Jacobian | √ |
Dynamic adjustment of the damping factor λ | √ |
Accept/reject test for each update | √ |
Stopping criterion | √ |
7. LM in SLAM / Visual Optimization

Application | Variables | Residual form |
---|---|---|
Back-end optimization | poses | BetweenFactor, PriorFactor |
Bundle adjustment (BA) | poses + landmarks | reprojection error |
IMU fusion | pose + bias + velocity | preintegration error |

Libraries such as GTSAM, Ceres, and g2o all solve nonlinear least squares with LM or one of its variants.
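To give a feel for how this looks with such a library, here is a minimal Ceres-style sketch of the same quadratic fit (illustrative only: `QuadResidual` and `fitWithCeres` are made-up names, and Ceres derives the Jacobians itself via automatic differentiation):

```cpp
#include <ceres/ceres.h>
#include <vector>

// Residual functor for y = a x^2 + b x + c; Ceres auto-differentiates it.
struct QuadResidual {
    QuadResidual(double x, double y) : x_(x), y_(y) {}
    template <typename T>
    bool operator()(const T* const abc, T* residual) const {
        residual[0] = T(y_) - (abc[0] * T(x_) * T(x_) + abc[1] * T(x_) + abc[2]);
        return true;
    }
    const double x_, y_;
};

void fitWithCeres(const std::vector<double>& xs, const std::vector<double>& ys, double abc[3]) {
    ceres::Problem problem;
    for (size_t i = 0; i < xs.size(); ++i) {
        problem.AddResidualBlock(
            new ceres::AutoDiffCostFunction<QuadResidual, 1, 3>(new QuadResidual(xs[i], ys[i])),
            nullptr,   // no robust loss
            abc);      // the parameter block [a, b, c]
    }
    ceres::Solver::Options options;
    options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;  // also the default
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
}
```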
8. Pros and Cons

Pros:

- Stable convergence; suitable for most nonlinear problems
- Switches dynamically between a Gauss-Newton-like step and gradient descent
- Performs very well when the initial guess is good

Cons:

- Still depends on a reasonably good initial guess
- Every iteration requires computing the Jacobian, which can be expensive
- Not well suited to very high-dimensional, large-scale sparse systems without sparse-optimization techniques
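For large sparse problems (such as the bundle adjustment and pose-graph cases in section 7), the dense `ldlt()` solve of the normal equations would typically be replaced by a sparse factorization. A minimal sketch with Eigen's sparse module (assuming the damped normal-equation matrix has already been assembled as a sparse matrix; `solveSparseLMStep` is a hypothetical helper):

```cpp
#include <Eigen/Sparse>

// Solve (H + lambda * I) delta = -g with a sparse Cholesky factorization.
// H_lm is the already-damped, sparse normal-equation matrix.
Eigen::VectorXd solveSparseLMStep(const Eigen::SparseMatrix<double>& H_lm,
                                  const Eigen::VectorXd& g) {
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> solver;
    solver.compute(H_lm);                        // symbolic + numeric factorization
    if (solver.info() != Eigen::Success) {
        return Eigen::VectorXd::Zero(g.size());  // factorization failed; caller should raise lambda
    }
    return solver.solve(-g);
}
```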
9. Further Reading

- `LevenbergMarquardtOptimizer` in GTSAM
- Ceres Solver uses LM by default
- g2o also uses LM (and can even switch to Dogleg)
10. A Generic LM Template

Below is a generic LM template:

- It supports a parameter vector of arbitrary dimension, $\mathbf{x} \in \mathbb{R}^n$, and a residual vector $\mathbf{r}(\mathbf{x}) \in \mathbb{R}^m$.
- The user only needs to supply two lambdas:
  - `ResidualFunc: VectorXd(const VectorXd& x)` → the residual vector
  - `JacobianFunc: MatrixXd(const VectorXd& x)` → the Jacobian matrix
- The algorithm logic is the same as before, just abstracted.
lm_template.cpp
```cpp
#include <Eigen/Dense>
#include <iostream>
#include <iomanip>
#include <functional>
#include <algorithm>
#include <cmath>
#include <vector>
#include <random>

// Generic Levenberg-Marquardt template
struct LMResult {
    Eigen::VectorXd x;
    int iterations;
    bool converged;
};

LMResult LM(const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& residuals,
            const std::function<Eigen::MatrixXd(const Eigen::VectorXd&)>& jacobian,
            Eigen::VectorXd x0,
            int max_iter = 100,
            double tau = 1e-3,
            double eps1 = 1e-8,
            double eps2 = 1e-8)
{
    using namespace Eigen;

    VectorXd x = x0;
    VectorXd r = residuals(x);
    MatrixXd J = jacobian(x);
    MatrixXd A = J.transpose() * J;
    VectorXd g = J.transpose() * r;

    bool found = (g.cwiseAbs().maxCoeff() <= eps1);
    double mu = tau * A.diagonal().cwiseAbs().maxCoeff();
    if (mu == 0.0) mu = tau;
    double nu = 2.0;

    std::cout << std::fixed << std::setprecision(6);

    int k = 0;
    for (; k < max_iter; ++k) {
        if (found) break;

        // Damped normal equations: (A + mu * I) delta = -g
        MatrixXd H = A + mu * MatrixXd::Identity(A.rows(), A.cols());
        VectorXd delta;
        Eigen::LLT<MatrixXd> llt(H);
        if (llt.info() == Eigen::Success) {
            delta = llt.solve(-g);
        } else {
            delta = H.ldlt().solve(-g);
        }

        if (delta.norm() <= eps2 * (x.norm() + eps2)) {
            found = true;
            break;
        }

        VectorXd x_new = x + delta;
        VectorXd r_new = residuals(x_new);
        double norm_r = r.norm();
        double norm_r_new = r_new.norm();

        // Gain ratio: actual cost decrease vs. decrease predicted by the linear model
        double denom = delta.dot(mu * delta - g);
        double rho;
        if (denom == 0.0) {
            rho = (norm_r * norm_r - norm_r_new * norm_r_new) > 0 ? 1e10 : -1e10;
        } else {
            rho = (norm_r * norm_r - norm_r_new * norm_r_new) / denom;
        }

        std::cout << "Iter " << (k + 1)
                  << " x=" << x.transpose()
                  << " |r|=" << norm_r
                  << " mu=" << mu
                  << " rho=" << rho
                  << std::endl;

        if (rho > 0) {
            // Accept the step and re-linearize
            x = x_new;
            r = r_new;
            J = jacobian(x);
            A = J.transpose() * J;
            g = J.transpose() * r;
            found = (g.cwiseAbs().maxCoeff() <= eps1);

            // Nielsen's damping update
            double temp = 1.0 - std::pow(2.0 * rho - 1.0, 3);
            double factor = std::max(1.0 / 3.0, temp);
            mu = mu * factor;
            nu = 2.0;
        } else {
            // Reject the step, increase damping
            mu = mu * nu;
            nu *= 2.0;
        }
    }
    return {x, k, found};
}

// ------------------- Example: fitting y = a * exp(b * x) -------------------

int main() {
    using namespace Eigen;
    using namespace std;

    // Generate synthetic data
    double a_true = 2.5, b_true = 0.8;
    int n = 50;
    vector<double> xs(n), ys(n);
    mt19937 rng(42);
    normal_distribution<double> nd(0.0, 0.2);
    for (int i = 0; i < n; ++i) {
        xs[i] = double(i) * 2.0 / (n - 1);
        ys[i] = a_true * exp(b_true * xs[i]) + nd(rng);
    }

    // Define residual and Jacobian lambdas
    auto residuals = [&](const VectorXd& p) -> VectorXd {
        double a = p(0), b = p(1);
        VectorXd r(n);
        for (int i = 0; i < n; ++i)
            r(i) = ys[i] - a * exp(b * xs[i]);
        return r;
    };
    auto jacobian = [&](const VectorXd& p) -> MatrixXd {
        double a = p(0), b = p(1);
        MatrixXd J(n, 2);
        for (int i = 0; i < n; ++i) {
            double ex = exp(b * xs[i]);
            J(i, 0) = -ex;
            J(i, 1) = -a * xs[i] * ex;
        }
        return J;
    };

    // Initial guess
    VectorXd x0(2);
    x0 << 1.0, 0.5;

    LMResult res = LM(residuals, jacobian, x0);

    cout << "\nTrue:      a=" << a_true << " b=" << b_true << endl;
    cout << "Estimated: " << res.x.transpose() << endl;
    cout << "Converged: " << res.converged
         << " in " << res.iterations << " iterations." << endl;
    return 0;
}
```
Notes

- `LM` is a generic function that takes the residual function and the Jacobian function as `std::function` objects.
- The `LMResult` struct carries the result (final parameters, iteration count, and whether the run converged).
- The example part implements the curve-fitting model with lambdas.
- To change the model (e.g. a polynomial, a sinusoid, or PnP residuals), only the `residuals` and `jacobian` lambdas need to be rewritten; the LM routine itself stays untouched.
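For example, to reuse the template for a sinusoidal model $y = a \sin(b x + c)$ (a hypothetical model chosen only for illustration), just the two lambdas change; `xs`, `ys`, and `n` are assumed to hold the data as in the example above:

```cpp
// Residuals and Jacobian for y = a * sin(b * x + c), parameters p = [a, b, c].
auto residuals = [&](const Eigen::VectorXd& p) -> Eigen::VectorXd {
    Eigen::VectorXd r(n);
    for (int i = 0; i < n; ++i)
        r(i) = ys[i] - p(0) * std::sin(p(1) * xs[i] + p(2));
    return r;
};
auto jacobian = [&](const Eigen::VectorXd& p) -> Eigen::MatrixXd {
    Eigen::MatrixXd J(n, 3);
    for (int i = 0; i < n; ++i) {
        double s = std::sin(p(1) * xs[i] + p(2));
        double c = std::cos(p(1) * xs[i] + p(2));
        J(i, 0) = -s;                 // d r_i / d a
        J(i, 1) = -p(0) * xs[i] * c;  // d r_i / d b
        J(i, 2) = -p(0) * c;          // d r_i / d c
    }
    return J;
};
// Then call: LMResult res = LM(residuals, jacobian, x0); with x0 of size 3.
```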