 In the modern science of data analytics, sometimes oldies are goodies. I once took an optimization class where the answer to every question posed by the professor was “the Taylor series,” referring to a popular numerical method that will be 300 years old next year. Brook Taylor’s 1715 formulation, which can be traced back even further to James Gregory in the seventeenth century, is the foundation of a great many of today’s numerical methods, of which one of the most powerful is nonlinear batch least squares.

Depending on your perspective, the Taylor series can be both mundane and profound. Its basic idea is that the value of a function, say y = x², at a particular point, say x = 3.1, can be computed from the value at some other point, say x = 3, plus an adjustment to account for the difference. The slope of y = x² at x = 3 is 6, so between x = 3 and x = 3.1, the function will increase by about 6 × 0.1, or 0.6. So if y is 9 at x = 3 (3² is 9), then y is about 9 + 0.6, or 9.6, at x = 3.1. My calculator tells me the exact value is 9.61.
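As a quick check of that arithmetic, here is a minimal Python sketch of the first-order (slope-only) estimate. The function names `f` and `df` are purely illustrative.

```python
def f(x):
    # The example function y = x^2
    return x ** 2

def df(x):
    # Its first derivative (the slope): dy/dx = 2x
    return 2 * x

a, h = 3.0, 0.1                   # known point and step to the new point
first_order = f(a) + df(a) * h    # value at a, plus slope times step
exact = f(a + h)

print(f"first-order estimate: {first_order:.4f}")  # 9.6000
print(f"exact value:          {exact:.4f}")        # 9.6100
```

The leftover 0.01 is exactly what the slope alone cannot capture.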

If you draw this out with pencil and paper, you will see the idea is quite simple. If you know the slope, or derivative, of a function, then you can approximate nearby values of the function. I once knew a fellow who could tell you the value of functions, such as the square root or cosine, at arbitrary points faster than you could punch them into a calculator.

But the Taylor series is a bit more profound when you consider higher-order derivatives. When we used the first derivative, or slope, of y = x² above, we could approximate nearby values fairly accurately. But if we also account for the second derivative (the slope of the slope), we get the answer exactly.
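The second-order term is one half of the second derivative times the square of the step. For y = x² the second derivative is the constant 2, so the extra term is ½ × 2 × 0.1² = 0.01, which closes the gap. A minimal sketch, again with illustrative names:

```python
def f(x):
    return x ** 2      # y = x^2

def df(x):
    return 2 * x       # first derivative: the slope

def d2f(x):
    return 2.0         # second derivative: the slope of the slope (constant)

a, h = 3.0, 0.1
# Taylor expansion through the second-order term
second_order = f(a) + df(a) * h + 0.5 * d2f(a) * h ** 2
print(f"{second_order:.4f}")  # 9.6100
```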

In fact, we could compute the exact value of y = x² at any point, knowing only the function value and its first and second derivatives at any one point. Simply put, what the Taylor series tells us is that for well-behaved functions such as y = x², the entire function can be described in terms of information at a single point.
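To see why, write the Taylor series of a function f about a point a, then apply it to f(x) = x², whose derivatives are f(a) = a², f′(a) = 2a, f″(a) = 2, and zero beyond that:

$$
f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2 + \cdots
$$

$$
x^2 = a^2 + 2a(x-a) + (x-a)^2
$$

Expanding the right-hand side gives a² + 2ax − 2a² + x² − 2ax + a² = x², so the three terms are exact for every x and every choice of a. With a = 3 and x = 3.1 this is 9 + 0.6 + 0.01 = 9.61.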

In practice, there are many applications in which there are multiple output and input variables (y1, y2, …, x1, x2, …) and the derivatives of the function cannot be derived analytically. In such applications the Taylor series can nonetheless be useful.

Consider the problem of modeling an aircraft flight using radar tracking data. If you compute the trajectory using flight segments such as climbs, descents and turns, then you can derive the radar measurements. The difference between your derived measurements and the actual measurements indicates your modeling error. You may have errors in the start and stop times of the segments, or in the aircraft performance, such as the rate of climb, during the segments.

Using the Taylor series idea you can perturb each of the modeling parameters to estimate the slope of each of the output variables (the errors in your derived radar measurements) with respect to each of the input variables (the flight parameters).
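A minimal sketch of this perturbation step in Python with NumPy, assuming a hypothetical toy model in place of a real trajectory: level flight at altitude `h0` until time `t_start`, then a constant-rate climb, observed as altitudes at a set of times. The names `predicted_altitudes` and `finite_difference_jacobian` are illustrative, not from any library. Because the actual measurements do not depend on the flight parameters, the slopes of the measurement errors equal the slopes of the predicted measurements, and each column of the resulting matrix (the Jacobian) holds those slopes for one parameter.

```python
import numpy as np

def predicted_altitudes(params, times):
    """Hypothetical toy model: level flight at h0 until t_start, then a
    constant-rate climb. params = [h0 (m), t_start (s), climb_rate (m/s)]."""
    h0, t_start, rate = params
    return h0 + rate * np.maximum(times - t_start, 0.0)

def finite_difference_jacobian(model, params, times, rel_step=1e-6):
    """Estimate J[i, j] = d(output_i) / d(param_j) by perturbing one
    parameter at a time (forward differences)."""
    params = np.asarray(params, dtype=float)
    base = model(params, times)
    jac = np.empty((base.size, params.size))
    for j in range(params.size):
        step = rel_step * max(abs(params[j]), 1.0)  # scale step to parameter size
        perturbed = params.copy()
        perturbed[j] += step
        jac[:, j] = (model(perturbed, times) - base) / step
    return jac

times = np.linspace(0.0, 100.0, 11)   # measurement times (s)
params = [1000.0, 30.0, 5.0]          # h0 (m), t_start (s), rate (m/s)
J = finite_difference_jacobian(predicted_altitudes, params, times)
print(J.shape)  # (11, 3)
```

A forward difference is the simplest choice; central differences cost twice as many model evaluations but are usually more accurate.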

You can then adjust the flight parameters in the direction that reduces the radar measurement errors, until those errors are minimized. This method was suggested by Donald Marquardt in 1963 and traces back to Kenneth Levenberg in 1944. There is no guarantee of success and you must check your answer carefully, but the Marquardt-Levenberg algorithm is robust and has been successfully applied to a wide range of problems. In the aircraft trajectory example, the result is a description of the flight, not in terms of a long list of radar measurements, but rather a short list of meaningful performance parameters. These can then be used, for example, to predict future trajectories.
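The adjustment loop can be sketched as follows, assuming the same hypothetical toy climb model and synthetic noisy altitudes generated purely for illustration (they are not real radar data). This bare-bones variant solves (JᵀJ + λ·diag(JᵀJ))δ = −Jᵀr at each step, where r holds the residuals (predicted minus measured) and λ is a damping factor; scaling by diag(JᵀJ) is Marquardt's choice, whereas Levenberg's original used λ times the identity. A small λ gives a Gauss-Newton step, and a large λ gives a short step along the negative gradient. Real work would normally use a library routine such as SciPy's `scipy.optimize.least_squares` with `method="lm"`.

```python
import numpy as np

def predicted_altitudes(params, times):
    """Hypothetical toy model: level flight at h0 until t_start, then a
    constant-rate climb. params = [h0 (m), t_start (s), climb_rate (m/s)]."""
    h0, t_start, rate = params
    return h0 + rate * np.maximum(times - t_start, 0.0)

def jacobian(model, params, times, rel_step=1e-6):
    """Forward-difference Jacobian: perturb one parameter at a time."""
    base = model(params, times)
    jac = np.empty((base.size, params.size))
    for j in range(params.size):
        step = rel_step * max(abs(params[j]), 1.0)
        perturbed = params.copy()
        perturbed[j] += step
        jac[:, j] = (model(perturbed, times) - base) / step
    return jac

def levenberg_marquardt(model, params, times, measured,
                        lam=1e-3, max_iter=100, tol=1e-10):
    params = np.asarray(params, dtype=float)
    residuals = model(params, times) - measured
    cost = residuals @ residuals          # sum of squared errors
    for _ in range(max_iter):
        J = jacobian(model, params, times)
        JTJ = J.T @ J
        gradient = J.T @ residuals
        # Marquardt's scaling: damp each parameter by its own curvature.
        A = JTJ + lam * np.diag(np.diag(JTJ))
        delta = np.linalg.solve(A, -gradient)
        trial = params + delta
        trial_res = model(trial, times) - measured
        trial_cost = trial_res @ trial_res
        if trial_cost < cost:
            # Error went down: accept the step and lean toward Gauss-Newton.
            converged = cost - trial_cost < tol * (1.0 + cost)
            params, residuals, cost = trial, trial_res, trial_cost
            lam /= 10.0
            if converged:
                break
        else:
            # Error went up: reject the step and lean toward gradient descent.
            lam *= 10.0
    return params

# Synthetic "radar" altitudes from known parameters plus noise (illustration only).
rng = np.random.default_rng(0)
times = np.linspace(0.0, 100.0, 51)
true_params = np.array([1000.0, 30.0, 5.0])
measured = predicted_altitudes(true_params, times) + rng.normal(0.0, 2.0, times.size)

estimate = levenberg_marquardt(predicted_altitudes, [980.0, 25.0, 4.0],
                               times, measured)
print(np.round(estimate, 2))
```

Dividing λ by 10 after a successful step and multiplying it by 10 after a failed one is a common heuristic, not a fixed rule. The result still depends on the starting guess, which is one reason to check the answer carefully.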


#### Mark L. Stone · December 6, 2016 at 8:08 pm

There are much better, more robust algorithms available to solve nonlinear least squares than Marquardt-Levenberg; this is not 1963. Among other crimes, as in the stories below, Marquardt-Levenberg ignores higher-order terms, often to great peril. In this case, it uses an approximation of the Hessian of the objective function that drops higher-order terms in the Hessian. Those terms can be crucial if the residuals at the optimum are not small, and dropping them can degrade performance far from the optimum, possibly resulting in non-convergence. And of course, the usual Marquardt-Levenberg estimate of the covariance of the estimated parameters is not only based on the potentially very inaccurate linearization inherent in using the inverse of the Hessian; that error is further compounded by the approximation Marquardt-Levenberg uses to calculate the Hessian. Instead, computationally intensive bootstrapping can be used to estimate the uncertainty distribution of the estimated parameters and of the model's predictions. Also, you really need to think about global optimization: if the model is non-convex, as most nonlinear least squares problems are, Marquardt-Levenberg and other local optimization algorithms may find a local optimum that is not globally optimal. And by global optimization, I'm not talking about genetic algorithms and other heuristic junk often falsely advertised as global optimization algorithms, or about just randomly trying several starting values with a local optimization algorithm.
Two funny Taylor series stories. Lightning does indeed strike twice.