Suppose that you have N data points, y_k, for k = 1, 2, ..., N, and the function to be fitted is f(x, p), where p represents the M parameters (p_1, p_2, ..., p_M). Define the likelihood of the parameters, given the data, as the probability of the data, given the parameters. We fit for the parameters, p, by finding those values, p_min, that maximize this likelihood. This form of parameter estimation is known as maximum likelihood estimation.
Consider the likelihood function

    P(x_1, p) · P(x_2, p) · ... · P(x_N, p),

where P is the probability density, which depends on the random variable x and the parameters p. This product is a measure of the probability of observing just the particular sample we have, and is called an a-posteriori probability, since it is computed after the sampling is done. The best estimates for p are the values which maximize the likelihood. But the values that maximize the logarithm of the likelihood also maximize the likelihood itself, and maximizing the logarithm is equivalent to minimizing its negative. So, the goal becomes minimizing the log likelihood function:

    L(p) ≡ -ln[ P(x_1, p) · P(x_2, p) · ... · P(x_N, p) ] = -Σ_k ln P(x_k, p).
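For the common case of independent Gaussian measurement errors, where the value measured at x_k is y_k with standard deviation σ_k, each P(x_k, p) is a Gaussian centred on f(x_k, p), and L(p) reduces to χ²(p)/2 plus a constant that does not depend on p, which is why the discussion below can be phrased in terms of χ². A minimal sketch of L(p) for that Gaussian case, with illustrative function and argument names:

    import numpy as np

    def neg_log_likelihood(p, x, y, sigma, f):
        """L(p) = -sum_k ln P(x_k, p) for independent Gaussian errors.

        f(x, p) is the model, y the values measured at the abscissas x,
        and sigma their standard deviations.  Names are illustrative.
        """
        r = (y - f(x, p)) / sigma
        # chi^2/2 plus a normalization term that does not depend on p
        return 0.5 * np.sum(r ** 2) + np.sum(np.log(sigma * np.sqrt(2.0 * np.pi)))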

Let p_0 be the initial values given for p. The goal is to find a step Δp so that p_1 = p_0 + Δp is a better approximation to the data. We use the iterative Gauss-Newton method, and the series p_1, p_2, p_3, ... will hopefully converge to the minimum, p_min.
Generally, the Gauss-Newton method is locally convergent when χ² is zero at the minimum. Serious difficulties arise when f is strongly nonlinear and χ² is large at the minimum. The Gauss-Newton method has the advantage that linear least-squares problems are solved in a single iteration.
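To make the last point concrete, here is a minimal sketch, not taken from the text, for a model that is linear in the parameters, f(x, p) = p_1 + p_2·x: its Jacobian does not depend on p, so a single Gauss-Newton step from any starting values already lands on the least-squares minimum. The function and argument names are illustrative.

    import numpy as np

    def straight_line_fit(x, y, p0=(0.0, 0.0)):
        """One Gauss-Newton step for the linear model f(x, p) = p[0] + p[1]*x.

        Because the Jacobian of a linear model is constant, this single
        step reaches the least-squares solution from any starting point p0.
        """
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        p0 = np.asarray(p0, dtype=float)
        J = np.column_stack([np.ones_like(x), x])   # constant Jacobian df/dp
        r = y - (p0[0] + p0[1] * x)                 # residuals at p0
        dp = np.linalg.solve(J.T @ J, J.T @ r)      # normal equations
        return p0 + dp                              # least-squares minimum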
Consider the Taylor expansion of L(p) about the current estimate p_0:

    L(p_0 + Δp) ≈ L(p_0) + Σ_k (∂L/∂p_k) Δp_k + (1/2) Σ_k Σ_l (∂²L/∂p_k ∂p_l) Δp_k Δp_l,

with all derivatives evaluated at p_0. Define the arrays β (the negative gradient of L) and α (the curvature, or Hessian, matrix of L):

    β_k ≡ -∂L/∂p_k,        α_kl ≡ ∂²L/(∂p_k ∂p_l),        k, l = 1, 2, ..., M.

If we linearize, i.e., assume that f is close enough to linear in the parameters near p_0 that the terms in α involving second derivatives of f can be neglected, then minimizing the quadratic model (setting its gradient with respect to Δp to zero) gives a set of M linear equations for the step:

    Σ_l α_kl Δp_l = β_k,        k = 1, 2, ..., M.

Solving this linear system yields Δp, and hence the next iterate p_1 = p_0 + Δp; the process is repeated until the series converges.
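Putting the pieces together, the sketch below runs the full iteration for the weighted least-squares (Gaussian) case introduced above, where L(p) = χ²(p)/2 up to a constant; with the residuals and the Jacobian of f pre-scaled by 1/σ, the arrays become α = J^T J and β = J^T r, with the second derivatives of f dropped exactly as in the linearization above. The function and argument names, the iteration cap, and the stopping rule are illustrative choices, and any Jacobian routine, such as the central-difference one sketched at the end of this section, can be passed in as jac.

    import numpy as np

    def gauss_newton(f, jac, p0, x, y, sigma, n_iter=20, tol=1e-10):
        """Iterate p_{i+1} = p_i + dp, with dp from alpha * dp = beta.

        f(x, p) is the model, jac(x, p) its N x M Jacobian, and the data
        y at abscissas x carry standard deviations sigma.
        """
        p = np.asarray(p0, dtype=float)
        y = np.asarray(y, dtype=float)
        sigma = np.asarray(sigma, dtype=float)
        for _ in range(n_iter):
            r = (y - f(x, p)) / sigma              # weighted residuals
            J = jac(x, p) / sigma[:, None]         # weighted Jacobian df/dp
            alpha = J.T @ J                        # curvature matrix, f'' dropped
            beta = J.T @ r                         # negative gradient of L
            dp = np.linalg.solve(alpha, beta)      # solve alpha dp = beta
            p = p + dp                             # take the step
            if np.max(np.abs(dp)) < tol:           # step small enough: stop
                break
        return p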
Note: The partial derivatives of f with respect to the parameters are approximated numerically using a central difference approximation:

    ∂f/∂p_k ≈ [ f(x, p_1, ..., p_k + h_k, ..., p_M) - f(x, p_1, ..., p_k - h_k, ..., p_M) ] / (2 h_k),

where h_k is a small step in the k-th parameter.
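A sketch of that approximation for the full N x M Jacobian of f; the default step size h and its scaling with the parameter magnitude are illustrative choices, not prescribed by the text.

    import numpy as np

    def jacobian_central(f, x, p, h=1e-6):
        """Central differences: J[i, k] ~ (f(x_i, p + h_k e_k) - f(x_i, p - h_k e_k)) / (2 h_k)."""
        p = np.asarray(p, dtype=float)
        n = np.asarray(f(x, p)).size               # number of data points
        J = np.empty((n, p.size))
        for k in range(p.size):
            hk = h * max(1.0, abs(p[k]))           # step scaled with |p_k|
            pp, pm = p.copy(), p.copy()
            pp[k] += hk
            pm[k] -= hk
            J[:, k] = (np.asarray(f(x, pp)) - np.asarray(f(x, pm))) / (2.0 * hk)
        return J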