Background
Exploration seismology involves applying seismic imaging and inversion techniques to build physical property models for the Earth and to delineate subsurface features. Seismic FullWaveform Inversion (FWI) is concerned with iteratively improving on a prexisting model of the Earth, in order to better fit seismic measurements recorded in the field. Field measurements (data) are digital recordings of pressure (with hydrophones) or particle velocity (with geophones) at a number of different locations (receivers). In exploration seismology, source enegry typically comes from vibroseis and/or dynamite (onland), or airguns (offshore). One trace exists for each pair of source and receiver.
Forward modelling
Seismic data are expressions of signals being propagated as waves, and wave equations describe (simplified) physics that model the behaviour of these waves. The constantdensity 3D acoustic wave equation takes the form:
where is the pressure wavefield arising from a source distribution in a model described by the propagation velocity . All three quantities are scalar fields that may vary in all three spatial dimensions ; both and also vary with time . In many cases, it may be convenient to work in the temporal frequency domain, which requires taking the Fourier transform of the wave equation above. The resulting 3D acoustic Helmholtz equation,
describes the behaviour of a timeharmonic wavefield as a function of the angular frequency , with temporal frequency .
Analytical case
In the case of a constant velocity for all locations , the analytical response of this system is straightforwardly written. We may simplify by choosing to make a point source, in which case
In response to such a point source at location , the wavefield evaluated at a receiver location is described by the Green's function , with
in the case of the 3D acoustic Helmholtz equation.
Discrete case
The system of equations above can be straightforwardly discretized in the form
where is the system matrix representing the physics of the problem, is the field vector (discretized ) and is the RHS source vector (discretized ). The vector represents discretized heterogenous model of velocity , upon which is nonlinearly dependent. The task of solving system of equations for the field vector requires inverting the matrix , such that .
Data synthesis
Once we are equipped with a scheme to form the system of equations and invert it to solve for the field vector , we can go about the task of computing synthetic data to simulate the response of the field experiment. We describe a data restriction operator , which maps from the complex 3D wavefield to the responses at a series of receiver locations. The action of this operator is a good conceptual model to understand how the observed data arise from the field experiment. The equivalent matrix extracts a 1D vector of synthetic data from the field vector ,
Inversion
In an ideal world, the synthetic data would match the observed data . This perfect correspondence between the experimental results and the results of the computer model would indicate that the earth model is correct (within the resolving ability of the experiment).^{1} In order to quantify the errors in the model , we define an objective function,
which measures the error between the data arising from the field experiment () and the data arising from the numerical simulation () under the norm. Then, we define an inverse problem
with the purpose of finding the optimal model . If we were in possession of infinite computer resources, we could employ a global search over all feasible models to find the best to fit the data^{2}; however, in practice the solution of the forward problem is quite costly, and we must employ a more efficient method for finding (something close to) .
Local optimization
Rather than search over all possible , we can take advantage of the fact that we often have a good estimate of to begin with.^{3} In this case, we might consider implementing a local optimization method. The simplest example is the steepest descent or gradient algorigthm, which uses a firstorder approximation to the rate of change of the data misfit with respect to each model parameter. The gradient is
where is the Jacobian matrix of partial derivatives evaluated at the present model; indicates transposition and complex conjugation. The steepest descent algorithm is an iterative process, in which the model iterate is updated in an attempt to go "downhill" in the space of the objective function .
For appropriate choices of the scalar steplength at each iteration, will converge towards a model that finds a reduced value of .^{4} Various schemes exist to improve on the steepest descent algorithm, including projection methods, conjugate gradient approaches, and higherorder methods including Newton, GaussNewton and quasiNewton (lBFGS) schemes.
Model regularization
In the introduction to the Inversion section, we define the inverse problem that involves finding to minimize the misfit function . However, we put no other constraints on the solution. In many practical cases, we know additional information about the sorts of models that can exist, or are most feasible. Therefore, it may make sense to solve a different problem, such as
which additionally requires that sits inside some (convex) constraint set . This set could include all models that have velocities within a feasible range, or all models that are sufficiently smooth, etc. All of these criteria represent types of model regularization.
In order to find models that fit this new problem, there are various approaches that may be used:
 Projection methods
 Define a projector such that .^{5} Then, the projector is applied to the candidate model terates to ensure that continues to satisfy , e.g.,
 Penalty methods
 Redefine the objective function with and chosen to penalize certain model properties. Typical choices might be for to be a discrete representation of (i.e., first spatial derivative) and to be diagonal by construction.

Of course, in practice , where is some distribution of noise in the recordings, so a perfect match between and would be quite suspicious. ↩

It is also important to point out that, in general, there is no single with minimum misfit as measured under . Rather, there are usually an infinite number of models that can give rise to a particular misfit . This is an expression of nonuniqueness, and is one of the motivations for properly implementing Model regularization. ↩

In practice, this estimate typically comes from from another geophysical method (e.g., traveltime inversion), in combination with a priori knowledge of geology and rock physics information. ↩

NB: This is true as long as the initial model iterate is not a stationary point, in which case is zero. ↩

A projector is a linear operator that is idempotent, meaning . ↩