Estimating cross-covariance models for cokriging

Available with Geostatistical Analyst license.

When you have multiple datasets and want to use cokriging, you need to develop models for cross-covariance. Because there are multiple datasets, subscripts keep track of the variables: Z_k(s_i) denotes a random variable for the k^th data type at location s_i. The cross-covariance function between the k^th data type and the m^th data type is then defined to be

\(C_{km}(\textbf{s}_{i}, \textbf{s}_{j}) = \text{cov}(Z_{k}(\textbf{s}_{i}), Z_{m}(\textbf{s}_{j}))\)

Here is a subtle and often confusing fact: C_km(s_i, s_j) can be asymmetric, that is, C_km(s_i, s_j) ≠ C_mk(s_i, s_j) (notice the switch in the subscripts). To see why, look at the following example. Suppose you have data arranged in one dimension, along a line, such as the following:

Cross-covariance

The variables for type 1 and 2 are regularly spaced along the line, with the thick red line indicating the highest cross-covariance, the green line less cross-covariance, the thin blue line the least cross-covariance, and no line indicating 0 cross-covariance. This figure shows that Z₁(s_i) and Z₂(s_j) have the highest cross-covariance when s_i = s_j, and the cross-covariance decreases as s_i and s_j get farther apart. In this example, C_km(s_i, s_j) = C_mk(s_i, s_j). However, the cross-covariance can be "shifted":

Cross-covariance

Notice that C₁₂(s₂, s₃) now has the minimum cross-covariance (thin blue line) while C₂₁(s₂, s₃) has the maximum cross-covariance (thick red line), so here C_km(s_i, s_j) ≠ C_mk(s_i, s_j). Relative to Z₁, the cross-covariances of Z₂ have been shifted -1 unit. In two dimensions, Geostatistical Analyst will estimate any shift in the cross-covariance between the two datasets if you enable the shift parameters.
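The asymmetry in the shifted case can be reproduced with a small sketch. It assumes, purely for illustration, that Z₂ is an exact copy of Z₁ displaced along the line, so that Z₂(s) = Z₁(s - shift), and that Z₁ has an exponential covariance with unit sill. The function names `base_cov`, `cross_cov_12`, and `cross_cov_21` are hypothetical and are not part of Geostatistical Analyst.

```python
import math


def base_cov(h, corr_range=1.0):
    """Assumed exponential covariance of Z1 at separation h (unit sill)."""
    return math.exp(-abs(h) / corr_range)


def cross_cov_12(s_i, s_j, shift=-1.0):
    """C12(s_i, s_j) = cov(Z1(s_i), Z2(s_j)), assuming Z2(s) = Z1(s - shift)."""
    return base_cov((s_j - shift) - s_i)


def cross_cov_21(s_i, s_j, shift=-1.0):
    """C21(s_i, s_j) = cov(Z2(s_i), Z1(s_j)), assuming Z2(s) = Z1(s - shift)."""
    return base_cov(s_j - (s_i - shift))


# Locations s2 = 2 and s3 = 3 on the line, one unit apart.
c12 = cross_cov_12(2.0, 3.0)  # cov(Z1(2), Z1(4)): separation of 2 units
c21 = cross_cov_21(2.0, 3.0)  # cov(Z1(3), Z1(3)): separation of 0 units
print(f"C12(s2, s3) = {c12:.3f}")
print(f"C21(s2, s3) = {c21:.3f}")
```

With a shift of -1 unit, C₂₁(s₂, s₃) equals the sill while C₁₂(s₂, s₃) is much smaller. Setting `shift=0.0` makes the two functions agree, which corresponds to the unshifted, symmetric case.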

The empirical cross-covariances are computed as follows:

\(\text{Average}\left[ \left( z_{1}(\textbf{s}_{i}) - \bar{z}_{1} \right) \left( z_{2}(\textbf{s}_{j}) - \bar{z}_{2} \right) \right]\)

where \(z_{k}(\textbf{s}_{i})\) is the measured value for the k^th dataset at location s_i, \(\bar{z}_{k}\) is the mean of the k^th dataset, and the average is taken over all pairs s_i and s_j separated by a certain distance and angle. As with semivariograms, Geostatistical Analyst shows both the empirical and fitted models for cross-covariance.
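As a minimal sketch of this average, consider two datasets measured at the same regularly spaced locations along a line, so that "distance and angle" reduce to a signed lag counted in grid steps. The function name `empirical_cross_cov` and the sample values are made up for illustration; this is not the Geostatistical Analyst implementation.

```python
from statistics import fmean


def empirical_cross_cov(z1, z2, lag):
    """Average of (z1[i] - mean1) * (z2[i + lag] - mean2) over valid i."""
    if len(z1) != len(z2):
        raise ValueError("z1 and z2 must be sampled at the same locations")
    mean1, mean2 = fmean(z1), fmean(z2)
    products = [
        (z1[i] - mean1) * (z2[i + lag] - mean2)
        for i in range(len(z1))
        if 0 <= i + lag < len(z2)
    ]
    if not products:
        raise ValueError("no pairs of locations are separated by this lag")
    return fmean(products)


# Hypothetical measurements; z2[i] equals z1[i + 1] for i = 0..3.
z1 = [2.0, 4.0, 1.0, 5.0, 3.0]
z2 = [4.0, 1.0, 5.0, 3.0, 2.0]

for lag in (-1, 0, 1):
    print(lag, empirical_cross_cov(z1, z2, lag))
```

For these values the empirical cross-covariance is largest at a lag of -1 rather than at 0, which is the signature of a shift of -1 unit between the two datasets.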

Choosing different cross-covariance models, using compound cross-covariance models, and choosing anisotropy will all cause the theoretical model to change. You can make a preliminary choice of model by seeing how well it fits the empirical values. Changing the lag size and the number of lags and adding shifts will change the empirical cross-covariance surface, which will cause a corresponding change in the theoretical model. Geostatistical Analyst computes default values, but you should feel free to try different values and use validation and cross-validation to choose the best model.
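The effect of the lag size on the empirical values can be sketched in one dimension. The example below groups signed separations into bins whose centers are multiples of `lag_size` and averages the products within each bin. The function `binned_cross_cov`, its binning rule, and the sample data are assumptions made for illustration and do not describe how Geostatistical Analyst bins its cross-covariance surface.

```python
import math
from statistics import fmean


def binned_cross_cov(s1, z1, s2, z2, lag_size, n_lags):
    """Empirical cross-covariance of z1 against z2 in signed lag bins.

    Returns {bin_center: average product} for bin indices -n_lags..n_lags.
    """
    mean1, mean2 = fmean(z1), fmean(z2)
    sums, counts = {}, {}
    for a, za in zip(s1, z1):
        for b, zb in zip(s2, z2):
            # Index of the bin whose center is nearest the signed separation.
            k = math.floor((b - a) / lag_size + 0.5)
            if abs(k) > n_lags:
                continue
            sums[k] = sums.get(k, 0.0) + (za - mean1) * (zb - mean2)
            counts[k] = counts.get(k, 0) + 1
    return {k * lag_size: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical locations and measurements for two datasets.
locations = [0.0, 1.0, 2.0, 3.0, 4.0]
z1 = [2.0, 4.0, 1.0, 5.0, 3.0]
z2 = [4.0, 1.0, 5.0, 3.0, 2.0]

fine = binned_cross_cov(locations, z1, locations, z2, lag_size=1.0, n_lags=1)
coarse = binned_cross_cov(locations, z1, locations, z2, lag_size=2.0, n_lags=1)
print("lag size 1:", fine)
print("lag size 2:", coarse)
```

The same data yield different empirical values under the two lag sizes, because the coarser bins pool pairs that the finer bins keep apart. Any model fitted to those values would change accordingly.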