It is sometimes convenient to write the individual processes as $Y_1(\cdot)$ and $Y_2(\cdot)$, respectively. Then the joint probability measure of $[Y_1(\cdot), Y_2(\cdot)]$ can be written as, \begin{equation}\label{eq:decomp} [Y_1(\cdot), Y_2(\cdot)] = [Y_2(\cdot) | Y_1(\cdot)][Y_1(\cdot)], \end{equation} where the notation $[A|B]$ is used to represent the conditional probability of $A$ given $B$, and $[B]$ represents the marginal probability of $B$. The conditional probability in (1) is shorthand for $$ [{Y_2(\svec): s \in D}\mid {Y_1(\vvec) : \vvec \in D}].$$ The book by Banerjee et al. (2015, p. 273) states that it is meaningless to talk about the joint distribution of $Y_2(\svec_1) \mid Y_1(\svec_1)$ and of $Y_2(\svec_2) | Y_1(\svec_2)$ as building blocks for the conditional approach, with which we agree. It also goes on to say that this "reveals the impossibility of conditioning," with which we disagree and explain why below.
It is not just one or a few finite-dimensional distributions that define the conditional approach, it is all of them. Further, these finite-dimensional distributions are for the hidden processes $Y_1(\cdot)$ and $Y_2(\cdot)$ and not for noisy incomplete data that we notate as $Z_1 = \{Z_1(\svec_{1,i} )\}$ and $Z_2 = \{Z_2(\svec_{2,i} )\}$. Banerjee et al. (2015, p. 273) go on to state that the conditional approach is flawed and that kriging is not possible. Cressie and Zammit-Mangion (2016) show that the conditional approach defined through (1) yields a well defined bivariate (Gaussian) process $(Y_1(\cdot), Y_2(\cdot))$, and they perform kriging on $Y_1(\cdot)$ based on the data $Z_1$ and $Z_2$.
The construction of a bivariate spatial covariance function follows directly from specification of the conditional mean and conditional covariance functions, as follows:
$$ \begin{align}\label{eqn:E-and-cov} \E\left(Y_2(\s)\mid Y_1(\cdot)\right)&\,=\int_D{b(\s,\v)Y_1(\v)\,\d \v};\quad \s\in D,\\ \cov\left(Y_2(\s),Y_2(\u)\mid Y_1(\cdot)\right)&\,=C_{2\mid 1}(\s,\u);\quad \s,\u\in \mathbb{R}^d.\nonumber \end{align} $$
These specifications are enough to establish the Kolmogorov consistency conditions and hence the existence of the bivariate spatial process referred to above. Further, the cross-covariance functions are easy to derive, and their formula in the $p=2$ case is as follows (Cressie and Zammit-Mangion, 2016):
$$ \begin{align} C_{12}(\s,\u)=\cov\left(Y_1(\s),Y_2(\u)\right)=&\,\cov\left(Y_1(\s),E(Y_2(\u)\mid Y_1(\cdot))\right)\nonumber\\ =&\,\int_D{C_{11}(\s,\w)b(\u,\w)\,\d \w};\quad\s,\u\in D.\label{eqn:cov2} \end{align} $$
Furthermore, the conditional approach can be modified easily for processes indexed on different spatial domains: $\{Y_1(\svec): \svec \in D_1\}$ and $\{Y_2(\svec): \svec \in D_2\}\hspace{-0.03in}$, for $D_1, D_2 \in \mathbb{R}^d\hspace{-0.03in}$; see Cressie and Zammit-Mangion (2016).
A temperature-pressure dataset was used in Gneiting et al. (2010) and Apanasovich et al. (2012) to fit symmetric bivariate models. The flexibility of the conditional approach can be demonstrated on this dataset, and it seen that there is a pronounced asymmetry in the cross-covariance structure that is picked up by the conditional approach. The direction of the dependence is suggested on physical grounds: Higher temperatures cause air to rise and hence pressure to decrease. The data are available from the R package, RandomFields (Schlather et al., 2015), and they refer to “error fields,” namely the respective differences between 2-day forecasts and observations from monitoring stations in the Pacific Northwest of North America on December 18, 2003 at 4 p.m.
Figure 2: Left panels: The cokriged surface using maximum likelihood parameter estimates with an asymmetric model constructed using the conditional approach, for the temperature and pressure error fields. Top-right panel: A scatter plot of the kriging standard errors of $Y_1$ (temperature) obtained with the asymmetric model against those obtained when assuming $Y_1$ and $Y_2$ are two independent Matérn fields (the independent Matérn model), at each of the mesh vertices. The colour illustrates the difference between the two, with green denoting the higher standard error of the asymmetric model and purple the higher standard error of the independent Matérn model. Bottom-right panel: A spatial plot of the difference in the kriging standard errors of $Y_1$ (temperature) obtained with the asymmetric model and the independent Matérn model, with green denoting a higher standard error of the asymmetric model and purple a higher standard error of the independent Matérn model. The red dots denote the station locations.