\documentclass[a4paper,10pt]{article}
\usepackage{ucs}
\usepackage[utf8]{inputenc}
%\usepackage{babel}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{rotating}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage[dvips]{hyperref}
\date{06/05/2014}
\begin{document}
\title{gmGeostats: an R package for massive, parallelized geostatistical
analysis}
\maketitle
\section*{Content}
\paragraph{abstract}
A multivariate geostatistics R package is desired that is able to deal
with large data sets, massive interpolation/simulation grids
and flexible variogram/spatial dependence structures, taking
full advantage of parallelization capabilities.
Besides containing state-of-the-art methods, it should be the framework
to develop, test and disseminate our own geostatistical methods.
It will be highly compatible with, and interfaced to, our other packages
(systemstats, compositions, gmDatabase and gmGeometallurgy), and especially
adapted to the needs of modern and future geometallurgical analysis,
regional geochemistry, and other main research topics at HIF.
Eventually, it should offer graphical interfaces to QGIS, Rcmdr/RKWard,
or a self-standing GUI
(as an extra package?)
\begin{sidewaystable}
\begin{tabular}{lllll}
\hline
model & feature & structural tool & interpolation & simulation \\
\hline
Gaussian & continuous & covariance & Simple coKriging & Turning Bands (TB) \\
& & covariogram & Ordinary coKriging & TB + ApproxChol \\
& & residual covariance & Universal coKriging & TB + several \\
& & MAF and rank-deficient covs & ScoK, OcoK, UcoK & TB \\
\hline
transGaussian & continuous & covariance & ScoK & Turning Bands (TB) \\
\hline
transition probs & discrete & transiograms & & sequential indicator \\
indicator & & indicator covariance & IScoK & \, cosimulation \\
& & & & plurigaussian sim. \\
\hline
multipoint & discrete & training image & E-type from & MPS \\
& any & unsaturated GLM & \, simulations & MPS \\
\hline
\end{tabular}
\end{sidewaystable}
\section{Data input, structure and spatial manipulation}
Typical geostatistical software manages data sets with 2D and 3D spatial
coordinates. In most applications, one data set is used for all steps
of the geostatistical analysis. Most of the time, each variable is considered
separately, even though some of them may be observed at the same locations. Two
exceptions are notable. The first is MPS, where a training image must also be
provided, i.e. a sort of ``training set''. The second is collocated cokriging
(a misnomer), where a densely sampled secondary variable is used to interpolate
a more sparsely measured one.
On the other hand, recent work of this group has shown that it is not always
appropriate to treat the data as Cartesian products of separate, univariate
variables. Some sets of variables form natural groups that should be analysed
and interpreted together (parts in a composition, mass fractions in a particle
size distribution, disjunctive indicators of facies, dip and strike of a
direction, summaries of MLA images, etc.). These sets of jointly defined
variables are called here \emph{layers}.
In this framework, we consider it necessary to manage data with the following
characteristics:
\begin{itemize}
\item a project might contain several layers of information, and should be
georeferenced (1D to 4D geographic space: borehole, plane, volume,
volume $\times$ time);
\item each \textbf{layer} can contain any number of collocated variables
forming a system (a composition, a direction, a particle size distribution, a
facies, etc.): though missing values should be allowed, in principle a layer
should be (almost) fully observed;
\item each layer might indicate the need for a \textbf{scaling function}
(univariate or multivariate) and provide its direct and inverse transformation
(analytical functions, like an alr/ilr; Hermite polynomials for Gaussian
anamorphosis; flow specification for multivariate Gaussian anamorphosis;
isofactorial model?);
\item it is necessary to distinguish between the ``\textbf{conditioning set}''
(the data set that will provide the conditioning points in
interpolation/simulation) and the ``\textbf{training set}'' (the data set that
will provide the spatial structure information, e.g. the training image in
MPS); note that, until the advent of MPS, most geostatistical practice did not
distinguish them;
\item training sets might have relative geographic coordinates, but the scale
and projection should be compatible with those of the data;
\item for the moment, the range of foreseen applications suggests that it
should be possible for each layer to have its own training set, or for each
project to have a unique multi-layer training set.
\end{itemize}
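As a rough illustration (not the final gmGeostats interface; all names such as
\texttt{gmProject} or \texttt{gmLayer} are hypothetical), such a multi-layer,
georeferenced project with per-layer scaling functions could be sketched in
plain R as follows:
\begin{verbatim}
# hypothetical container for a georeferenced multi-layer project
gmProject <- function(coords, layers, training = NULL) {
  structure(list(coords   = as.matrix(coords),  # 1D to 4D coordinates
                 layers   = layers,             # named list of layers
                 training = training),          # optional training set(s)
            class = "gmProject")
}

# one layer bundles collocated variables with its scaling functions
gmLayer <- function(data, scaling = identity, invScaling = identity) {
  structure(list(data = as.data.frame(data),
                 scaling = scaling, invScaling = invScaling),
            class = "gmLayer")
}

# example: a compositional layer scaled with centred log-ratios
clr    <- function(x) log(x) - rowMeans(log(x))
clrInv <- function(y) exp(y) / rowSums(exp(y))
comp <- gmLayer(data.frame(SiO2 = c(.60, .55), Al2O3 = c(.15, .18),
                           Rest = c(.25, .27)),
                scaling = clr, invScaling = clrInv)
proj <- gmProject(coords = cbind(x = c(0, 10), y = c(0, 5)),
                  layers = list(geochem = comp))
\end{verbatim}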
Further aspects of spatial data storage and manipulation that are often
required, typical of GIS and similar software, are:
\begin{description}
\item[Domaining] the geographic space can be partitioned into disjoint domains;
contacts between domains might be ``soft'' (information can go through them,
and data in both domains may contribute to interpolations/simulations) or
``hard'' (information should not go through the contact, and
interpolations/simulations should only use data on one side of it). Domains are
usually large bodies, not necessarily continuous. They might be identified in
several alternative ways: a wireframe surface (vector paradigm) or a
facies-like indicator (raster paradigm).
\item[Blocking] the geographic space can be partitioned into disjoint blocks
for which some averaged quantity is desired. These are mostly small, regular,
continuous bodies, hence more easily identified as the interior of a grid.
Blocks must often be discretized into a moderate number of internal points,
segments, pixels or voxels.
\end{description}
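A minimal sketch of the blocking idea in plain R (the helper
\texttt{discretizeBlock} is hypothetical): a rectangular 2D block is
discretized into a regular set of internal points at which point-support values
can later be averaged.
\begin{verbatim}
# discretize a 2D rectangular block into a grid of internal points
discretizeBlock <- function(center, size, n = c(4, 4)) {
  half <- size / 2
  xs <- seq(center[1] - half[1], center[1] + half[1], length.out = n[1] + 2)
  ys <- seq(center[2] - half[2], center[2] + half[2], length.out = n[2] + 2)
  # drop the block edges, keep only the interior points
  expand.grid(x = xs[-c(1, n[1] + 2)], y = ys[-c(1, n[2] + 2)])
}

pts <- discretizeBlock(center = c(100, 200), size = c(10, 10))
nrow(pts)   # 16 internal points representing the block
\end{verbatim}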
Other spatial manipulations include rotation, erosion, dilation, and global
geometric anisotropy compensation.
\section{Structural tool estimation}
Up to now, most geostatistical software can manage only one type of structural
tool: (cross-)variograms. MPS software is, of course, prepared to manage
training images. However, the requirements on the kinds of data to manipulate
and the new methods that we want to implement make this too limiting. The
following kinds of pairs [training data/structural function] should be
available, subject to these common conventions:
\begin{itemize}
\item the data used should be the training data;
\item all layers enter the calculations after the scaling function has been
applied;
\item it should be possible to use a domaining to restrict the pairs of
locations $(x_i,x_j)$ entering the calculations to those lying in the same
domain, or to restrict the calculations to one single subdomain.
\end{itemize}
\paragraph{[real data/scalar-variogram]} The basic tool of classical two-point
geostats. The user should be able to select which variable to treat as $Z$,
define a polar grid of lag distances $h$ to search for pairs of data, and
estimate the variogram
$$
\hat{\gamma}_Z(h) = \frac{1}{2N(h)} \sum_{i,j}^{N(h)} (z(x_i)-z(x_j))^2.
$$
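A minimal base-R sketch of this estimator (purely illustrative, and using
isotropic distance classes instead of a full polar grid of lags):
\begin{verbatim}
# empirical semivariogram of z at coordinates X for given lag breaks
empVariogram <- function(X, z, breaks) {
  d  <- as.matrix(dist(X))        # pairwise distances
  dz <- outer(z, z, "-")^2        # squared pairwise differences
  keep <- upper.tri(d)            # count each pair (i, j) once
  lag  <- cut(d[keep], breaks)    # assign pairs to lag classes
  # gamma(h) = sum of squared differences in class h / (2 N(h))
  tapply(dz[keep], lag, function(v) sum(v) / (2 * length(v)))
}

set.seed(1)
X <- cbind(runif(100), runif(100))
z <- rnorm(100)
empVariogram(X, z, breaks = seq(0, 1, by = 0.1))
\end{verbatim}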
\paragraph{[real data/scalar-covariance]} Also a common tool of classical
two-point geostats. The user should be able to select which variable to treat
as $Z$, give its mean value $\mu$, define a polar grid of lag distances $h$ to
search for pairs of data, and estimate the covariance function
$$
\hat{C}_Z(h) = \frac{1}{N(h)} \sum_{i,j}^{N(h)} (z(x_i)-\mu)(z(x_j)-\mu).
$$
\paragraph{[real vector data/matrix-variogram]} The basic tool of multivariate
two-point geostats. The user should be able to select which layer(s) to treat
jointly as a column vector $\mathbf{Z}$, define a polar grid of lag distances
$h$ to search for pairs of data, and estimate the variogram
$$
\hat{\boldsymbol{\gamma}}_{\bf Z}(h) = \frac{1}{2N(h)} \sum_{i,j}^{N(h)}
[\mathbf{z}(x_i)-\mathbf{z}(x_j)]\cdot [\mathbf{z}(x_i)-\mathbf{z}(x_j)]^t;
$$
actual calculations might better be done coordinate by coordinate, to allow for
some missing values in the vectors.
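A hypothetical sketch of such a coordinate-wise computation for one entry of
the matrix-variogram, dropping only those samples where one of the two selected
coordinates is missing:
\begin{verbatim}
# cross-variogram entry for coordinates k and l of data matrix Z
crossVariogram <- function(X, Z, k, l, breaks) {
  ok <- complete.cases(Z[, c(k, l)])       # samples where both are observed
  d  <- as.matrix(dist(X[ok, , drop = FALSE]))
  dk <- outer(Z[ok, k], Z[ok, k], "-")     # increments of variable k
  dl <- outer(Z[ok, l], Z[ok, l], "-")     # increments of variable l
  keep <- upper.tri(d)                     # each pair counted once
  lag  <- cut(d[keep], breaks)             # assign pairs to lag classes
  tapply((dk * dl)[keep], lag, function(v) sum(v) / (2 * length(v)))
}
\end{verbatim}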
\paragraph{[compositional data/variation-variogram]} The proposed tool of
compositional two-point geostats. The user should be able to select a
compositional layer to treat jointly as a column vector $\mathbf{Z}$, define a
polar grid of lag distances $h$ to search for pairs of data, and estimate the
variogram
$$
\hat{\boldsymbol{\tau}}_{\bf Z}(h) = [\hat{\tau}_{kl}(h)], \quad \quad
\hat{\tau}_{kl}(h)=\frac{1}{2N_{kl}(h)} \sum_{i,j}^{N_{kl}(h)} \left(\ln
\frac{z_k(x_i)}{z_l(x_i)}-\ln \frac{z_k(x_j)}{z_l(x_j)}\right)^2.
$$
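A base-R sketch of one entry $\hat{\tau}_{kl}(h)$ of this estimator
(illustrative only; in practice all pairs of parts would be filled into the
matrix, and a polar lag grid would replace the isotropic distance classes):
\begin{verbatim}
# variation-variogram entry tau_kl(h) for parts k and l of composition Z
variationVariogram <- function(X, Z, k, l, breaks) {
  lr   <- log(Z[, k] / Z[, l])             # log-ratio of the two parts
  d    <- as.matrix(dist(X))
  dlr2 <- outer(lr, lr, "-")^2             # squared log-ratio increments
  keep <- upper.tri(d)
  lag  <- cut(d[keep], breaks)
  tapply(dlr2[keep], lag, function(v) sum(v) / (2 * length(v)))
}
\end{verbatim}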
\paragraph{[categorical data/indicator-variogram]} The classical tool of
indicator two-point geostats. The user should be able to select which
categorical variable to treat as $Z$, define a polar grid of lag distances $h$
to search for pairs of data, and estimate the variogram
$$
\hat{\boldsymbol{\gamma}}_{Z}(h) = [\hat{\gamma}_{kl}(h)], \quad \quad
\hat{\gamma}_{kl}(h)= \frac{1}{2N_{kl}(h)} \sum_{i,j}^{N_{kl}(h)}
(J_k(x_i)-J_k(x_j))\,(J_l(x_i)-J_l(x_j));
$$
where the disjunctive indicators show whether we are in one category $K_k$ or
not:
$$
J_k(x_i) = \begin{cases}
1 & Z(x_i)=K_k \\
0 & \text{otherwise}
\end{cases}.
$$
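A small sketch of how the disjunctive indicators $J_k$ could be built from a
categorical variable in R; the indicator (cross-)variograms can then be
estimated with the same machinery as for real vector data:
\begin{verbatim}
# disjunctive indicators J_k of a categorical Z, one column per category
indicators <- function(Z) {
  Z <- as.factor(Z)
  sapply(levels(Z), function(k) as.numeric(Z == k))
}
indicators(c("sand", "clay", "sand", "silt"))
\end{verbatim}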
\paragraph{[categorical data/transiogram]}
\paragraph{[categorical training image/MPS search list]}
\paragraph{[multilayer training image/MPS search list]} ???
\section{Structural tool fitting/modelling}
At the step of fitting/modelling, the following pairs of [structural
function/model] should be offered:
\paragraph{[scalar-variogram or covariance/linear model of regionalization]} The
typical combination of valid simple models,
$$
\gamma(h|\boldsymbol{\theta}) = \sum_{a=1}^A c_a \cdot
\rho_a(h|\boldsymbol{\theta}_a)
$$
with those unitary variograms $\rho_a(h|\boldsymbol{\theta}_a)$ included in
table XXXX. The parameter vector $\boldsymbol{\theta}_a$ will include range and
anisotropy, and can include other accessory scalars (periodicity and
smoothness). To our present knowledge, periodicity and range must share the
same anisotropy structure, and smoothness cannot be anisotropic.
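A sketch of how such a nested model could be evaluated, assuming a nugget plus
spherical plus exponential combination with isotropic ranges (all function
names are illustrative):
\begin{verbatim}
# unitary variograms rho_a(h | theta_a)
rhoNugget <- function(h) as.numeric(h > 0)
rhoSph    <- function(h, range)
  ifelse(h < range, 1.5 * h / range - 0.5 * (h / range)^3, 1)
rhoExp    <- function(h, range) 1 - exp(-3 * h / range)

# gamma(h | theta) = sum_a c_a * rho_a(h | theta_a)
gammaLMR <- function(h, sills, ranges) {
  sills[1] * rhoNugget(h) +
    sills[2] * rhoSph(h, ranges[1]) +
    sills[3] * rhoExp(h, ranges[2])
}

h <- seq(0, 100, by = 5)
gammaLMR(h, sills = c(0.1, 0.6, 0.3), ranges = c(40, 80))
\end{verbatim}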
\paragraph{[matrix-variogram or covariance/linear model of coregionalization]}
Also valid for compositions and indicators; as in the preceding case, it is a
combination of valid simple models,
$$
\gamma(h|\boldsymbol{\theta}) = \sum_{a=1}^A \mathbf{C}_a \cdot
\rho_a(h|\boldsymbol{\theta}_a),
$$
where each sill matrix $\mathbf{C}_a$ can be:
\begin{itemize}
\item a covariance matrix (full-rank or rank-deficient positive semi-definite
matrix) for real vector-valued layers
\item a variation matrix for compositional layers (i.e. for
$\boldsymbol{\tau}_{\bf Z}(h)$), possibly rank-deficient as well
\item a covariance matrix with at least one zero eigenvalue, for indicators
\end{itemize}
Individual unitary variograms $\rho_a(h|\boldsymbol{\theta}_a)$ satisfy the same
conditions as for the linear model of regionalization.
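A sketch of the admissibility check implied by these conditions: each sill
matrix must be symmetric and positive semi-definite, with rank deficiency (zero
eigenvalues) explicitly allowed.
\begin{verbatim}
# check that a candidate sill matrix is symmetric positive semi-definite
isValidSill <- function(C, tol = 1e-10) {
  if (!isSymmetric(C)) return(FALSE)
  all(eigen(C, symmetric = TRUE, only.values = TRUE)$values >= -tol)
}

C1 <- matrix(c(2, 1, 1, 1), 2, 2)   # full rank: valid
C2 <- matrix(c(1, 1, 1, 1), 2, 2)   # rank-deficient but still valid
isValidSill(C1); isValidSill(C2)
\end{verbatim}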
\paragraph{[directional-covariance/complex regionalization]} See the
Wackernagel book and the paper by Sandra de Iaco.
\paragraph{[transiogram/Markov transition matrix]} See Enayat's thesis.
\paragraph{[categorical MPS search list/saturated model]}
\paragraph{[MPS search list/unsaturated JointSim model]} See the Strategic Mine
Planning paper.
\section{Interpolation}
\paragraph{Methods included} the following (co)kriging methods should be
included:
\begin{itemize}
\item simple (co)kriging
\item ordinary (co)kriging
\item universal (co)kriging
\item (co)kriging with a trend
\end{itemize}
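As an illustration of the core linear algebra shared by all these variants,
here is a base-R sketch of simple kriging at a single target location (the
helper names and the exponential covariance are assumptions, not the package
interface); cokriging generalizes this by stacking the layers' covariance
blocks.
\begin{verbatim}
# simple kriging at x0, given data (X, z), known mean mu, covariance covFun
simpleKriging <- function(X, z, x0, mu, covFun) {
  D  <- as.matrix(dist(X))              # data-to-data distances
  d0 <- sqrt(colSums((t(X) - x0)^2))    # data-to-target distances
  w  <- solve(covFun(D), covFun(d0))    # kriging weights
  mu + sum(w * (z - mu))                # SK estimator
}

covExp <- function(h, sill = 1, range = 30) sill * exp(-3 * h / range)
set.seed(2)
X <- cbind(runif(50, 0, 100), runif(50, 0, 100))
z <- rnorm(50, mean = 5)
simpleKriging(X, z, x0 = c(50, 50), mu = 5, covFun = covExp)
\end{verbatim}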
\paragraph{Compositional particularities} for a compositional layer, one should
be able to handle missing values.
\paragraph{Interpolation model specification}
\paragraph{How to specify the interpolation grid}
\section{Simulation}
\end{document}