Adrian Priceputu, Lecturer, Technical University of Civil Engineering Bucharest,
Constantin Ungureanu, Lecturer, University of Bucharest,
Ion Pencea, Professor, Politehnica University of Bucharest,


The paper introduces novel approaches to point pattern analysis, with a large-scale
case study from a geotechnical investigation area based on 274 boreholes. The approach to
point pattern discretization, based primarily on minimizing the pattern correlation coefficient
through rotation of the coordinate system, proved to be a powerful tool and much more
intuitive for geotechnical engineers. The paper addresses the main factors involved in
optimizing the point pattern analysis for discretization purposes and fully describes the set of
algorithms used, which can readily be applied independently.

Keywords: point pattern analysis, domain discretization, statistical processing

Reliable estimation of the distribution of a characteristic across a large area from a set
of localized observations is a challenging task, yet it can save a large volume of material and
human resources, not only in geotechnical engineering but also in many other fields such as geology,
ecology, agriculture, mining and medicine [1, 2]. Assessing how
the discrete observations are scattered across the targeted area belongs to the well-known field of point
pattern analysis (PPA), which incorporates a wide range of methods [1-5]. In this view, the
proper estimation of a geotechnical characteristic (porosity, density, permeability,
piezometric charge, elemental content etc.) across a large area using a set of discrete
measurements or estimates at a given set of locations is of great importance, as it can
extend the knowledge of the whole area of interest at low cost. On the other hand,
the accuracy of the geological information depends critically on whether the sampled
locations provide adequate data, i.e. whether the observations are sufficient in number and
whether the measurement locations are uniformly distributed across the targeted area [4-5].
The paper addresses the spatial distribution of the boreholes performed in the Plateau area,
whose locations are considered points Pi having (xi, yi) coordinates in an XOY Cartesian system.
Since different discretization methods may produce different results, finding an optimal
method is a critical issue [6]. In this regard, the Modifiable Areal Unit Problem (MAUP) is
well known to geographers. MAUP refers to the fact that the discretization of initially
non-aggregated data can create several statistical biases linked to the position of
borders, the aggregation level etc. [7-9]. PPA can be performed using a movable window of
different shapes, such as square [10], rectangular [11], circular [12] or an administrative area [13].

As the point pattern addressed in the paper is not related to a known underlying process,
finding the optimum size of a square mesh covering the targeted area was set as
the first objective of the discretization. Consequently, the paper does not address the first-order
properties of the Plateau Point Pattern, denoted P3, as the borehole positions do not
depend on underlying processes. Accordingly, the paper does not model the intensity
of the attribute assigned to the studied area, but establishes the optimum discretization mesh
based on a proper choice of the coordinate system, one that ensures minimum clustering of the points.

The vectors X={xi, i=1..n} and Y={yi, i=1..n} are assigned to the point set {Pi(xi, yi), i=1..n},
where n is the number of points in the set. The first step in PPA consists in estimating the primary
statistics of the P3 distribution (Fig. 2.1), such as the mean center and the maximum and minimum X
and Y values. The second step consists in assessing the P3 type, i.e. complete spatial randomness,
regular, clustered or a mixture of random and clustered [2, 3, 14]. The assessment can be
done by visual inspection of the P3 map, but is more accurate when based on quantitative
statistics [15,16].
[Figure: plot titled "Initial Fratesti Point Pattern"]
Fig. 2.1: The point distribution on the Plateau area in the original XOY coordinate system

The optimum discretization of an area using a square grid was considered in order to avoid, as much as
possible, subjectivity in decision making. Accordingly, the main criterion for P3 discretization
is minimum lacunarity. Lacunarity can be thought of as a measure of the ‘gappiness’ or ‘hole-iness’
of a geometric structure [17] (see green ellipses in Fig. 2.1). Moreover, lacunarity
analysis is an important technique for the analysis of spatial patterns, with a sound theoretical
basis [18].

We consider that a discretization has attained the minimum lacunarity of P3 when the cells
containing at least one point outnumber the empty ones. This criterion implies a point
density approach to the P3 analysis. Establishing the optimum square cell size, denoted
∆, is hampered by point clustering and point lineation, but mainly by the shape factor of the
area over which the points are spread. In this direction, the paper addresses three important issues:
1) improving the shape factor by choosing a proper coordinate system orientation;
2) improving the homogeneity of the point spread through the minimization of the
pattern correlation coefficient, hereafter denoted r;
3) optimizing the lacunarity through a proper choice of ∆.
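The occupancy criterion stated above (occupied cells outnumbering empty ones) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the paper: the grid is anchored at the minimum coordinates, and only cells inside the bounding rectangle of the points are counted. The function name `cell_occupancy` is hypothetical.

```python
import math

def cell_occupancy(points, delta):
    """Count occupied and empty cells of a square grid of size delta.

    Assumption: the grid covers the bounding rectangle of the points and
    is anchored at (x_min, y_min).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Number of columns/rows needed to cover the bounding rectangle.
    n_cols = max(1, math.ceil((max(xs) - x_min) / delta))
    n_rows = max(1, math.ceil((max(ys) - y_min) / delta))
    occupied = set()
    for x, y in points:
        # Clamp so that points on the upper/right edge fall in the last cell.
        col = min(int((x - x_min) // delta), n_cols - 1)
        row = min(int((y - y_min) // delta), n_rows - 1)
        occupied.add((col, row))
    n_occupied = len(occupied)
    n_empty = n_cols * n_rows - n_occupied
    return n_occupied, n_empty

# Made-up sample: three points share one cell, one point sits two cells away.
occ, empty = cell_occupancy([(0, 0), (1, 1), (9, 9), (25, 5)], 10)
print(occ, empty, occ > empty)
```

Under these assumptions, the criterion is met when the first returned count exceeds the second.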
Another novelty addressed in the paper is a set of algorithms that demonstrate the suitability of
Excel software for P3 analysis. The advantages of Excel are its availability, as it
is a common Microsoft Office tool, and its capacity to perform large calculations.

The paper provides the theoretical basis of the P3 analysis, the statistics and the Excel
algorithms used to statistically characterize the studied P3. All the data included in the paper
support this new approach to point pattern discretization, based on rotating the
original coordinate system so as to optimize the point scattering across the targeted area. An optimal
point scattering corresponds to a minimum absolute value of the correlation coefficient of the X and Y
coordinate vectors, i.e. a value as close as possible to zero.

The initial Pearson's correlation coefficient of the X and Y vectors is calculated as:

$$ r_o = \frac{\sum_{i=1}^{n}(x_{io}-\bar{x}_o)(y_{io}-\bar{y}_o)\,/\,n}{\sqrt{\sum_{i=1}^{n}(x_{io}-\bar{x}_o)^2 \cdot \sum_{i=1}^{n}(y_{io}-\bar{y}_o)^2}\,/\,n} \qquad (2.1) $$

where xio and yio are the coordinates of the point Pi in the original coordinate system (So), and
$\bar{x}_o$ and $\bar{y}_o$ are the mean coordinates, respectively:

$$ \bar{x}_o = \frac{\sum_{i=1}^{n} x_{io}}{n}; \qquad \bar{y}_o = \frac{\sum_{i=1}^{n} y_{io}}{n} \qquad (2.2) $$

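The denominator in eq. (2.1) is the square root of the product of the variances of the X and Y vectors, defined as:

$$ V_{oX} = \frac{\sum_{i=1}^{n}(x_{io}-\bar{x}_o)^2}{n}; \qquad V_{oY} = \frac{\sum_{i=1}^{n}(y_{io}-\bar{y}_o)^2}{n} \qquad (2.3) $$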
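As an illustration of eqs. (2.1)-(2.3), the following minimal Python sketch computes the mean center, the population (divide-by-n) variances and the Pearson coefficient of a point set. The function name `pattern_stats` and the sample data are hypothetical; the paper itself reports using Excel for these computations.

```python
import math

def pattern_stats(xs, ys):
    """Return (mean_x, mean_y, var_x, var_y, r) with divide-by-n formulas."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n                                   # eq. (2.2)
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n         # eq. (2.3)
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov_xy / math.sqrt(var_x * var_y)                  # eq. (2.1)
    return mean_x, mean_y, var_x, var_y, r

# Made-up, perfectly aligned points: the coefficient should be 1.
print(pattern_stats([0, 1, 2, 3], [0, 2, 4, 6]))
```

The function fails with a division by zero if all x or all y values coincide, which is acceptable for a sketch but would need guarding in real use.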

The values of VoX and VoY can be considered measures of the spread of the points across a
given area (i.e. the more homogeneously the points are distributed, the higher the VoX and VoY
values). When dealing with a complex-shaped area, it is difficult to ascribe high variance
values to the homogeneous scattering, to the shape of the area or to both. However, VoX
and VoY change their values depending on the rotation angle of the coordinate system, denoted S(θ).
Thus, if a new coordinate system is obtained by rotating the initial one (So) by an angle θ in
the positive trigonometric direction, then the coordinates of the point Pi, i=1..n, in the new
coordinate system can be calculated using the coordinate transformation matrix:

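$$ \begin{pmatrix} x_i' \\ y_i' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} \qquad (2.4) $$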

Replacing the new coordinates of the points in eq. (2.3), one obtains the mathematical
expressions of the VX and VY variances as:

$$ V_X(\theta) = \frac{V_{oX} + V_{oY}\tan^2\theta + 2\sqrt{V_{oX}V_{oY}}\;r_o\tan\theta}{1+\tan^2\theta} \qquad (2.5) $$

$$ V_Y(\theta) = \frac{V_{oY} + V_{oX}\tan^2\theta - 2\sqrt{V_{oX}V_{oY}}\;r_o\tan\theta}{1+\tan^2\theta} \qquad (2.6) $$
The values of VX and VY depend on their initial values and on θ through tan(θ). It is worth
noting that the total variance of a point pattern (PP) is invariant, i.e.

𝑉𝑋 (𝜃) + 𝑉𝑌 (𝜃) = 𝑉𝑜𝑋 + 𝑉𝑜𝑌 (2.7)

Eq. (2.7) shows that if S is rotated so as to obtain an optimum distribution of the PP across the
area, the spread of the PP as a whole remains constant, even though VX and VY vary.
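The invariance in eq. (2.7) can be checked numerically. The sketch below applies the rotation of eq. (2.4) to a small made-up point set and compares the sum of the rotated variances with the original sum; the helper names `rotate` and `variances` are hypothetical.

```python
import math

def rotate(points, theta):
    """Apply the coordinate rotation of eq. (2.4); theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]

def variances(points):
    """Population (divide-by-n) variances of the x and y coordinates."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n
    vy = sum((p[1] - my) ** 2 for p in points) / n
    return vx, vy

# Made-up sample, not the borehole data.
points = [(0.0, 0.0), (4.0, 1.0), (7.0, 5.0), (9.0, 2.0), (12.0, 8.0)]
vox, voy = variances(points)
for deg in (0, 15, 40, 90):
    vx, vy = variances(rotate(points, math.radians(deg)))
    # The individual variances change with the angle; their sum should not.
    print(deg, round(vx, 4), round(vy, 4), round(vx + vy, 4), round(vox + voy, 4))
```

The check works for any angle because a rotation preserves distances to the mean center, so only the split of the total variance between the two axes changes.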
The functional dependency of r(θ) on ro, f and θ can be calculated as:

$$ r(\theta) = \frac{r_o f + 0.5\,(f^2-1)\tan(2\theta)}{\sqrt{f^2 + \left[0.25\,(f^2-1)^2 + f^2(1-r_o^2)\right]\tan^2(2\theta) + r_o f\,(f^2-1)\tan(2\theta)}} \qquad (2.8) $$

where $f^2 = V_{oY}/V_{oX}$.
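A quick numerical cross-check of eq. (2.8) is sketched below. It compares the closed form against the Pearson coefficient computed directly on rotated coordinates. Two assumptions are made here: f² is taken as VoY/VoX, and the comparison is restricted to |θ| < 45°, where tan(2θ) is finite and cos(2θ) is positive. All function names and the sample data are hypothetical.

```python
import math

def pearson_r(points):
    """Return (r, var_x, var_y) for a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / math.sqrt(sxx * syy), sxx / n, syy / n

def r_closed_form(r_o, f, theta):
    """Eq. (2.8) as written; theta in radians, f**2 assumed to be VoY/VoX."""
    t = math.tan(2.0 * theta)
    num = r_o * f + 0.5 * (f ** 2 - 1.0) * t
    den = math.sqrt(
        f ** 2
        + (0.25 * (f ** 2 - 1.0) ** 2 + f ** 2 * (1.0 - r_o ** 2)) * t ** 2
        + r_o * f * (f ** 2 - 1.0) * t
    )
    return num / den

def r_by_rotation(points, theta):
    """Rotate the points with eq. (2.4) and recompute r directly."""
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in points]
    return pearson_r(rotated)[0]

# Made-up sample, not the borehole data.
pts = [(0.0, 0.0), (4.0, 1.0), (7.0, 5.0), (9.0, 2.0), (12.0, 8.0)]
r_o, vox, voy = pearson_r(pts)
f = math.sqrt(voy / vox)
for deg in (10, 30):
    th = math.radians(deg)
    print(deg, r_closed_form(r_o, f, th), r_by_rotation(pts, th))
```

The two columns printed for each angle should agree to within floating-point rounding under the stated assumptions.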

Eq. (2.8) shows that r(θ) depends in a complex manner on three factors: ro, f and tan(2θ).
The factors ro and f can be considered hereditary factors of a given PP, whilst tan(2θ) is the factor
under the analyst's control. Rotating S makes it possible to change the
correlation between the X and Y vectors of a given PP in either direction, i.e. to increase
r or to decrease it. Thus, our approach is based on rotating S by a proper θ so as to achieve
the minimum r value of a PP. This approach must take into account some specific cases that can be
encountered, namely:
a) Regular PP with VoX = VoY (i.e. ro=0 and f=1); in this case r(θ)=0 for any θ, which means
that the PP stays regular whatever the rotation angle θ.
b) Regular PP with VoX ≠ VoY (i.e. ro=0 and f≠1); in this case r(θ)≠0 for any θ that is not a
multiple of 90°, which means that the PP loses its regular character when rotated by θ.
c) Non-regular PP with VoX = VoY (i.e. ro≠0 and f=1); in this case, r(θ) becomes:

$$ r(\theta) = \frac{r_o}{\sqrt{1+(1-r_o^2)\tan^2(2\theta)}} \qquad (2.9) $$

which shows that r(45°)≈0, i.e. a rotation by 45° is recommended to achieve a uniform
scattering of the PP.
d) Common PP characterized by ro≠0 and f≠1; this is the general case, which admits a solution θo
for which r(θo) = 0, given by:

$$ \theta_o = \frac{1}{2}\,\operatorname{atan}\!\left(\frac{2\,r_o f}{1-f^2}\right) \qquad (2.10) $$
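Eq. (2.10) is straightforward to evaluate. The sketch below does so in Python for the case-study values, under one assumption that should be checked against the original data: the entries 7.36×10^9 and 1.22×10^9 of Tab. 3.1 are read as the variances VoX and VoY in m², so that f² = VoY/VoX. The function name `zero_correlation_angle` is hypothetical.

```python
import math

def zero_correlation_angle(r_o, f):
    """Eq. (2.10): rotation angle in radians at which r vanishes (f != 1)."""
    if math.isclose(f, 1.0):
        raise ValueError("f = 1 is case c); eq. (2.10) does not apply")
    return 0.5 * math.atan(2.0 * r_o * f / (1.0 - f ** 2))

# r_o = 0.038 is the value reported for P3 in the case study.
# Assumption: f**2 = VoY / VoX with the Tab. 3.1 entries read as variances.
f = math.sqrt(1.22e9 / 7.36e9)
theta_o = zero_correlation_angle(0.038, f)
print(round(math.degrees(theta_o), 2))
```

With these inputs the angle rounds to 1.06°, which agrees with the θo quoted for P3 in the case study.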

The behavior of r(θ) for some representative cases is depicted in Fig. 2.2. The r(θ) curve
corresponding to a completely correlated PP (ro=1) having f=0.8 (green curve) shows that the PP
behaves, in terms of r, like a highly random distribution around θ≈40° and
like a completely correlated PP at about 85°.
Fig. 2.2: The curve r(θ) for different ro and f values

The curve corresponding to ro=0 and f=0.8 (purple curve) loses its regular character when
the coordinate system is rotated, but r remains within the interval [-0.2; +0.2], which is equivalent
to a weak correlation of the PP. The same behavior is observed for PPs having ro=0.25 and f=0.8
or even f=1. The statistical analysis of a given PP based on r, VoX and VoY offers quantitative
information about the homogeneity of the point distribution across the area. It can also
provide information about linear clustering, but nothing about the point density.


The input data consist of a set of 274 points located in the Plateau area, as depicted in Fig.
2.1. A visual inspection of Fig. 2.1 reveals seven clusters (red circles), as well as lacunar areas
(green ellipses) and stragglers. Such a PP is, from the outset, difficult to
discretize according to the envisaged criterion, i.e. one point in each mesh cell. The primary
statistics of P3 are summarized in Tab. 3.1.

Tab. 3.1: Summary statistics of P3

Coordinate   Min [m]   Max [m]   Mean [m]   Median [m]   Variance [m²]
X            408930    736764    597904     597538       7.36×10^9
Y            232366    435698    350753     343214       1.22×10^9

The average spacing of the points along the X axis is 1196 m and along Y it is 742 m. The
difference between the X mean and the X median is 367 m, while that between the Y mean and the Y median is 7539
m. On the basis of these statistical parameters, the Y coordinates of P3 contain many more
stragglers than the X ones. VoX is more than 5 times higher than VoY (f² = VoY/VoX ≈ 0.165, i.e. f ≈ 0.41), indicating a
larger spread of the X coordinates compared to the Y coordinates.
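The spacing figures quoted above can be reproduced from the table values with a few lines of Python, assuming that the "average distance" is the coordinate range divided by the number of points. This reading is an assumption; it is adopted here only because it returns the quoted numbers.

```python
n = 274
x_min, x_max = 408930, 736764   # from Tab. 3.1, in metres
y_min, y_max = 232366, 435698

# Assumed definition: coordinate range divided by the number of points.
spacing_x = (x_max - x_min) / n
spacing_y = (y_max - y_min) / n
print(int(spacing_x), int(spacing_y))   # whole metres, truncated
```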

The initial value of the Pearson correlation coefficient of P3 is ro=0.038, which is quite close
to zero; therefore, P3 can be categorized as a random PP. Since ro≠0 and f≠1, we are dealing with
a common PP (the general case described previously), where r can be minimized by rotating the
coordinate system (S) by an angle θ (as shown in Fig. 3.1a).
Fig. 3.1: The effect of S rotation upon the P3 distribution: a) the curve r vs. θ; b) P3 distribution for θ=θo=1.06°
(r=0); c) P3 distribution for θ=10.0° (r=0.301); d) P3 distribution for θ=46.0° (r=-0.716)

The random distribution of P3 corresponding to r(1.06°)=0 is given in Fig. 3.1b. It does not
differ significantly from the original one, as they have comparable correlation coefficients (0.00
vs. 0.038), but rotating S by even 10° induces a significant change in the point distribution
across the Plateau area (Fig. 3.1c). The maximum negative correlation of the P3 point distribution
is r(46°) = -0.717. In this case the PP clearly shows a stronger tendency to correlate along two
lines, as shown schematically in Fig. 3.1d. It is worth mentioning that the study of the
correlation of a PP can highlight linear tendencies in the data points. This effect can be, to some
extent, mitigated by a proper rotation of S so as to attain r(θ)=0, which, fortunately, is possible
for many PPs.

Once the proper S has been established, the next step is establishing the size ∆ of the grid mesh.
A first estimate of ∆ was obtained by assuming that the PP is of regular type and that its n points are uniformly
distributed in a rectangle of size (xmax-xmin)×(ymax-ymin), where xmax and ymax are the maximum
coordinate values of the analyzed set of points, while xmin and ymin are the minimum ones. In
this case, ∆ is calculated as:

$$ \Delta = \min\left(\frac{x_{max}-x_{min}}{\sqrt{n}},\ \frac{y_{max}-y_{min}}{\sqrt{n}}\right) = \min(19805,\ 12283) = 12283\ \text{m} \qquad (3.1) $$
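The arithmetic in eq. (3.1) can be reproduced with the short Python sketch below, using the coordinate extremes of Tab. 3.1 and n = 274. The function name `initial_mesh_size` is hypothetical, and the results are truncated to whole metres to match the figures quoted in the equation.

```python
import math

def initial_mesh_size(x_min, x_max, y_min, y_max, n):
    """Eq. (3.1): the smaller of the two coordinate ranges divided by sqrt(n)."""
    side = math.sqrt(n)
    return min((x_max - x_min) / side, (y_max - y_min) / side)

delta_x = (736764 - 408930) / math.sqrt(274)
delta_y = (435698 - 232366) / math.sqrt(274)
delta0 = initial_mesh_size(408930, 736764, 232366, 435698, 274)
print(int(delta_x), int(delta_y), int(delta0))
```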

In order to perform a density-based analysis, the coordinates $(x_i', y_i')$ of each point Pi, i=1..n, are
transformed as:

$$ \tilde{x}_i = \operatorname{roundup}\left(\frac{x_i' - x_{min}'}{\Delta}\right) \qquad (3.2) $$

$$ \tilde{y}_i = \operatorname{roundup}\left(\frac{y_i' - y_{min}'}{\Delta}\right) \qquad (3.3) $$

where $x_{min}'$ and $y_{min}'$ are the minimum values of the $(x_i', y_i')$ coordinate set, i=1..n. To each $\tilde{x}_i$ is
assigned a point set {Pij} having the coordinates $(\tilde{x}_i, \tilde{y}_{ij})$ and containing ni points. Each $\tilde{x}_i$ defines a
vertical column whose width is ∆. Each ith column contains at least one point and has a height
hi:

$$ h_i = \left[\max(\tilde{y}_{ij}) - \min(\tilde{y}_{ij})\right] + 1, \quad j = 1..n_i \qquad (3.4) $$

where hi is given in ∆ units, and $\max(\tilde{y}_{ij})$ and $\min(\tilde{y}_{ij})$ are the maximum and minimum values of $\tilde{y}_{ij}$
in column i. The point density of the ith column is defined as:

$$ \rho_i = \frac{n_i}{h_i} \qquad (3.5) $$

where ni is the number of points belonging to the ith column. The overall point density ρ is
calculated as:

$$ \rho = \frac{n}{\sum_{i=1}^{k} h_i} \qquad (3.6) $$

where n is the number of points in P3 and k is the number of columns. In this framework, the value of ∆
can be adjusted so as to attain ρ≈1, on condition that the ρi, i=1..k, show the same tendency. The
distribution of the 274 points of P3 per column and the strip point densities for the
discretization with ∆ = 11500 m and θ=90° are shown in Fig. 5. A detailed strip point density
is given in Fig. 11.
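The column-density procedure of eqs. (3.2)-(3.6) can be sketched in Python as follows. The paper reports an Excel implementation, so this is only an illustrative equivalent under one assumption: `roundup` is read as rounding up to the next integer (a ceiling for non-negative values). The function name `column_densities` and the sample points are hypothetical.

```python
import math
from collections import defaultdict

def column_densities(points, delta):
    """Return ({column: (n_i, h_i, rho_i)}, overall rho) per eqs. (3.2)-(3.6)."""
    x_min = min(x for x, _ in points)
    y_min = min(y for _, y in points)
    columns = defaultdict(list)
    for x, y in points:
        col = math.ceil((x - x_min) / delta)   # eq. (3.2), roundup as ceiling
        row = math.ceil((y - y_min) / delta)   # eq. (3.3)
        columns[col].append(row)
    result = {}
    total_height = 0
    for col, rows in sorted(columns.items()):
        h = max(rows) - min(rows) + 1                  # eq. (3.4), in delta units
        result[col] = (len(rows), h, len(rows) / h)    # eq. (3.5)
        total_height += h
    return result, len(points) / total_height          # eq. (3.6)

# Made-up points (already expressed in the rotated system), mesh size 10.
sample = [(0, 0), (5, 5), (12, 3), (18, 27), (25, 14)]
per_column, rho = column_densities(sample, 10)
for col, (n_i, h_i, rho_i) in per_column.items():
    print(col, n_i, h_i, round(rho_i, 3))
print(round(rho, 3))
```

Rerunning the function for several candidate values of `delta` and keeping the one whose overall density is closest to 1 mirrors the adjustment of ∆ described above.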

Fig. 3.2: Results for θ=90° and ∆=11500 m (panels a and b)

The steep variation of the number of points ni per column in Fig. 3.2a is to be expected, as the
shape and point distribution of P3 are complex, i.e. P3 shows 7 clusters and 5 lacunar zones (Fig.
2.1). The local densities shown in Fig. 3.2b vary around 1 ru⁻² (ru: relative unit); only two ρi
are above 2 ru⁻², while one strip has a ρi close to 0. These features support the statement that
the discretization achieved is close to the best possible one.

The density analysis method combined with a proper rotation of the coordinate system has
proven to be a solution for the suitable discretization of P3 according to the imposed
criteria: advanced randomness, i.e. r≈0, and ρi≈1 ru⁻² (ru: relative unit). The functional
dependency of r(θ) on the hereditary factors ro and f of a given PP and on tan(2θ) was derived
for the first time in (2.8). Also, the invariance of the total variance of a PP was established for
the first time in (2.7).

Under these conditions, the best discretization of P3 was obtained in an S rotated by
90° relative to its initial position, using a grid with a square mesh size of 11500 m. In this case,
P3 has r=-0.04, which attests to its randomness, and a global density ρ=1.09 ru⁻², close to
the ideal case, i.e. ρ=1.00 ru⁻².
The main novelty addressed in the paper consists in using the rotation of S to obtain the best shape
of the area over which the points are scattered. Pearson's correlation coefficient is a
powerful parameter for quantifying the P3 type (regular, random, partially clustered), but
not for identifying a specific form of clustering, i.e. linear clustering. Last but not least,
a further novelty consists in the way in which the statistics were combined with shape and scale factors
(∆) to find the suitable discretization of P3. It was also shown that Excel is a suitable
tool for the computations needed for PPA based on the ∆ and θ parameters. For a better
discretization, the geotechnical engineer must be aware that regular and completely
random PPs can yield similar values of the Pearson correlation coefficient, close to zero.

