Journal of African Earth Sciences 185 (2022) 104419

Well log analysis for lithology and fluid contacts in Rovuma Basin –
Mozambique: Application of cluster and discriminant analyses
Jone Lucas Medja Ussalu a, Amin Bassrei b,*
a UEM/FAEF/DER, Eduardo Mondlane University, Mozambique
b IGEO/CPGG/UFBA, Universidade Federal da Bahia & Instituto Nacional de Ciência e Tecnologia de Geofísica de Petróleo, Brazil


Keywords: Rovuma basin; Lithology; Fluid contact; Cluster analysis; Discriminant analysis

This study applies cluster and discriminant analyses to geophysical well log data from the Rovuma sedimentary Basin, Mozambique. The main objective was to determine the lithological profile and fluid contacts in reservoirs. Well log data from five wells drilled in the same basin were used. For the discrimination, a reference well was chosen for training, and the functions obtained from it were then applied to the remaining wells. The classification process comprised three main phases: the separation of shale/non-shale layers along the entire logged section, the separation of water/hydrocarbon within reservoirs, and the separation of oil/gas within hydrocarbon-bearing zones. The two methods, cluster analysis and discriminant analysis, were applied in parallel and the results were compared in each classification phase. The quality of reservoirs was also assessed by applying cutoffs on shale content and effective porosity, delineating net reservoirs. In general, both methods converged to the same lithological model and fluid types in reservoirs. Gas was indicated as the predominant hydrocarbon in the basin.

1. Introduction

Well logging involves a set of operations and techniques, from the acquisition to the processing and interpretation of the physical properties of rock formations. A well log consists of records of variations in one of the physical properties of rock formations as a function of depth.

Well log analysis can be carried out qualitatively or quantitatively. In qualitative analysis, visual evidence is considered, based on the log variations with depth, which may distinguish the different layers across the formation. Quantitative analysis, on the other hand, involves numerical calculations in which petrophysical properties such as shale volume, porosity and fluid saturation of formations are determined (Schön, 2015).

The characterization of geological formations cannot be derived from one type of log alone; it is necessary to combine several physical parameters (a complex interpretation) in order to derive a consistent model of the formation. Therefore, the historical development of well logging is characterized by the development of several systems with different sensitivities.

One of the most widely used techniques in the interpretation of lithology is the cross-plot technique, involving porosity logs. In the case of simple lithologies, a simple visual interpretation is sufficient for lithological characterization (Asquith & Krygowsky, 2004). In general, well log analysis comprises a set of methods that aim to extract information from the logs in order to identify, quantify and produce a geological model.

Cluster and discriminant analyses have shown good applicability in the processing and interpretation of well log data. In particular, many researchers have applied these techniques to identify and separate different lithotypes and fluids in formations (Flexa et al., 2004; Rosa et al., 2008).

Cluster analysis is an unsupervised classification process, which makes the method quite useful, especially in the exploratory phase when there are no prior hypotheses about the classifying model of lithologies in the field. Discriminant analysis, on the other hand, is a supervised process, which means that prior information is necessary. Besides its application for performing statistical separation of datasets, discriminant analysis may also be applied to predict the occurrence of a certain event related to the data and to explain the relationship between variables in a multivariate space.

In this work, we evaluate well log data from five boreholes drilled in the Rovuma sedimentary basin, Mozambique. We have applied the cluster and discriminant analyses to determine the lithological profile

E-mail addresses: (J.L. Medja Ussalu), (A. Bassrei).
Received 26 December 2020; Received in revised form 21 October 2021; Accepted 3 November 2021
Available online 8 November 2021
J.L. Medja Ussalu and A. Bassrei Journal of African Earth Sciences 185 (2022) 104419

and fluid contacts in reservoirs. In addition, the quality of reservoirs was also assessed by applying cutoffs on shale content and effective porosity, in order to delineate net reservoirs.

In general, the application of the two methods together has led to consistent results. Both methods converged to quite similar lithological models, and to the same fluid types in reservoirs. In terms of hydrocarbon potential, most reservoirs in the region are predominantly gas saturated.

2. Characterization of the study region

The Rovuma Basin is one of the most important sedimentary basins in Mozambique, both in terms of the volume of accumulated sediments and in terms of the occurrence of hydrocarbons. It is located in the north-east of Mozambique. The onshore area is about 17,000 km², comprising the entire eastern part of Cabo Delgado province and part of the eastern side of Nampula province. The offshore area is about 12,500 km². The Basin extends for about 400 km in the north-south direction, and the maximum east-west width is about 160 km (Hancox et al., 2002; Key et al., 2008). Fig. 1 shows the onshore and offshore parts of the Rovuma Basin. Several wells drilled in the region have confirmed the occurrence of hydrocarbons both onshore and offshore. Currently, only natural gas is being explored in the Basin. However, future oil exploration has been confirmed.

Fig. 1. Study area, the Rovuma Basin in Northern Mozambique.

Mozambique presents great geological differences from north to south. The north is fundamentally Proterozoic and the south entirely Phanerozoic, with the central region hosting Archean, Proterozoic and Phanerozoic terrains (Vasconcelos, 2014).

According to Key et al. (2008), the entire stratigraphic development of the Rovuma Basin is directly related to the progressive fragmentation of south-eastern Gondwana that created Africa as a separate continent. Intra-continental tectonism associated with the East African Rift System also influenced Cenozoic sedimentation in the Rovuma Basin. Sediments of the Rovuma Basin were deposited between the Jurassic and Quaternary periods (Hancox et al., 2002). The maximum thickness of sediments in the entire Basin is about 10 km, based on geophysical surveys (Key et al., 2008).

The offshore area of the Basin has upper sediments mainly composed of deltaic deposits associated with the Rovuma River. The deltaic deposits are underlain at depth by Cretaceous sandstones and calcites. A stratigraphical column for the onshore and offshore parts of the Rovuma Basin and Delta along the northern region of Mozambique is shown in Fig. 2.

3. Multivariate analysis

3.1. Cluster analysis (CA)

Cluster analysis is a process that basically consists of arranging a set of data or objects with common characteristics into groups. Objects belonging to the same group or cluster are more similar to each other than to those belonging to other groups.

In real life, data sets are so complex that different groups may not be clearly separated. Clustering algorithms face the same challenge, so we need to define some criterion of dissimilarity, which in some cases can be a challenging task. Here we make use of the centroid-based method, the most common clustering method, known as K-Means Clustering. The K-Means clustering technique partitions data into K groups or categories so that the data points in the same group are closer together and the data points in different groups are more distant. The algorithm tries to minimize the distances within groups and maximize the distance between groups (Everitt et al., 2011).

The similarity between two data points is determined by the distance between them. There are many methods for measuring distance. The metric usually adopted is the Euclidean Distance; however, other metrics can be used, preferably those that best reflect the data similarities.

To describe K-Means Clustering, we consider $X$ as an $N \times M$ matrix whose columns are the well log vectors $x_j$, so that $x_{ij}$ is the value of log $j$ recorded at depth $i$. Each row of $X$ is a vector $p_i(x_{i1}, x_{i2}, \dots, x_{iM})$ defining a point in the M-dimensional space, whose elements are the measurements of the M well logs recorded at depth $i$. In this case, considering two points $p_1$ and $p_2$ in the data space, the Euclidean Distance (ED) between them is expressed as follows:

$$ED = \sqrt{(x_{11} - x_{21})^2 + (x_{12} - x_{22})^2 + \dots + (x_{1M} - x_{2M})^2} = \sqrt{(p_1 - p_2)(p_1 - p_2)^{T}} \tag{1}$$

Each log vector $x_j$ was standardized by subtracting the log mean from each value and dividing by the log standard deviation, according to Davis (2002):

$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}, \tag{2}$$

$$\sigma_j = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij} - \bar{x}_j\right)^2}, \tag{3}$$

where $x'_{ij}$ is the i-th standardized value of log $j$, $\bar{x}_j$ is the mean of log $j$, and $\sigma_j$ is the standard deviation of log $j$. By this procedure each standardized log gets an average of zero and a variance equal to one.

K-Means is an iterative process. The algorithm performs the following steps: (1) Initially, it randomly selects K points in the data space as centroids of the K groups to be separated; (2) Calculates the distances between each data point and the centroids; (3) Associates each data point with the nearest centroid; (4) Finds new centroids by averaging the data points associated with the same group; (5) Repeats steps 2, 3 and 4 until convergence. At convergence, the group centroids no longer change.

As the initial centroids are randomly chosen, convergence to a local minimum is possible, depending on the data distribution. To avoid this problem, the algorithm was set to run multiple times. The best grouping is the one that presents the smallest sum of squared distances within groups, $SS_W$, expressed by:

$$SS_W = \sum_{g=1}^{K}\sum_{i=1}^{N_g}\left(p_{gi} - c_g\right)\left(p_{gi} - c_g\right)^{T}, \tag{4}$$

where $c_g$ and $N_g$ are the centroid and the size of each group, respectively.
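The standardization of Eqs. (2)-(3), the five K-Means steps and the restart criterion of Eq. (4) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the number of restarts (`n_init`), the iteration cap and the choice of drawing the initial centroids from the data points themselves are all hypothetical.

```python
import math
import random
import statistics


def standardize(logs):
    """Eqs. (2)-(3): centre each log (column) and divide by its N-1 standard deviation."""
    cols = list(zip(*logs))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]  # statistics.stdev uses N - 1
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in logs]


def nearest(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance (Eq. 1)."""
    return min(range(len(centroids)), key=lambda g: math.dist(point, centroids[g]))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Run K-Means n_init times; keep the grouping with the smallest SSW (Eq. 4)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Step 1: random initial centroids (assumption: k distinct data points).
        centroids = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            # Steps 2-3: assign every point to its nearest centroid.
            labels = [nearest(p, centroids) for p in points]
            # Step 4: recompute each centroid as the mean of its members
            # (an empty group keeps its previous centroid).
            updated = []
            for g in range(k):
                members = [p for p, lab in zip(points, labels) if lab == g]
                updated.append([statistics.mean(c) for c in zip(*members)]
                               if members else centroids[g])
            # Step 5: stop once the centroids no longer change.
            if updated == centroids:
                break
            centroids = updated
        labels = [nearest(p, centroids) for p in points]
        ssw = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or ssw < best[2]:
            best = (labels, centroids, ssw)
    return best  # (labels, centroids, ssw)
```

In this sketch each row of `points` plays the role of a depth sample $p_i$, so a call such as `kmeans(standardize(rows), k=2)` would correspond to one two-group classification stage.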


Fig. 2. Stratigraphic column for the onshore and projected offshore parts of the Rovuma Basin, northern Mozambique (Source: Brownfield, 2016).

3.2. Discriminant analysis (DA)

Discriminant analysis is one of the most widely used multivariate procedures in Earth science. Unlike cluster analysis, discriminant analysis requires that sample data from the groups to be classified be defined beforehand. This information is used to generate the discriminant function.

Depending on the number of groups to be classified or the problem complexity, the discriminant function can be linear or quadratic. In this work, we make use of linear discriminant analysis, which is more convenient when only two groups are defined in each discrimination.

Considering the case of two datasets (A and B) in a multivariate space, the method seeks an orientation in which the datasets present the maximum separation and, simultaneously, the variance in each dataset is minimal. This is illustrated for the two-dimensional (bivariate) case in Fig. 3. As can be seen in that figure, a separation between the groups A and B cannot be obtained using one of the variables X1 or X2 at a time. However, it is possible to combine them and find an orientation in which the two groups are separated as much as possible with a minimum variance in each group (Davis, 2002).

Fig. 3. Representation of discriminant function for a bivariate distribution. The two datasets are indicated by the open circles for group A and solid dots for group B. Dashed lines indicate bivariate means of the two groups (Modified from Davis, 2002).

Table 1
Description of the well log data.

MNEMONIC  CURVE TYPE           UNIT    COMMENT
HCAL      Caliper              in      HRCC Cal. Caliper
GR        Gamma Ray            gAPI    Natural Gamma Ray
RHOZ      Density              g/cm3   HRDD Standard Resolution Formation Density
TNPH      Neutron Porosity     V/V     Thermal Neutron Porosity
DTCO      Transit Time         μ sec   Delta-T Compressional
AO10      Shallow Resistivity  Ω.m     Array Induction One Foot Resistivity A10
RT        Deep Resistivity     Ω.m     Array Induction One Foot Resistivity

Using the same convention as in subsection 3.1, the linear discriminant function can be mathematically expressed as:

$$Z_i = \lambda_1 x_{i1} + \lambda_2 x_{i2} + \dots + \lambda_M x_{iM} = \sum_{j=1}^{M} \lambda_j x_{ij}, \tag{5}$$

where $Z_i$ is the discriminant index at a certain depth $i$, $\lambda_j$ ($j = 1, 2, \dots, M$) are the coefficients of the discriminant function and $x_{ij}$ represents the independent variables (the log values).

The discriminant function coefficients are determined using multiple regression where the dependent variable consists of differences between the multivariate means of the two groups. According to Davis (2002), this technique minimizes the probability of erroneously classifying a new element in one of the groups. It was also adopted by Flexa et al. (2004), and is expressed by the following matrix equation:

$$S \lambda = d, \tag{6}$$

where $S$ is an $M \times M$ matrix of pooled variances and covariances of the M variables, $\lambda$ is a column vector formed by the coefficients of the discriminant function and $d$ is a column vector formed by the differences between the multivariate means of the two groups. Equation (6) can be solved using the basic concepts of inverse problems. Given that $S$ is a square matrix, if it is also non-singular, then the inverse $S^{-1}$ exists, and we have:

$$\lambda = S^{-1} d \tag{7}$$

To calculate the function coefficients $\lambda_j$, we must determine the entries in the matrix equation (7). The vector $d$ is found simply by:

$$d_j = \bar{x}^{A}_{j} - \bar{x}^{B}_{j} = \frac{1}{N_A}\sum_{i=1}^{N_A} x^{A}_{ij} - \frac{1}{N_B}\sum_{i=1}^{N_B} x^{B}_{ij}, \tag{8}$$

where $x^{A}_{ij}$ is the i-th observation of variable $j$ in group A, $\bar{x}^{A}_{j}$ is the average of variable $j$ in group A, $x^{B}_{ij}$ is the i-th observation of variable $j$ in group B, $\bar{x}^{B}_{j}$ is the average of variable $j$ in group B, $N_A$ is the number of observations in group A, and $N_B$ is the number of observations in group B.

The multivariate means of groups A and B form two vectors, so we can express the vector $d$ in the expanded form:

$$\begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_M \end{bmatrix} = \begin{bmatrix} \bar{x}^{A}_{1} \\ \bar{x}^{A}_{2} \\ \vdots \\ \bar{x}^{A}_{M} \end{bmatrix} - \begin{bmatrix} \bar{x}^{B}_{1} \\ \bar{x}^{B}_{2} \\ \vdots \\ \bar{x}^{B}_{M} \end{bmatrix} \tag{9}$$

To construct the matrix $S$ of combined variances and covariances, we must calculate a matrix of the sums of squares and cross products of all variables in group A ($SC^{A}$) and a similar matrix for group B ($SC^{B}$). Considering group A, we have:

$$SC^{A}_{jk} = \sum_{i=1}^{N_A} x^{A}_{ij}\, x^{A}_{ik} - \frac{1}{N_A}\sum_{i=1}^{N_A} x^{A}_{ij} \sum_{i=1}^{N_A} x^{A}_{ik}, \tag{10}$$

where $x^{A}_{ij}$ and $x^{A}_{ik}$ represent the i-th observations of a pair of variables $j$ and $k$ in the same group A. The analogous process applies to find $SC^{B}_{jk}$ for group B. Thus, the matrix of variances and covariances can be calculated by:

$$S = \frac{SC^{A} + SC^{B}}{N_A + N_B - 2} \tag{11}$$
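The chain from the group samples to the coefficients (difference of means in Eq. 8, cross-product matrices in Eq. 10, pooled matrix in Eq. 11, and the solution of Eq. 6) can be sketched as below. This is a hedged, standard-library-only illustration: the function names are hypothetical, and the linear system is solved by Gaussian elimination rather than by forming $S^{-1}$ explicitly, which gives the same $\lambda$ when $S$ is non-singular.

```python
def column_means(group):
    """Multivariate mean of one group (rows = observations, columns = logs)."""
    return [sum(col) / len(group) for col in zip(*group)]


def cross_products(group):
    """Eq. (10): corrected sums of squares and cross products of one group."""
    n, cols = len(group), list(zip(*group))
    return [[sum(a * b for a, b in zip(cj, ck)) - sum(cj) * sum(ck) / n
             for ck in cols] for cj in cols]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("S is singular; Eq. (7) has no unique solution")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def discriminant_coefficients(group_a, group_b):
    """Return (lam, d) such that S lam = d, following Eqs. (6)-(11)."""
    d = [a - b for a, b in zip(column_means(group_a), column_means(group_b))]  # Eq. (8)
    sca, scb = cross_products(group_a), cross_products(group_b)
    dof = len(group_a) + len(group_b) - 2
    S = [[(x + y) / dof for x, y in zip(ra, rb)] for ra, rb in zip(sca, scb)]  # Eq. (11)
    return solve(S, d), d
```

Here `group_a` and `group_b` stand for the training samples of the two classes (for example the shale and non-shale intervals picked in the reference well), each given as a list of rows of log values.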


Fig. 4. Flowchart of the study methodology.

Now we have all the terms necessary to solve equation (7) and obtain the coefficients $\lambda_j$. The discriminant function generates a single number for each observation point, called the discriminant index or discriminant score, which represents the position of the point along the line defined by the discriminant function. Substituting the multivariate averages $\bar{x}^{A}$ and $\bar{x}^{B}$ of groups A and B into the discriminant function generates the group centroids $Z_A$ and $Z_B$, respectively: $Z_A = \lambda_1 \bar{x}^{A}_{1} + \lambda_2 \bar{x}^{A}_{2} + \dots + \lambda_M \bar{x}^{A}_{M}$ and $Z_B = \lambda_1 \bar{x}^{B}_{1} + \lambda_2 \bar{x}^{B}_{2} + \dots + \lambda_M \bar{x}^{B}_{M}$.

The distance between the two centroids corresponds to the difference $Z_A - Z_B$. This distance is known as the Mahalanobis distance ($D^2$) or generalized distance (Davis, 2002). The Mahalanobis distance helps to calculate the relative contribution $e_j$ of each variable to the discrimination, given in the form:

$$e_j = \frac{\lambda_j d_j}{D^2} \times 100\% \tag{12}$$

The separation of the two groups is defined by a cutting score $Z_C$. The optimal cutting score depends on the sizes of the groups. According to Ramayah et al. (2010), when the number of observations is the same, the cutting score will simply be the average of the two group centroids:

$$Z_C = \frac{Z_A + Z_B}{2}, \tag{13}$$

otherwise, the index will be found by the formula:

$$Z_C = \frac{N_A Z_B + N_B Z_A}{N_A + N_B} \tag{14}$$

Fig. 5. Selection of sample database of shale (3012–3030 m) and non-shale (2938–2953 m) in the reference well. The headings of each panel show the plotted logs and their respective range and unit. The arrows indicate the log axes orientation.

4. Material and methods

4.1. Well log data

Five wells drilled in the Rovuma sedimentary Basin were evaluated in this study, namely Well-1 (2243.02–3243.07 m), Well-2 (2455.01–4393.38 m), Well-3 (2454.85–3414.06 m), Well-4 (596.10–3104.10 m) and Well-5 (1052.01–3842.00 m). All well log data have a sampling interval of 0.1524 m. Table 1 shows the list of logs used for this study.

4.2. Lithology and fluid contacts

The lithological profile and fluid contacts were determined using cluster analysis and linear discriminant analysis. The process consisted of three main phases or stages: (i) the separation of shale and non-shale (potential reservoir) formations, (ii) the separation of water and hydrocarbon in reservoirs and (iii) the separation of oil and gas in the hydrocarbon-bearing zones. The fundamentals of the cluster and discriminant analyses are described in section 3. These two methods were applied in parallel and the results are compared in each classification stage.
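For the discriminant-analysis side of each stage, the scoring rule of Eq. (5) and the cutting score of Eqs. (13)-(14) described in section 3.2 can be sketched as below. The function names and labels are hypothetical, and the size-weighted expression used for unequal groups is the form commonly given for Eq. (14); it should be read as an assumption rather than as the study's exact code.

```python
def score(lam, x):
    """Discriminant index Z = sum_j lam_j * x_j for one depth sample (Eq. 5)."""
    return sum(l * v for l, v in zip(lam, x))


def cutting_score(z_a, z_b, n_a, n_b):
    """Eq. (13) for equal group sizes; size-weighted form (assumed Eq. 14) otherwise."""
    if n_a == n_b:
        return (z_a + z_b) / 2
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)


def classify(lam, rows, z_c, low_label, high_label):
    """Label each depth sample by comparing its score with the cutting score."""
    return [low_label if score(lam, x) < z_c else high_label for x in rows]
```

With coefficients trained on a reference well, `classify` would be applied unchanged to the log rows of any other well, which is the way the text describes extending the interpretation to the remaining wells.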
As mentioned in subsection 3.2, discriminant analysis requires that group samples be defined beforehand. These sample data are used to determine the discriminant function coefficients and the discriminant scores. Well-3 was chosen as the reference well from which the sample data were obtained. The coefficients and scores obtained from this reference


well are then applied to the remaining wells. According to Flexa et al. (2004), in the absence of important diagenetic effects, these discriminant functions can be applied to neighbouring wells. In this way, the interpretation carried out in the reference well is extended throughout the entire basin field. Fig. 4 shows the schematic representation of the study methodology.

4.2.1. Databases for the discriminant analysis

Given that we did not have any information from cores and cuttings, which could be helpful to define the group samples in the field, well log interpretation techniques such as visual interpretation (quick-look), overlays and cross-plots were applied beforehand to select sample data of the different groups of interest.

Well logging tools are classified according to their sensitivity to the different lithotypes and fluids, so the most suitable sets of curves for each classification were chosen. The databases used for the discrimination were formed by the set of curves (logs) that present the most significant difference in patterns between groups. The more the curve patterns differ between groups, the greater the chance of success in the discrimination. Since the classification process comprised three phases, three databases were formed.

The database for shale and non-shale was formed from the logs considered sensitive to lithology. In this case the GR and TNPH played an important role. While shale formations present relatively high values of both GR and TNPH, non-shale formations (sandstones and carbonates) present relatively low GR and TNPH. The differences in these variables between the two groups are quite significant. Other complementary logs were the DTCO and HCAL. Fig. 5 shows the sample selection of shale (3012–3030 m) and non-shale (2938–2953 m) in the reference well.

Water, oil and gas samples were selected inside the reservoir formations. Overlays and cross-plots of density and neutron porosity were combined with the deep resistivity. Water-bearing zones are characterized by relatively low resistivity and relatively high neutron porosity and density. In the hydrocarbon-bearing zones the resistivity becomes high and the density decreases. The neutron porosity, which is related to the hydrogen index, changes only in the presence of gas, decreasing to very low values. Fig. 6 shows the selected samples for water (2990–3005 m), oil (2903–2913 m) and gas (2938–2953 m).

Fig. 6. Selection of sample databases for water (2990–3005 m), oil (2903–2913 m) and gas (2938–2953 m) in the reference well. The headings of each panel show the plotted logs and their respective range and unit. The arrows indicate the log axes orientation.

4.2.2. Shale and non-shale separation

This was the first problem to be addressed. Four logs were used in this process: HCAL, GR, TNPH and DTCO. At this stage, the entire logged section of the well was considered. Thus, each data point in the cluster analysis was represented as $p_i(HCAL_i, GR_i, TNPH_i, DTCO_i)$ and the discriminant function was defined according to the expression:

$$Z_i = \lambda_{HCAL} \times HCAL_i + \lambda_{GR} \times GR_i + \lambda_{TNPH} \times TNPH_i + \lambda_{DTCO} \times DTCO_i.$$

4.2.3. Net reservoir

From the previous procedure (subsection 4.2.2) we obtained the potential reservoirs. Before proceeding to the next phase, which is the fluid classification, it was necessary to determine the net reservoir. The term net reservoir can be understood as the fraction of the total thickness of the reservoir formation (gross reservoir) capable of storing fluids (water, oil and gas). This implies that the formation must have significant porosity and insignificant shale content. So, for net reservoir determination, cutoffs on shale content and porosity are applied.

Cutoffs can be defined as limiting values imposed to outline the region of interest. In the Western petroleum industry, cutoffs were adopted and applied as rules of thumb for assessing hydrocarbon-producing zones (net pay). In particular, for net pay determination, four petrophysical parameters are usually considered: shale volume, effective porosity, water saturation and permeability.

There is no defined criterion to determine the cutoff values, which implies that reasonable values are arbitrarily chosen depending on the region and purpose (Worthington & Consentino, 2005). Crain (2019)

Table 2
Discriminant function coefficients and the relative contribution of each variable in the three classification processes. There are three log combinations for the three different purposes. The sign of each coefficient indicates whether the variable contributes positively or negatively to the discriminant index.

Shale/Non-Shale     Coefficients                −34.78   −2.05   −0.01   −8.96
                    Relative contribution (%)    16.14   81.24    0.07    2.55
Water/Hydrocarbon   Coefficients                 56.33   −0.07   74.23
                    Relative contribution (%)    45.27   15.07   39.66
Oil/Gas             Coefficients                232.95   −7.31
                    Relative contribution (%)    99.64    0.36
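The percentages in Table 2 add up to 100 within each classification, which is what Eq. (12) implies when the generalized distance is taken as $D^2 = Z_A - Z_B = \sum_j \lambda_j d_j$. A minimal sketch of that calculation is given below; the function name is hypothetical and the inputs are assumed to be the coefficient vector $\lambda$ and the mean-difference vector $d$ of section 3.2.

```python
def relative_contributions(lam, d):
    """Percentage contribution of each variable, e_j = lam_j * d_j / D^2 * 100 (Eq. 12)."""
    d2 = sum(l * v for l, v in zip(lam, d))  # D^2 = Z_A - Z_B = sum_j lam_j d_j
    return [100.0 * l * v / d2 for l, v in zip(lam, d)]
```

Because every term $\lambda_j d_j$ is divided by their sum, a variable can have a large coefficient and still contribute little if its group means are close, which is consistent with the remark in the text that coefficient magnitudes depend on the variance of each log.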


presented typical intervals for the net pay delineation as follows: maximum shale volume of 0.25–0.45, minimum porosity of 0.03–0.16, maximum water saturation of 0.30–0.70 and minimum permeability of 0.01–5.0.

In this study we have considered only shale volume ($V_{Sh}$) and effective porosity ($\varphi_{eff}$), as we just wanted to delineate the net reservoir. The first cutoff ($V_{Sh} \le 0.4$) was applied to the gross reservoir to eliminate portions of the formation with high shale content and obtain the net sand. This step is important for the purpose of this research because it prevents shaly sands from being erroneously classified as water-saturated reservoirs in the water-hydrocarbon separation, given that the presence of shale reduces the resistivity and increases the neutron porosity. The second cutoff ($\varphi_{eff} \ge 0.15$) was applied to the net sand to remove the portions with low porosity (e.g. tight sand), outlining the net reservoir.

There are several methods for estimating the shale volume and porosity. In this work, the shale volume was calculated from the GR log, and the linear approximation $V_{Sh} = I_{GR}$ was chosen to obtain the most pessimistic value, as we intended to overestimate the shale content:

$$V_{Sh} = I_{GR} = \frac{GR - GR_{min}}{GR_{max} - GR_{min}}, \tag{16}$$

where $I_{GR}$ is the Gamma Ray Index, $GR$ is the actual Gamma Ray, $GR_{min}$ is the minimum Gamma Ray and $GR_{max}$ is the maximum Gamma Ray.

For effective porosity, a recommended method for shaly sand was applied. It combines the density and neutron porosity logs according to the following equation (Schön, 2015):

$$\varphi_{eff} = \frac{\left(\varphi_N - \varphi_{N,ma}\right)\left(\rho_{ma} - \rho_{sh}\right) - \left(\varphi_{N,sh} - \varphi_{N,ma}\right)\left(\rho_{ma} - \rho_b\right)}{\left(\varphi_{N,fl} - \varphi_{N,ma}\right)\left(\rho_{ma} - \rho_{sh}\right) - \left(\varphi_{N,sh} - \varphi_{N,ma}\right)\left(\rho_{ma} - \rho_{fl}\right)}, \tag{17}$$

where $\varphi_N$ is the neutron porosity (the log reading at a given depth), $\rho_b$ is the formation density (the log reading at a given depth), $\varphi_{N,fl}$ is the fluid neutron response, $\rho_{fl}$ is the fluid density, $\varphi_{N,ma}$ is the matrix neutron response, $\rho_{ma}$ is the matrix density, $\varphi_{N,sh}$ is the neutron response of shale (shale point reading) and $\rho_{sh}$ is the density of shale (shale point reading).

4.2.4. Water and hydrocarbon separation

So far, the formations classified as shale have been discarded. Having the net reservoir, we proceed to the second phase. For this problem, well logs considered sensitive to fluids were used: RT, RHOZ and TNPH. The cluster analysis was implemented considering each data point as $p_i(RT_i, RHOZ_i, TNPH_i)$ and the discriminant function was expressed as:

$$Z_i = \lambda_{RT} \times RT_i + \lambda_{RHOZ} \times RHOZ_i + \lambda_{TNPH} \times TNPH_i.$$

4.3. Oil and gas separation

This is the third and last phase. Here, all water-saturated zones have been excluded, and only hydrocarbon-bearing zones remain. The following logs were used in this classification: RHOZ and TNPH. Therefore, each data point for the cluster analysis is given as $p_i(RHOZ_i, TNPH_i)$ and the expression for the discriminant function is:

$$Z_i = \lambda_{RHOZ} \times RHOZ_i + \lambda_{TNPH} \times TNPH_i.$$

5. Results and discussions

The main goal of this study was to determine the lithological profile and fluid contacts by applying Cluster Analysis (CA) and Discriminant Analysis (DA). Unlike CA, for the DA, linear discriminant functions and scores were first determined from Well-3 (the reference well) and then applied to the remaining wells.

5.1. Analysis in the reference well

The discriminant function coefficients and the relative contribution of each log for the three discrimination problems are shown in Table 2. The sign of each coefficient indicates whether the variable contributes positively or negatively to the discriminant index. The absolute values are influenced by the variance of the respective discriminant variable. A variable with a greater variance tends to have less weight and thus contributes less to the discrimination.

In the first problem, the shale and non-shale classification, the HCAL, GR, TNPH and DTCO logs were used. The GR log had the greatest relative contribution (81.24%), as expected, and the DTCO presented the lowest contribution (0.073%), so that it could be discarded to simplify the discriminant function.

The separation of groups in the DA is made by the cutting score $Z_C$. For each depth, a $Z_i$ index is calculated and then compared to $Z_C$. In this particular case, shale formations are assigned for $Z_i < Z_C$, and non-shale or potential reservoirs are assigned for $Z_i > Z_C$. Fig. 7(a) shows the discriminant scores $Z_{Sh}$ and $Z_{Res}$, which define the centroids of shale and potential reservoirs, respectively. These centroids represent the projections of the multivariate means of each group onto the discriminant function line, and it is observed that the two groups are well separated.

Fig. 7. Projection of the multivariate data samples onto the discriminant function line: (a) shale and non-shale discrimination, centroid for shale ($Z_{Sh}$) and centroid for non-shale ($Z_{Res}$); (b) hydrocarbon and water discrimination, centroid for hydrocarbon ($Z_H$) and centroid for water ($Z_W$); (c) oil and gas discrimination, centroid for oil ($Z_O$) and centroid for gas ($Z_G$). The cutting score is denoted by $Z_C$.

Fig. 8 shows the shale and non-shale zones in Well-3 from both methods, CA and DA. Four potential reservoirs were identified in this well, designated as W3-R1 (2845–2905 m), W3-R2 (2935–3010 m), W3-R3 (3050–3170 m) and W3-R4 (3270–3330 m). The total thickness of the four potential reservoirs constitutes the gross reservoir. The cutoffs of shale content and porosity were applied over this region to define the


Fig. 8. Lithological classification in the reference well (Well-3), (a) the discriminant function and scores, (b) the lithological definition by the discriminant analysis
and (c) the lithological definition by the cluster analysis.


Fig. 9. Lithology and fluid contacts in the Well-3: (a) lithological profile, (b) net reservoirs, (c) fluid contacts by the discriminant analysis and (d) fluid contacts by
the cluster analysis.
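The net-reservoir column of panels such as Fig. 9(b) results from the shale-volume and effective-porosity cutoffs of section 4.2.3. A minimal sketch of Eqs. (16)-(17) and of the two cutoffs quoted in the text ($V_{Sh} \le 0.4$, $\varphi_{eff} \ge 0.15$) is given below. The function and parameter names are hypothetical, and the matrix, fluid and shale end-point readings must be supplied by the interpreter; none are taken from the study.

```python
def shale_volume(gr, gr_min, gr_max):
    """Linear gamma-ray index, V_sh = I_GR (Eq. 16)."""
    return (gr - gr_min) / (gr_max - gr_min)


def effective_porosity(phi_n, rho_b, *, phi_n_ma, rho_ma,
                       phi_n_sh, rho_sh, phi_n_fl, rho_fl):
    """Shale-corrected density-neutron porosity (Eq. 17)."""
    num = ((phi_n - phi_n_ma) * (rho_ma - rho_sh)
           - (phi_n_sh - phi_n_ma) * (rho_ma - rho_b))
    den = ((phi_n_fl - phi_n_ma) * (rho_ma - rho_sh)
           - (phi_n_sh - phi_n_ma) * (rho_ma - rho_fl))
    return num / den


def is_net_reservoir(v_sh, phi_eff, v_sh_max=0.4, phi_min=0.15):
    """Apply the two cutoffs quoted in the text: V_sh <= 0.4 and phi_eff >= 0.15."""
    return v_sh <= v_sh_max and phi_eff >= phi_min
```

Two limiting cases give a quick sanity check of Eq. (17): a clean, water-filled rock returns its true porosity, and a sample reading exactly at the shale point returns zero effective porosity.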


Fig. 10. Cross-plots of density - neutron porosity for the four reservoirs identified in the Well-3. Concentrations of points on the cross-plots reflect the effect of each
fluid type into reservoirs. The plotted dashed red line for Sandstone, solid black line for Calcite and dot-dashed blue line for Dolomite were based on the Log
Interpretation Charts from Schlumberger (2009).

net reservoir before proceeding to the second phase. This process consisted of removing all layers with shale content greater than 0.4 and with porosity less than 0.15.

In the second phase, the water-hydrocarbon classification, a new discriminant function was determined from the second database, and the RT, RHOZ and TNPH logs were used. The TNPH log had the most significant contribution (45.27%), followed by the RHOZ (39.66%). The RT log was expected to have the most significant contribution, as it is theoretically known that the resistivity contrast between water and hydrocarbon is significant. Two possibilities can be considered to explain this result. One is that the predominant hydrocarbon is gas, which maximizes both the TNPH and RHOZ differences between the two groups. The other is that the water is likely to be fresh, so that resistivity varies little from water-bearing to hydrocarbon-bearing zones. Fig. 7(b) shows the centroids of the two groups, $Z_H$ and $Z_W$, which correspond to hydrocarbon and water, respectively. Analogously to the previous procedure, a $Z_i$ index is calculated for each depth. Hydrocarbon-saturated zones are assigned for $Z_i < Z_C$, and water-saturated zones for $Z_i > Z_C$.

In the third phase, the oil-gas separation, a new discriminant function was again determined, and the RHOZ and TNPH logs were considered. The most significant contribution was that of the TNPH log, as expected. The centroids of the two groups, $Z_G$ and $Z_O$, are illustrated in Fig. 7(c), corresponding to gas and oil, respectively. The gas-saturated zones are assigned for $Z_i < Z_C$, and the oil-saturated zones for $Z_i > Z_C$.

Now the water-oil-gas contacts have been defined inside the reservoirs. Fig. 9 shows, from column (a) to (d), the lithology in column (a), the net reservoirs in column (b), and the fluid contacts by the DA in column (c) and by the CA in column (d). It is observed that the two methods, DA and CA, have identified the same groups at most depths of the reservoirs. The blank spaces in columns (c) and (d) for fluid contacts reflect the cuts made in the process of determining the net reservoir. At those depths, reservoirs are considered not capable of storing fluids, due to high clay content, low porosity, or both. Thus, this process was important to assess the quality of reservoirs.

The reservoir W3-R1 is mostly filled with gas; however, both methods indicated a contact with a small oil layer, at the depth of 2901 m by the DA and at the depth of 2894 m by the CA. In the reservoir W3-R2 immediately below, the three fluids were identified by both methods, with water-oil and oil-gas contacts respectively at 2970 and 2960 m by the DA and at 2968 and 2957 m by the CA. In the reservoir W3-R3, only water saturation was detected, and in the last reservoir W3-R4, which appears after a relatively extensive shale layer, only gas saturation was identified.

Cross-plots of density versus neutron porosity for each of the four reservoirs are shown in Fig. 10. This was an assessment of the performance of the methods for fluid identification in reservoirs. From the cross-plots, we can observe the expected effect of each identified fluid type


Fig. 11. Lithology and fluid contacts in the Well-1: (a) lithological profile, (b) net reservoir, (c) fluid contact by the discriminant analysis and (d) fluid contact by the
cluster analysis.
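The discriminant-analysis columns rest on a simple decision rule: each depth receives a score Zi, which is compared with a cut-off ZC lying between the two group centroids. The sketch below illustrates that rule for the oil-gas phase with a two-variable Fisher linear discriminant. It is only an illustration under stated assumptions: the training values are synthetic, the function names are hypothetical, and taking ZC as the midpoint of the two centroid scores is an assumption rather than a documented choice of the authors.

```python
from statistics import mean


def fit_discriminant(group_low, group_high):
    """Fit a two-variable Fisher discriminant from (x1, x2) samples.

    The weights are oriented so that group_low scores fall below the cut-off.
    Returns (weights, z_c).
    """
    def centroid(group):
        return (mean(p[0] for p in group), mean(p[1] for p in group))

    m_lo, m_hi = centroid(group_low), centroid(group_high)

    # Pooled within-group covariance matrix (2x2, symmetric).
    s11 = s12 = s22 = 0.0
    for group, m in ((group_low, m_lo), (group_high, m_hi)):
        for x1, x2 in group:
            d1, d2 = x1 - m[0], x2 - m[1]
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
    dof = len(group_low) + len(group_high) - 2
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof

    # w = S^-1 (m_hi - m_lo), using the closed-form inverse of a 2x2 matrix.
    det = s11 * s22 - s12 * s12
    b1, b2 = m_hi[0] - m_lo[0], m_hi[1] - m_lo[1]
    w = ((s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det)

    z_lo = w[0] * m_lo[0] + w[1] * m_lo[1]  # centroid score of the low group
    z_hi = w[0] * m_hi[0] + w[1] * m_hi[1]  # centroid score of the high group
    z_c = 0.5 * (z_lo + z_hi)               # ASSUMPTION: midpoint cut-off
    return w, z_c


def classify(sample, w, z_c, low_label="gas", high_label="oil"):
    """Assign low_label when Zi < ZC, otherwise high_label."""
    z_i = w[0] * sample[0] + w[1] * sample[1]
    return low_label if z_i < z_c else high_label


# Synthetic (RHOZ in g/cm3, TNPH as a fraction) pairs, NOT data from the paper.
gas_train = [(2.05, 0.10), (2.10, 0.12), (2.08, 0.09), (2.12, 0.13)]
oil_train = [(2.25, 0.22), (2.30, 0.25), (2.28, 0.21), (2.33, 0.24)]

w, z_c = fit_discriminant(gas_train, oil_train)
print(classify((2.07, 0.11), w, z_c))
print(classify((2.31, 0.23), w, z_c))
```

The same two functions would serve the water-hydrocarbon phase by changing the labels, although that phase uses three logs and would need a general matrix inverse instead of the 2x2 closed form.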

according to Schön (2015). In a hypothetically clean, water saturated sandstone reservoir, the scatter points would fall exactly on the dashed red line, and the presence of hydrocarbon would deflect the points above the line. Gas, in particular, pulls the points toward the upper left corner of the graph (low values of both density and neutron porosity). However, for sandstones that are not perfectly clean (with some remnant shale content), which is the real scenario in many cases, the water saturated reservoir (W3-R3) presents a concentration of points slightly dragged from the sandstone line towards the calcite line (solid black line). The gas saturated reservoirs (W3-R1 and W3-R4) have their points concentrated above the sandstone line, towards the upper left corner, as expected. In the reservoir with water-oil-gas contacts (W3-R2), the scatter points present a continuous distribution extending from the sandstone line towards the upper left corner (gas region).

5.2. Analysis in the remaining wells

The application of the discriminant functions and scores from the reference well to the remaining four wells extends the interpretation throughout the field. It also serves as an assessment of the performance of these functions for the classification of lithotypes and fluid types in the field.

In Well-1, three potential reservoirs were defined, as can be seen in Fig. 11. In the first reservoir, W1-R1, both the DA and the CA indicate the presence of oil. In the second, W1-R2, gas saturation is identified; beneath it, after a thin shaly sand at a depth of 2622 m, the DA indicates a small layer of water while the CA suggests oil saturation. In the third reservoir, W1-R3, a gas-water contact is indicated by both methods but at different depths: 2825 m by the DA and 2807 m by the CA. In the same reservoir, an oil saturated layer was also detected after a thin tight sand layer.

In Well-2, illustrated in Fig. 12, two potential reservoirs were


Fig. 12. Lithology and fluid contacts in the Well-2: (a) lithological profile, (b) net reservoir, (c) fluid contact by the discriminant analysis and (d) fluid contact by the
cluster analysis.
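Column (b) of these figures follows from the net-reservoir cutoffs described in the text: a depth is discarded when its shale content exceeds 0.4 or its effective porosity falls below 0.15. A minimal sketch of that filter is given below; the function name, the list-based layout and the sample values are hypothetical and serve only as illustration.

```python
VSH_MAX = 0.4    # shale-content cutoff quoted in the text
PHI_MIN = 0.15   # porosity cutoff quoted in the text


def net_reservoir_flags(vsh, phi, vsh_max=VSH_MAX, phi_min=PHI_MIN):
    """Return True for depths kept as net reservoir, False for discarded ones.

    A depth is discarded if either cutoff is violated (high shale content,
    low porosity, or both).
    """
    if len(vsh) != len(phi):
        raise ValueError("vsh and phi must have the same length")
    return [v <= vsh_max and p >= phi_min for v, p in zip(vsh, phi)]


# Synthetic illustration only (NOT data from the paper).
vsh = [0.10, 0.55, 0.30, 0.20]   # shale content per depth sample
phi = [0.22, 0.25, 0.08, 0.15]   # effective porosity per depth sample
flags = net_reservoir_flags(vsh, phi)
print(flags)
```

Depths flagged False correspond to the blank intervals in the fluid-contact columns, since no fluid classification is attempted there.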


Fig. 13. Lithology and fluid contacts in the Well-4: (a) lithological profile, (b) net reservoir, (c) fluid contact by the discriminant analysis and (d) fluid contact by the
cluster analysis.

defined. The first, W2-R1, is of very low quality (formations with high clay content and low porosity), so most of it was discarded after applying the clay content and porosity cutoffs. The two methods did not converge in identifying the fluids in W2-R1: the DA indicates gas saturation, while the CA suggests oil saturation, possibly due to the effect of remnant shale content. In the second reservoir, W2-R2, a water-oil contact was identified by the CA at a depth of 3240 m.

Fig. 13 shows the classification for Well-4, where three reservoirs are defined. While the DA identified only gas saturation in the first reservoir, W4-R1, the CA indicates the presence of gas and oil separated by a thin tight sand between 2718 and 2723 m. The second reservoir, W4-R2, is completely saturated with water. In the third reservoir, W4-R3, both methods show gas, oil and water separated by thin shaly sand layers.

Although Well-5 has the longest logged thickness of all the wells, it has poor reservoirs, as can be observed in Fig. 14. The DA indicated the presence of water in most reservoirs, while the CA suggests oil saturation except in W5-R2, where a thin water saturated layer is indicated. Therefore, this well has little potential for hydrocarbon production.

6. Conclusions

Although no prior information from cores and cuttings was available in the study region, the combined application of discriminant analysis and cluster analysis led to consistent results. The application of cutoffs for outlining the net reservoir was essential to assess the quality of the reservoirs in the basin.

Overall, the three fluid types (water, oil and gas) are present in the basin. Reservoirs with great potential for hydrocarbon exploration were found in Well-1, Well-3 and Well-4. Gas was indicated as the predominant hydrocarbon, and the water from deep reservoirs in the Rovuma Basin is more likely to be fresh.

The two methods converged, both in the lithology and fluid


Fig. 14. Lithology and fluid contacts in the Well-5: (a) lithological profile, (b) net reservoir, (c) fluid contact by the discriminant analysis and (d) fluid contact by the
cluster analysis.
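When reading the cluster-analysis columns, it helps to remember that a partitioning method returns as many groups as it is asked for. The sketch below assumes a one-dimensional k-means with k = 2 (the specific clustering algorithm is an assumption made for illustration, and the values are synthetic) and shows that even a single tight population is split into two labelled groups.

```python
def kmeans_1d(values, k=2, n_iter=50):
    """Plain Lloyd k-means on a list of floats; returns (labels, centers)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Synthetic single population (NOT data from the paper): values around 0.20.
single_population = [0.19, 0.20, 0.21, 0.20, 0.22, 0.18, 0.20, 0.21]
labels, centers = kmeans_1d(single_population, k=2)
n_groups = len(set(labels))
print(n_groups, centers)
```

The algorithm reports two groups although the input spans only a few hundredths, which is why an independent check, such as the discriminant analysis trained on a reference well, is useful before attaching fluid names to the clusters.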

classification in most cases, although some differences occurred in defining the fluid contacts. This is reasonable, since a contact between fluids in a reservoir is actually a transition zone and not a sharp line of separation, as it may seem.

In lithological discrimination, the GR log showed the highest relative contribution, as expected. For water-hydrocarbon discrimination, the neutron porosity and density logs were the most important. In determining the gas-oil contact, the neutron porosity log played an important role.

The discriminant functions and scores obtained from the reference well performed well when applied to both lithology and fluid classification in the remaining wells of the same stratigraphic unit, which validates their efficiency in the discriminatory process. This also shows that the set of variables selected for the construction of the discriminant functions in each classification problem was sufficiently good.

Regarding the cluster analysis, it is important to keep in mind the existence of ambiguities, because the method itself does not assign the group categories. It always divides the input data into the predefined number of groups, even when the data actually consist of only one cluster. Thus, the combination of the two methods (DA and CA) is important.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


