
Construction and Building Materials 163 (2018) 343–359

Contents lists available at ScienceDirect

Construction and Building Materials


Evaluation of statistical parameters of concrete strength from secondary experimental test data
Pietro Croce a, Francesca Marsili a,b,⇑, Frank Klawonn c,d, Paolo Formichi a, Filippo Landi a,e
a Department of Civil and Industrial Engineering, University of Pisa, Largo Lucio Lazzarino 2, 56126 Pisa, Italy
b iBMB/MPA, TU Braunschweig, Beethovenstraße 52, 38106 Braunschweig, Germany
c Department of Computer Science, Ostfalia University of Applied Sciences, Salzdahlumer Str. 46/48, 38302 Wolfenbüttel, Germany
d Biostatistics, Helmholtz Centre for Infection Research, Inhoffenstraße 7, 38124 Braunschweig, Germany
e Department of Scientific Computing, TU Braunschweig, Mühlenpfordtstrasse 23, D-38106 Braunschweig, Germany

Highlights

• An innovative method for the evaluation of statistical parameters of material mechanical properties is presented.
• Knowledge of statistical parameters is essential in the reliability assessment of existing structures.
• The method is based on cluster analysis of secondary databases.
• The method allows concrete strength classes to be identified.

Article info

Article history:
Received 23 February 2017
Received in revised form 31 October 2017
Accepted 2 November 2017

Keywords:
Secondary test data
Cluster analysis
Mixture models
EM algorithm
k-Means algorithm
Concrete
Existing structures

Abstract

The results of material acceptance tests or in-situ tests are a valuable source of information for the reliability assessment of existing structures. Huge secondary databases of test results are usually available, coming from different sources, but individual results are often not associated with a given population, so their statistical analysis is a complicated process. In order to solve this problem, the paper presents a methodology that makes it possible to identify homogeneous populations (or material classes), together with their statistical parameters, when mixed in arbitrary and unknown percentages in a secondary database. The methodology is based on the cluster analysis of data applying the Expectation-Maximization algorithm, which makes it possible to identify individual classes and their characterizing statistical parameters by fitting a Gaussian Mixture Model. The proposed methodology has been applied to a relevant case study, investigating the cube strength of the concrete produced in Italy during the 1960s, also using different approaches. The study demonstrates that approximately six concrete classes can be identified, characterized by an almost constant standard deviation of about 4.0–4.5 MPa, in agreement with the results obtained by previous research. As the results obtained with different approaches agree satisfactorily, it can be concluded that, if enough experimental data are available, the proposed procedure is not only suitable for the intended applications, but is also "robust" enough.
© 2017 Elsevier Ltd. All rights reserved.
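The workflow summarized in the abstract (fitting a Gaussian Mixture Model by Expectation-Maximization, then choosing the number of classes with an information criterion) can be illustrated with a minimal sketch. This is not the authors' implementation, which relied on a MATLAB function; it is a pure-Python, one-dimensional EM written only for illustration. The function names (`em_single_start`, `fit_gmm`, `rank_models`, `synthetic_strengths`), the toy data (two hypothetical classes at 20 and 40 MPa), the floor `min_sigma`, the initial spread, and the parameter count p = 3k − 1 are all assumptions of this sketch.

```python
import math
import random
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, mus, sigmas):
    """Log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        total += math.log(max(dens, 1e-300))
    return total


def em_single_start(data, k, rng, max_iter=150, tol=1e-6, min_sigma=1.0):
    """One EM run from a random start; returns (loglik, weights, mus, sigmas)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    mus = [rng.gauss(mean, sd) for _ in range(k)]  # random initial means
    sigmas = [sd / 2.0] * k  # assumption: common initial spread
    weights = [1.0 / k] * k
    previous = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each datum.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: update mixing proportions, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var_j), min_sigma)  # floor avoids spikes
        current = log_likelihood(data, weights, mus, sigmas)
        if abs(current - previous) < tol:
            break
        previous = current
    return current, weights, mus, sigmas


def fit_gmm(data, k, n_starts=4, seed=0):
    """Keep the random start that reaches the highest log-likelihood."""
    rng = random.Random(seed)
    runs = (em_single_start(data, k, rng) for _ in range(n_starts))
    return max(runs, key=lambda fit: fit[0])


def rank_models(data, k_values, n_starts=4, seed=0):
    """AIC and BIC per k, differences from the minimum, relative likelihoods."""
    n = len(data)
    rows = []
    for k in k_values:
        loglik, weights, mus, sigmas = fit_gmm(data, k, n_starts, seed)
        p = 3 * k - 1  # k means, k std devs, k - 1 free mixing proportions
        rows.append({"k": k, "aic": -2.0 * loglik + 2.0 * p,
                     "bic": -2.0 * loglik + p * math.log(n),
                     "weights": weights, "mus": mus, "sigmas": sigmas})
    aic_min = min(r["aic"] for r in rows)
    bic_min = min(r["bic"] for r in rows)
    for r in rows:
        r["delta_aic"] = r["aic"] - aic_min
        r["delta_bic"] = r["bic"] - bic_min
        r["rel_likelihood"] = math.exp(-r["delta_aic"] / 2.0)
    return rows


def synthetic_strengths(n_per_class=100):
    """Deterministic toy data: two hypothetical classes, 20 and 40 MPa, sd 4 MPa."""
    data = []
    for mu in (20.0, 40.0):
        dist = NormalDist(mu, 4.0)
        data += [dist.inv_cdf((i + 0.5) / n_per_class) for i in range(n_per_class)]
    return data


if __name__ == "__main__":
    for row in rank_models(synthetic_strengths(), k_values=(1, 2, 3)):
        print(row["k"], round(row["bic"], 1), round(row["delta_aic"], 2),
              [round(m, 1) for m in row["mus"]])
```

On real data the strengths would replace `synthetic_strengths()`, and a wider range of `k_values` and more random starts would normally be explored; the model with the smallest BIC (or AIC) is the candidate number of classes.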

⇑ Corresponding author at: Department of Civil and Industrial Engineering, University of Pisa, Largo Lucio Lazzarino 2, 56126 Pisa, Italy.
E-mail addresses: (P. Croce), (F. Marsili), (F. Klawonn), (P. Formichi), (F. Landi).
0950-0618/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

In the reliability assessment of existing structures, the estimation of the mechanical properties of the building materials and the evaluation of their most relevant statistical parameters are crucial issues of the analysis. In some cases, useful information about this topic can be derived, when available, from standard acceptance test results or from in-situ investigation.

Standard acceptance tests are devoted to assessing the mechanical properties of the material and to determining whether a production lot of the material fulfils the design requirements. This important quality control technique has become more and more common in engineering practice since the second half of the 19th century, with different emphasis depending on the building material. Usually material samples are collected at the building site, to assess whether the material can be accepted or not, or at the factory, in the framework of conformity control; the
concrete strength distribution can then be filtered through the rejection or acceptance of certain batches, resulting in a positive influence on the structural reliability of concrete structures [1]. Moreover, in-situ investigation is frequently needed to supplement or to substitute laboratory tests, which are usually classified into destructive [2,3], minor destructive [4], or non-destructive [5–11] tests.

Since the process for sampling and testing specimens was codified in relevant Codes and Standards, test results have been stored in laboratory archives and databases; therefore in most cases they can be easily retrieved and consulted, regardless of whether they pertain to laboratory or in-situ tests.

In a very general context, the statistical parameters of the mechanical properties can be obtained by:

(a) analyzing sets of destructive or semi-destructive in-situ test results; or
(b) analyzing non-destructive in-situ test results; or
(c) prior evidence; or
(d) suitable combination of the above mentioned information.

Nevertheless, the practical application of these approaches could be subject to strong restrictions, and the dependability of their outcomes could be undermined by several factors:

– destructive or semi-destructive tests are, in most cases, incompatible with the statics and the needs of safeguarding the structure [12,13] or of preserving its cultural value; even in cases when semi-destructive tests can actually be carried out, their number is normally so limited that appropriate statistical interpretation is very difficult;
– non-destructive tests are broadly correlated with the actual resistance: they can be useful to support the identification of reference values of a mechanical property and to assess the homogeneity of the material, but commonly they give no direct information about statistical parameters;
– prior evidence can be derived directly from specific acceptance tests performed on the structure, provided that enough sound and reliable experimental results are available, but this is a very unusual case; or by suitable elaboration of large amounts of secondary data, derived from experimental test results obtained elsewhere;
– the best way to estimate statistical parameters of material properties is to combine prior evidence, when available, with semi-destructive or non-destructive investigations, using spot-check results essentially to support the identification of material properties based on prior evidence or, when possible, as a basis for application of Bayesian updating [14,15].

In the last decades, several studies have focused on approaches (a), (b) and (d), proposing less invasive and more precise testing techniques as well as advanced procedures for combining different types of information [16,17]. Despite considerable progress, the proposed approaches are not always workable in practice. On the other hand, the digitizing process has made secondary test results increasingly available, but a proper methodology for the statistical analysis and interpretation of secondary data is still lacking.

The statistical analysis of secondary material properties databases is often hindered by the difficulty of identifying in them different resistance classes and then their statistical parameters, since each individual experimental result belonging to the database cannot be referred to a specific resistance class of the material. This observation, which is quite obvious when in-situ tests are the exclusive source of information, is valid in a much wider sense, because, even when standard test results are available, the resistance classes are often not correctly declared or not declared at all; for instance, use of downgraded material to meet the requirements of a lower resistance class is an emblematic example of incorrect declaration. On the contrary, taking into account that the material properties frequently depend on the composition of the material itself and on the workmanship, and that these aspects are nearly constant in a homogeneous geographical area, it seems reasonable to group the data on the basis of regional criteria.

The evaluation of the statistical parameters of the compressive strength class of the concrete is a key issue in the reliability assessment of existing reinforced concrete structures. The concrete strength, whether the concrete is produced on site or ready-mixed, is a random variable whose statistical parameters can vary over time, even in a single structure. According to [18,19], several factors can affect the variation of concrete properties, such as the size of the job and the duration of the contract, the supervision, workmanship and plant used, the making, curing and testing of the specimens, the variation in successive batches, and the variation in the constituent materials. Above all, and especially on site, a relevant source of variability is the fluctuation of the water/cement ratio (w/c), caused by the continuous adjustment of the quantity of water added at the mixer in attempting to maintain a good level of workability and by the variation in moisture of the aggregates, as well as by the climatic conditions during the preparation and pouring of concrete.

Since the execution of standard acceptance or quality control tests on cubic or cylindrical specimens aimed at assessing the material compressive strength is a common practice in reinforced concrete structures, huge amounts of test results are available, often further supplemented by test results on cylindrical specimens obtained by in-situ core sampling.

The statistical analysis of these secondary data could shed light on the statistics of the mechanical properties of the existing concrete structures.

An attempt to perform some kind of statistical elaboration of massive test results can be found in the literature [20], but it was unsuccessful, as proven by the unrealistically high values of the coefficient of variation (COV) generally resulting from the analysis.

As better discussed in the following, there have been several attempts to perform some kind of statistical elaboration of massive sets of available test results, but they were unsuccessful, as proven by the unrealistically high values of the COV generally resulting from the analysis. Actually, since a broad secondary database of compressive strength tests contains results pertaining to different concrete classes, which are different statistical populations, the identification of distinct concrete classes, representing homogeneous statistical populations, is a precondition for a suitable statistical analysis. In performing the analysis it must be duly taken into account that preliminary manipulations of the recorded data, like "a priori" assignment of some specimens to a given class on the basis of information recorded on the test report or on the basis of engineering judgment, should be avoided, as this kind of information is often unreliable or incomplete and in any case extremely subjective.

It must be underlined that the existence of a certain number of distinct material classes in general, and of concrete classes in particular, can be interpreted as an intrinsic feature of the material making process, independently of its codification; the only difference is that in Codes and Standards notional resistance classes are defined, to which reference is made both for commercial and design purposes.

In effect, even at an early stage and in the absence of any standardization, building materials are produced to fulfil some mechanical requirements, suitably adapting the production and the mix design on the basis of past experience and of the raw materials locally available, according to the know-how developed in a limited context. Since resistance classes, even when not standardized, are nothing more than sets of required mechanical properties, their existence is a natural consequence of the production.

The aim of the present paper is to illustrate an "objective" method for the identification of concrete classes and the evaluation of their most relevant statistical parameters, namely mean value, standard deviation and COV, starting from a database where secondary data, consisting of compressive strength results obtained on specimens belonging to different concrete classes, are roughly […]

The basic idea of the method is to partition the mechanical test results by means of a cluster analysis based on Gaussian Mixture Models (GMM), in such a way that homogeneous statistical populations can be identified. The method, which is very general and applicable to any building material, is also improved with supplementary acceptance criteria based on previous knowledge.

The mixture model, in which each cluster is defined by an appropriate probability distribution function (pdf) of the strength, provides results which can be directly used for the reliability assessment of existing buildings. Once those clusters, and thus material classes, have been identified, it is also possible to determine the uncertainty affecting the statistical parameters.

The proposed procedure, which is explained in general terms in the following, is also applied to a relevant case study concerning the identification of homogeneous resistance classes in the Italian concrete production during the 1960s and the evaluation of their relevant statistical parameters.

It must be stressed that, by allowing large amounts of experimental data to be considered, the method offers the opportunity to evaluate the COV associated with each concrete class, which cannot be reliably obtained with the usual approaches, commonly based on a limited number of experimental results.

2. General methodology

The proposed procedure, which is very general, is based on the GMM, as illustrated below. The procedure can be applied whenever uncertainties regarding material classes or other relevant mechanical properties and their statistical parameters exist, provided that enough data can be collected. A fundamental assumption of the present proposal is that the pdf of the relevant property under examination can be approximated by a Normal distribution, even if it could be easily extended and generalized.

2.1. Historical research and literature review

First of all, historical research should be carried out, with the objective of improving the knowledge about the range of variation of the material property to be identified as well as of its relevant statistical parameters. In this phase, studies should encompass not only guidelines but also scientific and historic documentation issued at the time when the structure was built, following their evolution over the years, up to the more recent investigations. Other sources of important information might be past Codes and Standards dealing with building materials, as well as treatises, manuals and technical literature summarizing in written form the existing empirical knowledge about the building practice. If the material of interest was produced and only later transported to the building site, it is also possible, sometimes, to make reference to guidelines released by the producer company.

This historical research is also aimed at understanding the physical reasons behind the existence and number of material classes, typical values and expected general trends of their statistical parameters; in fact, it will serve as a basis for checking the soundness and the appropriateness of assumed hypotheses, as well as for supporting the implementation of mathematical algorithms and the evaluation of the obtained results, also resorting to sensitivity studies.

2.2. Collection of data

As already stated, the database to be analyzed should mainly consist of test results obtained on standardized specimens adequately representative of the building material of the structure to be assessed. Although the existence and the accessibility of such databases are not trivial, in the last decades an increasing number of studies have focused on the collection and organisation of data regarding not only concrete resistances [20–22] but also other properties of structural materials [23–25]. These studies fit into the current trend of applying data mining techniques for the definition of in-service inspection, monitoring programs and maintenance strategies.

To be adequately representative, tested specimens should be consistent with the investigated structural material in terms of raw materials and workmanship; therefore they should refer to structures coeval with the considered one and belonging to the same geographical region, or even to the structure itself.

Once the above mentioned conditions are satisfied, each single datum can be the result of either standard acceptance material tests carried out on available samples collected at building sites or in-situ sampling and testing on similar structures.

Clearly, being filed in laboratory archives or electronic databases, groups of results of standardized tests can be easily collected and combined even if obtained by different laboratories, in such a way that large sets of data can be elaborated. At the same time, the outcomes of statistical analysis carried out on only smaller parts of the data, or sub-databases, can be easily generalised and supplemented, also aiming to determine how relevant parameters vary from year to year.

One relevant issue to be tackled in the analysis concerns the minimum amount of data needed for statistical elaboration. This minimum is influenced by several factors and cannot be determined a priori, but it should cover a suitable time interval, typically some months or one year, such that the assumption that each resistance class represents a homogeneous population, whose statistical parameters are independent of time, is justified.

The amount of data to be collected cannot be determined a priori; rather, it should be properly balanced to effectively pursue the aim of the investigation, also according to the researcher's experience and engineering judgment, and considering whether the outcomes of the analysis are consistent with the historical research and the literature review.

However, it should be underlined that:

– consideration of an insufficient number of data could imply that some clusters, and then some resistance classes, are disregarded, and/or that the statistical parameters of some classes are not correctly assessed;
– conversely, consideration of too many data could complicate the identification of the various clusters;
– if the quantity of data is big enough, comparing the results obtained on different subsets makes it possible to check the validity of the outcomes, as well as to assess time trends, provided that subsets refer to different periods.

2.3. Cluster analysis based on Gaussian mixture models

2.3.1. The standard mixture model clusters

Let $y_1, \ldots, y_k$ be the realized values of k independent and identically distributed random vectors $(Y_1, \ldots, Y_k)$. A finite mixture
model with k components is the distribution F(y) having the fol- But, when the individual components are not sufficiently dis-
lowing density [26]: tant each other, the mixture model could be characterized by a
number of modes lower than k and the data set could look even
f ðyÞ ¼ pi f i ðyÞ ð1Þ unimodal. Furthermore, when the MM is used for providing model
i¼1 based clustering, the choice of the number of distributions k should
be made following the Occam’s Razor principle, which states that if
where fi(y) are the component densities of the mixture and pi are multiple models fit equally well a set of data, the simplest one
the mixing proportions: should be chosen.
0 < pi < 1; i ¼ 1; . . . ; k ð2Þ In practice this proposition can be encoded using log-likelihood
criteria, also called information criteria.
k The most popular criteria for taking a decision regarding the
pi ¼ 1: ð3Þ number of components in MM are the Akaike Information Criterion
i¼1 (AIC),
Mixture models can be used for twofold purposes: modelling
situations in which a single parametric family is unable to provide AIC ¼ 2 lnðb
LÞ þ 2p; ð5Þ
a satisfactory model and/or k distinct groups are known a priori to
exist in some physical sense. and the Bayesian Information Criterion (BIC),
Taking into account that, as already said, materials can be clas-
sified according several distinct resistance classes, for the aim of BIC ¼ 2 lnðb
LÞ þ p lnðiÞ ð6Þ
the present paper the mixed model is used according to the second
purpose, for providing model based clustering. where the first term, comprising the maximized value of the likeli-
hood function b L, represents the lack of harmonization, while the
2.3.2. Fitting a mixture model second term, comprising the number of parameters p, is the penalty
A MM can be easily fit to a group of data belonging to k different term depending on the complexity of the model; in case of BIC the
populations normally distributed if the population to which each second term also comprises the number of data i.
datum belongs or, equivalently, the statistical parameters of the Despite similar in their expression, AIC and BIC have a deeply
probability distribution function of each population are known. different meaning and depending on the context in which they
Since these details are usually unknown, in order to fit the model are applied, one should be preferred to the other.
an iterative procedure which maximizes the Log-likelihood of the As clarified by [29], ‘‘there are two cultures in the use of statistical
data, called Expectation-Maximization (EM) algorithm, could be modeling to reach conclusions about data. One assumes the data are
applied. In this work, the cluster analysis based on Gaussian Mix- generated by a given stochastic data model. The other uses algorithmic
ture Model has been carried out applying the MATLAB function models and treats the data mechanism as unknown”. The model that, which in turn apply the EM algorithm. A best fits the data must be interpreted differently in the two above
detailed explanation of the EM algorithm can be found in [26]. mentioned worlds [30]. In the first case, as sample size increases, it
In order to obtain sound estimation of the statistical parameters is expected to find the correct model; it is thus a matter of confir-
of each cluster, the initial values of the EM algorithm should be mation or falsification of the considered models. In the second
suitably chosen. In effect, it has been shown in [27,28] that differ- case, the true model cannot be found; it is thus a matter of select-
ent starting strategies and stopping rules lead to quite different ing the model that maximizes predictive accuracy. Usually, since
results. In fact, inappropriate choice of the starting values of the AIC has the property of efficiency, it outperforms BIC in the latter
algorithm implies twofold drawbacks: the slowness of the conver- situation; on the other side, since BIC has the property of consis-
gence usually affecting the procedure is aggravated on one hand, tency, it outperforms AIC in the former circumstance.
and, if the likelihood function is unbounded on the boundaries of Furthermore, the above mentioned criteria can be justified on
the parameter’s space and the initial values are chosen on the theoretical basis, arguing that the model with the minimum AIC
boundary, the sequence of estimates may diverge, on the other value should be asymptotically the closest to the true model,
hand. according to Kullback-Leibler distance [31], while the model with
A common situation is that the likelihood has multiple roots the smaller BIC value should be the one with the highest posterior
corresponding to local maxima: in this case, the algorithm should probability.
be applied for a wide choice of initial values in any search of local Finally, several researches have pointed out that AIC tends to
maxima. A possible option is to apply the EM algorithm for a num- overestimate the number of components, in fact, for i > 8 and
ber of random starts. lnðiÞ > 2, BIC penalizes complex models more heavily than the
For a mixture model characterized by k components with mean AIC, so reducing the risk that too many components are fitted.
li , it is possible to randomly generate the lð0Þ
i as In the light of the above remarks, the choice of the model selec-
tion tool has to be made considering the peculiarities of the situa-
1 ; . . . ; lk  Nðy; VÞ
 ð4Þ tion, especially considering the process generating data and the
 and V are respectively the mean and variance from the size of the sample [32].
where y
Assuming that BIC min and AIC min are the minimum values of BIC
and AIC, and that BIC i and AIC i are the values related to the ith
The algorithm fits an MM assuming that the number of distribu-
model, the models have been ranked according to the following
tions k is known. However, this is not always the case; a method to
identify the number of components k is discussed in the following
DBIC;i ¼ BIC i  BIC min and DAIC;i ¼ AIC i  AIC min ð7Þ
2.3.3. Information criteria As stated in [33,34], when DAIC;i 6 2 the evidence of the ith
If no prior information about the number of clusters is available, model compared with the model with the minimum value of the
and the mixture model is multi-modal, k could be assumed, as a information criterion is weak and thus it cannot be rejected.
first attempt, equal to the number of modes. Assuming that further data cannot be collected, it is possible to cal-
P. Croce et al. / Construction and Building Materials 163 (2018) 343–359 347

culate a weighted average of all the models that cannot be rejected, obtained analyzing separately each data subset should provide
being the weight wi computed according to the values of DAIC;i by arguments to validate the results or to ask for additional
means of investigation.
wi ¼ exp  ð8Þ 3. Definition of the statistical parameters of concrete resistance
classes produced in Italy in the 1960s
which basically represents the relative likelihood of the ith model.
The application of the proposed methodology is illustrated in
2.4. Identification of material classes detail analyzing a relevant case study, devoted to assess the statis-
tical properties of the concrete produced in Italy during the 1960s.
Once the cluster analysis based on GMM has been completed, it The case study is very important in view of reliability analysis of
would be possible to identify material classes and the related sta- existing Italian reinforced concrete structures and for planning of
tistical parameters. strengthening interventions, because a significant quota of them,
In some lucky cases, it will be possible to directly define mate- approximately 20%, were built in that decade and a relevant part
rial classes from the mean value and the standard deviation of each needs nowadays to be refurbished.
identified clusters. Actually, results obtained so far are generally At that time, the concrete was mostly prepared by in-situ mix-
not so evident, and further studies are required, but this aspect is ing, often following some empiric volume metered reference
not particularly relevant, being the standard deviation and the recipes, while the use of ready-mix concrete, weighed metered,
COV the main parameters sought. was limited to the most relevant structures.
In any case, clusters consisting of less than 100 data points and In addition to point out again that the mechanical properties
clusters whose statistical parameters cannot be assessed with pre- and the mix design declared in the test report are often missing
cision, or even characterized by unrealistic values of the COV, or incorrect, it must be stressed that mechanical properties of
should be discarded. site-mix concrete were hardly predetermined, because the use of
Moreover, the presence of some outliers, resulting from very reference recipes could not take account that concrete properties
low or very high values of the investigated property, could strongly depend not only on the recipe itself, but also on the type of cement
influence identification of the extreme clusters. This eventuality is and on the aggregates used for the preparation, as well as on the
clearly emphasized when extreme clusters are characterized by compaction degree. These remarks confirm that the procedure
coefficient of variation much higher than expected; in this case should disregard this kind of information and that resistance
too, extreme clusters should be discarded. classes should be determined simply seeking for individual clusters
Assuming that parameters that mainly identifies a material in the resistance database, in such a way that samples belonging to
class are the mean value and the COV of the fitted Gaussian distri- a given class are recognized on the base of their properties and not
bution, a second cluster analysis could be performed considering on preconceptions.
the statistical parameters obtained by fitting the GMM and imple-
menting a k-mean algorithm. In this way, relevant material classes 3.1. Historical research and literature review
could be better identified, although supplementary investigation
might be needed to assess the best k value. 3.1.1. The coefficient of variation of concrete strength
It must be underlined that the proposed method is a blind procedure that does not require particular assumptions; consequently, the identification of material classes is markedly objective.

2.5. Determination of the uncertainty affecting the statistical parameters

The statistical parameters characterizing each material property are random variables affected by aleatoric uncertainty. The inherent randomness can be due to several factors, but a particularly significant one may be the quasi-periodic variation over time of the quality of the material production process, and hence of the statistical properties of the batches of samples tested in the laboratory.
To check how the main parameters of each material class vary from year to year, databases covering a sufficiently long period of time can be subdivided into subsets covering a suitable number of time intervals, each one satisfying a minimum data requirement. Since the results obtained by fitting the MM to each data subset can be regarded as realizations of the statistical parameters characterizing each investigated mechanical property, appropriate elaboration of these data should make it possible to assess the inherent aleatoric uncertainty of the mechanical property itself.
Adopting a value for k matching the previous analysis, a new cluster analysis, based on a GMM characterized by k components, could be performed considering the above mentioned database in order to obtain additional information. Finally, critical comparison of the statistical parameters of the fitted pdfs with those previously

As already stated, additional criteria for accepting the outcomes of the proposed procedure could be introduced, in particular based on the value of the COV. The relevant literature has therefore been preliminarily examined to establish some reference values for COV.
As already remarked in §1, the water/cement ratio (w/c) is generally considered the most influential single factor in the strength of fully compacted concrete [35]. Several empirical relationships between concrete strength and w/c have been proposed; for example, some references can be found in [35,36], where no specific information is given about the influence of w/c on COV. Moreover, Erntroy [19] proposes to refer to the w/c ratio curve applied to the minimum strength rather than to the mean value, applying the control function to the w/c ratio rather than the strength. But the concrete class cannot be automatically derived from w/c, even when it is perfectly known; in fact, the mix design aiming to achieve a given concrete class operates on all relevant factors and not only on w/c. In this respect, it should also be recalled that several types of cement exist and that even the range of variation of the standard compressive strength of cement can be large; for example, for cements belonging to both classes 32.5 and 42.5, European Standards [37,38] prescribe a range of variation of 20 MPa.
In several works [14,39–41] the distribution of the strength of test specimens is assumed to be Gaussian, and can thus be described by the mean value and the standard deviation.
For practical purposes, the assumption of normal distribution is acceptable, although examples of skewness have been reported in [42] for low strength concrete, and in [43,44] for high strength concrete. Usually, the assumption of normal distribution

errs on the safe side with respect to the number of test results expected to fall below the specified value of strength. Erntroy [19] noticed a slight skewness of the distribution for low strength and high strength concretes and a longer tail for medium strength concretes, but in his opinion departure from normality does not have a great effect, unless stricter minima are specified for low or high strength concretes, because in these cases adoption of a normal distribution may result in a bad fit. A log-normal distribution might be more appropriate here [45] but, as is known, when COV is below 20%, as for concrete strength, the characteristic strengths, defined as the 5% lower fractile, determined considering a normal or a log-normal distribution are very close together.
Concerning COV, it is still unclear whether concrete classes with increasing strength are characterized by constant standard deviation, constant COV, or neither. Moreover, this issue is closely related to the acceptance criteria for experimental results and in particular to the relationship between the mean and the minimum experimental strength adopted in the acceptance criteria. Of course, if COV is assumed independent of the concrete strength, the ratio between the mean and the minimum value is constant; if the standard deviation is assumed independent of the concrete strength, the difference between the mean and the minimum value is constant.
Studies carried out in 1973 by [18] on ready-mix and site-mix concrete production concluded that COV is independent of the strength, as confirmed by some laboratory studies, while more recent studies related to site conditions demonstrated strength independence of the standard deviation [35], even if the question is still debated.
In a past version of the Swiss Standards, characteristic values were defined considering a COV of about 20%, independent of the strength, while in more recent versions [46] a constant value of the standard deviation is assumed.
Some authors came to the conclusion that the standard deviation is strength independent at levels exceeding a certain strength. According to [47,48], the standard deviation increases linearly up to a resistance of about 20 MPa, then it remains constant irrespective of the strength; for [18,19,49,50] the relationship between the standard deviation and strength can best be represented by a smooth curve through the origin tending to become a horizontal straight line for values of mean strength greater than 28 MPa.
Soroka [51] came to the conclusion that both standard deviation and COV depend on strength, the standard deviation being less sensitive to strength variation than COV. The overall average standard deviation found in [51] is around 6.0 MPa, in agreement with the data published in [19], where values between 5.9 and 6.2 MPa are derived for fair control conditions, and in [49], where values between 5.0 and 6.0 MPa are deduced. However, in some cases relationships are proposed based on different conclusions, assuming the standard deviation to depend on strength or even on actual site conditions.
Following [52], the COV is independent of the concrete strength but dependent on the degree of control exerted over the production process. Moreover, as the control conditions were not exactly the same in all sites covered by this study, it might be argued that evaluation of the same data by a method more sensitive to the actual level of quality control would have led to a different conclusion.
In general, the findings of most of the above mentioned studies lead to the conclusion that the standard deviation is independent of the concrete class, as is implicitly assumed in recent European standards, such as Eurocode [53], as well as in fip Model Code [54] and Italian Standard [55] for quality control purposes. Moreover, the outcomes of the case study illustrated in the following corroborate this conclusion.

3.1.2. The production of concrete in Italy during the '60s

The case study refers to Italian concrete production in the 1960s; it is therefore helpful to summarize the main features of the concrete industry in Italy at that time.
As recalled in [56], after the Second World War the Italian cement industry developed at an appreciable rate, in step with the rapid economic growth of the country during the so-called "economic boom". Indeed, during the 1960s Italy became first in Europe for concrete production and among the first worldwide for concrete technology, also due to the huge number of reinforced concrete structures built in that period.
The control of the total amount of water, aimed at keeping w/c constant in order to maintain the prescribed quality, was considerably improved in concrete batching plants, where mixers were equipped with special devices for controlling the quantity of water fed, called 'concrete hygrometers' [57]. Nonetheless, the old system of mix-water control was still widely applied, consisting of a suitable regulation of the water fed into the mixer on the basis of the measured moisture content of the aggregates, but resulting in a wider scatter of the concrete strength.
On the other hand, in most cases concrete was still produced directly on the building site by hand. In this condition, to obtain concrete with the required properties, the concrete producer, who was often a construction worker, mixed cement, water and aggregates according to empirical recipes, based on his experience and on the raw materials locally available, which unavoidably led to a large scatter of the hardened concrete properties.
The Italian Code governing all the activities connected to the building industry in force at that time was still the one specified in [58].
Concerning the acceptance tests on concrete, sampling of four standard cubes was required every 500 cubic meters of concrete casting. The standard cubes, having a side of 160 mm or 200 mm depending on the maximum size of aggregates (smaller or larger than 30 mm, respectively), were tested after 28 days of curing, ordering the results in decreasing order, from $r_1$ to $r_4$, excluding the lowest outcome $r_4$ and determining the average cubic resistance:

$$f_c = \frac{r_1 + r_2 + r_3}{3}. \qquad (9)$$

The test was considered successful if both the following conditions were verified:

$$\begin{cases} f_c \ge \bar{f}_c \\ f_c \ge 12\ \text{MPa} \end{cases} \qquad (10)$$

$\bar{f}_c$ being the prescribed cubic strength.
For each test the following data were recorded: date of sampling; date of testing; cube dimensions; effective area of the cross section of the specimen; ultimate load and compressive strength. Other data were possibly recorded, such as type, origin and amount of cement and aggregates; building site; structural element from which the cubes were sampled; the applicant of the test and so on. It must be highlighted that the form used to request tests on cubes carried no information regarding the strength adopted in the design or the concrete class, also because in Italy formal reference to design classes and characteristic values was introduced later, in 1972, by the decree [59], when concrete had become a standardized and more industrial material, produced in concrete plants and later delivered on site.
According to the literature review summarized above, it has been possible to speculate about the number of concrete classes and their statistical parameters. The most influential Italian reference at that time was probably the already cited [36], because it was a widely known and implemented practical manual among

both engineers and concrete producers. In that book, at least 7 classes were indicated, with as many w/c values, according to which it was possible to obtain a given resistance, ranging from 17 to 50 MPa. Concretes characterized by a very poor quality, to be used in non-structural elements and barely satisfying the condition expressed in Eq. (10), could call for a very low class, while high strength concrete, typically employed in pre-stressed concrete structures, could call for the definition of a further class.

3.2. The collection of data

The laboratory considered in the case study is the Laboratory of the Department of Civil and Industrial Engineering of the University of Pisa (I), which has been recognized as an official laboratory for material testing since 1939 [58]. Data were collected considering cubic strength deduced from official test reports issued in the 1960s.
In a first stage, only the year 1965 was analyzed, for which 3725 results of single compressive tests were available. The preliminary cluster analysis performed on them, whose results have already been published in [60], shows that, as expected, the mixture model is very promising, so encouraging further investigation.
The analysis was then considerably widened, taking also into account test results available for the years 1961 (3379), 1963 (4816), 1967 (6302) and 1969 (6221), in order to have representative data samples covering the whole decade.
The huge number of test results (18222 in total) embraces all relevant structural typologies, such as residential and public buildings, industrial structures, roads, bridges, hydraulic structures and

3.3. Preliminary analysis of the histograms

The compressive cubic strength results are plotted in the relative frequency histograms illustrated in Figs. 1–5, referring to the years 1961, 1963, 1965, 1967 and 1969, respectively, while all available data are grouped in the histogram in Fig. 6.
Observing the histograms, it is possible to draw some preliminary conclusions, in particular about the existence and the number of classes to be considered in the elaboration:

– most of the histograms, in particular the one referred to 1961 (Fig. 1), show a long right tail, in which sub-models are clearly
– the histogram related to 1963 (Fig. 2) appears almost symmetrical, but a closer look reveals at least 7 clusters;
– the histogram related to 1965 (Fig. 3) allows a very easy identification of sub-models, as 8 or 9 clusters of data can be recognized by visual check;
– the histograms related to 1967 (Fig. 4) and 1969 (Fig. 5) are much smoother, but in spite of this they clearly reveal two modes in the range 20–40 MPa;
– the global histogram in Fig. 6 shows less marked modes than the individual histograms; therefore it has been used only in a subsequent phase, to validate or to accept the results derived by analyzing the individual histograms.

A number of 7–10 concrete classes, each one characterized by its own pdf, could be expected accordingly. The modes and the mean values of two adjacent distributions should be approximately equally spaced, while the distance between their characteristic values, or more generally between their x-fractiles, depends on their standard deviations. Obviously, approximately uniformly spaced x-fractiles indicate that the standard deviation is independent of the concrete class.

3.4. Cluster analysis based on Gaussian mixture models

The available data have been analyzed with the EM algorithm considering a number of components, k, varying from 7 to 10. For each assumed value of k, the algorithm has been run considering different initial values. More precisely, for each ith component, i = 1, ..., k, the starting point $\mu_i^{(0)}$ has been calculated performing a Monte Carlo simulation in plausible ranges: 1000 samples $\mu_i^{(0)}$ were drawn from a uniform distribution $U(x_{i,\min}, x_{i,\max})$ in the interval $(x_{i,\min}, x_{i,\max})$, hypothesizing that

$$x_{i,\max} - x_{i,\min} = 5\ \text{MPa} \qquad (11)$$

and

$$x_{i+1,\min} - x_{i,\max} = 5\ \text{MPa} \qquad (12)$$

and setting the left bound of the first interval, $x_{1,\min}$, to the values 2.5 MPa, 5.0 MPa or 10.0 MPa, as appropriate.
For k = 7, also the case

$$x_{i,\max} - x_{i,\min} = 10\ \text{MPa} \qquad (13)$$

has been considered; in fact, in this case, wider intervals were necessary to cover the total range of the whole sample.
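The construction of the sampling intervals in Eqs. (11) and (12) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed and the reading of "1000 samples" as 1000 vectors of starting means (one uniform draw per interval) are hypothetical choices, not taken from the paper.

```python
import random


def start_intervals(k, x1_min, width=5.0, gap=5.0):
    """Return [(x_min, x_max), ...] for k components, following Eqs. (11)-(12)."""
    intervals = []
    lo = x1_min
    for _ in range(k):
        intervals.append((lo, lo + width))  # Eq. (11): x_max - x_min = width
        lo = lo + width + gap               # Eq. (12): next x_min - x_max = gap
    return intervals


def sample_starting_means(k, x1_min, n_draws=1000, width=5.0, gap=5.0, seed=0):
    """Draw n_draws vectors of starting means, one uniform draw per interval."""
    rng = random.Random(seed)  # fixed seed (assumption) so the sketch is repeatable
    intervals = start_intervals(k, x1_min, width, gap)
    return [[rng.uniform(lo, hi) for lo, hi in intervals] for _ in range(n_draws)]


# Example: k = 8 components, left bound of the first interval at 5.0 MPa
starts = sample_starting_means(k=8, x1_min=5.0)
```

Passing `width=10.0` would correspond to the wider intervals of Eq. (13) for k = 7; the gap to be used in that case is not stated in the text, so the default is only a placeholder.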

Fig. 1. Histogram of results about compressive cubic strength of concrete – year 1961.

Fig. 2. Histogram of the compressive cubic strength of concrete results – year 1963.

Fig. 3. Histogram of the compressive cubic strength of concrete results – year 1965.

Fig. 4. Histogram of the compressive cubic strength of concrete results – year 1967.

The intervals considered in cases k = 8 and k = 9 for sampling the starting values are shown in Figs. 7 and 8.
Eqs. (11) and (12) lead to the following condition,

$$5\ \text{MPa} < \mu_{i+1}^{(0)} - \mu_i^{(0)} < 15\ \text{MPa} \qquad (14)$$

reasonably implying that the distance between the mean values of two adjacent clusters varies in the range 5–15 MPa.

Fig. 5. Histogram of the compressive cubic strength of concrete results – year 1969.

Fig. 6. Global histogram of the compressive cubic strength of concrete results for years 1961, 1963, 1965, 1967 and 1969.

Fig. 7. Intervals considered for sampling starting values (k = 8).

Fig. 8. Intervals considered for sampling starting values (k = 9).
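Given one vector of starting means drawn from intervals such as those in Figs. 7 and 8, a single EM fit of a one-dimensional Gaussian mixture, followed by the AIC and BIC of the fitted model, could look like the sketch below. It is a minimal illustration, not the authors' implementation: the initial standard deviation, the stopping rule, the parameter count 3k − 1 and the synthetic two-component data are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, mus, sigmas):
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        total += math.log(max(dens, 1e-300))  # floor avoids log(0)
    return total


def em_gmm_1d(data, start_means, start_sigma=4.0, n_iter=300, tol=1e-9):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k, n = len(start_means), len(data)
    mus, sigmas, weights = list(start_means), [start_sigma] * k, [1.0 / k] * k
    prev = cur = log_likelihood(data, weights, mus, sigmas)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each datum
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))
        cur = log_likelihood(data, weights, mus, sigmas)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return weights, mus, sigmas, cur


def aic_bic(loglik, k, n):
    p = 3 * k - 1  # k means, k standard deviations, k - 1 free weights
    return 2 * p - 2 * loglik, p * math.log(n) - 2 * loglik


# Synthetic data (hypothetical): two strength classes with equal scatter
rng = random.Random(1)
data = [rng.gauss(25.0, 4.0) for _ in range(300)] + [rng.gauss(45.0, 4.0) for _ in range(300)]
weights, mus, sigmas, loglik = em_gmm_1d(data, start_means=[20.0, 40.0])
aic, bic = aic_bic(loglik, k=2, n=len(data))
```

Repeating the fit for every sampled starting vector and ranking the runs by AIC or BIC would mirror the multi-start strategy described in the text; a component that attracts very few observations shows up here as a collapsing standard deviation.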


Running the EM algorithm according to the above described procedure, it emerged that, after a few runs, in cases k = 7 and k = 10 the covariance matrix became ill-conditioned, because at least one component was characterized by too few observations with significant weights. On the contrary, for k = 8 and k = 9 the algorithm ran correctly, providing reasonable results.
At the end of each run AIC and BIC values were computed, so that the obtained models were ranked according to $\Delta_{AIC,i}$ and $\Delta_{BIC,i}$, for both k = 8 and k = 9.
Taking into account all the models that could not be rejected, AIC and BIC values were evaluated on the weighted model, averaging the statistical parameters,

$$\mu = \frac{w_1 \mu_1 + w_2 \mu_2 + \ldots + w_n \mu_n}{w_1 + w_2 + \ldots + w_n} \qquad (15)$$

$$\sigma^2 = \frac{w_1 (x_1 - \mu_1)^2 + w_2 (x_2 - \mu_2)^2 + \ldots + w_n (x_n - \mu_n)^2}{w_1 + w_2 + \ldots + w_n} \qquad (16)$$

where the weights $w_i$ are given by Eq. (8). As already explained, when $\Delta_{AIC,i} \le 2$ the ith model cannot be rejected; in fact, as for i = n it results in $\Delta_{AIC,n} \approx 2$, the minimum meaningful value is $w_n = \exp(-1) \approx 0.368$.
In order to identify the number of components, the averaged models were compared in terms of AIC and BIC. It emerged that lower values of BIC were obtained for k = 8, while the AIC values were very close together, as the difference was less than 2.0. These results are consistent with the above discussion about the criteria, according to which BIC tends to penalize more complex models.
Plotting the pdfs revealed that, in the range 20 < $f_c$ < 50 MPa, clusters identified considering eight or nine components basically coincided; fitting nine components, an additional cluster was obtained, usually for very high or very low resistances, which was outside the field of interest, as it was characterized by large scatter and included a limited amount of data. For these reasons, the final decision was to consider k = 8 components, disregarding the extreme clusters, influenced by outliers, as well as any other clusters comprising fewer than 100 data.
Figs. 9–13 display the histograms as well as the best MMs obtained considering eight or nine components, for each annual database.
In the figures, which are sorted in chronological order, the clusters identified by the eight component MMs are plotted with solid red lines, while the clusters identified by the nine component MMs are plotted with dotted blue lines.

The statistical parameters, mean value and COV, of each component of the annual distributions, as well as the percentage of data belonging to each cluster, are summarized in Table 1. As discussed before, clusters have been discarded when characterized by unrealistically low or high values of COV or by an inadequate number of elements. Therefore, clusters nr. 1 and nr. 8 have always been discarded, as well as clusters nr. 7 referring to years 1961 and 1963.
One important observation clearly emerges looking at the values of the COVs of the models fitted on yearly data. COVs range in the interval 0.06–0.25, but when the mean value of the cubic strength is greater than 20.0 MPa, COV is generally lower than 0.18. High values of COV are typical for low resistance concretes; as the concrete strength increases COV tends to reduce, so that for high strength concrete it is around 0.07–0.09.
On the contrary, the standard deviation remains nearly constant and independent of the concrete strength, being around 4.0–4.5 MPa.
Besides, it must be underlined that clusters identified in different annual databases are characterized by comparable statistical properties and that, in particular, clusters characterized by comparable resistances have similar COVs.
These results confirm that the proposed procedure is able to properly identify homogeneous populations of test results, even if the origin of individual data is unknown.
Anyhow, looking at the identified clusters, it appears difficult to directly draw conclusions about the concrete classes. For this reason a further analysis based on k-means has been carried out, considering the statistical parameters (μ and COV) of the fitted Gaussian distributions.

3.5. Identification of material classes through k-means algorithm

As each material class can be represented in terms of mean value and standard deviation, a second cluster analysis was performed, based on the so-called k-means algorithm, in order to identify concrete classes more precisely.
The k-means clustering is a method where a group of n observations is partitioned into k clusters in such a way that each observation belongs to the cluster with the nearest mean value. In other words, the mean of each cluster, here called prototype, should be chosen such that the variance of all the clusters is minimized, thereby seeking compact groups of observations.
Assuming that the space of the prototypes M is identical to the data space $X = \mathbb{R}^m$ and that some suitable metric d,

Fig. 9. Histogram of test results and MMs for k = 8 (solid red lines) and k = 9 (dotted blue lines) – year 1961. (For interpretation of the references to colour in this figure
legend, the reader is referred to the web version of this article.)
Fig. 10. Histogram of test results and MMs for k = 8 (solid red lines) and k = 9 (dotted blue lines) – year 1963.

Fig. 11. Histogram of test results and MMs for k = 8 (solid red lines) and k = 9 (dotted blue lines) – year 1965.

Fig. 12. Histogram of test results and MMs for k = 8 (solid red lines) and k = 9 (dotted blue lines) – year 1967.

Fig. 13. Histogram of test results and MMs for k = 8 (solid red lines) and k = 9 (dotted blue lines) – year 1969.
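The selection and averaging of the fitted models behind the MMs plotted in these figures can be illustrated with a short sketch. Eq. (8) lies outside this part of the text, so the weight formula $w_i = \exp(-\Delta_i/2)$ used here is an assumption, chosen because it reproduces the quoted value exp(−1) ≈ 0.368 at Δ = 2; the AIC values and component means are hypothetical numbers, not results from the paper.

```python
import math


def delta_values(criteria):
    """Differences between each criterion value (AIC or BIC) and the best one."""
    best = min(criteria)
    return [c - best for c in criteria]


def akaike_weights(deltas):
    """Unnormalised weights w_i = exp(-Delta_i / 2) (assumed form of Eq. (8))."""
    return [math.exp(-0.5 * d) for d in deltas]


def weighted_average(values, weights):
    """Average in the style of Eq. (15): sum(w_i * v_i) / sum(w_i)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical AIC values and fitted means of one component from four EM runs
aic_runs = [1000.0, 1001.2, 1002.0, 1007.5]
mu_runs = [30.1, 30.4, 29.8, 33.0]

deltas = delta_values(aic_runs)
kept = [i for i, d in enumerate(deltas) if d <= 2.0]  # models that cannot be rejected
w = akaike_weights([deltas[i] for i in kept])
mu_avg = weighted_average([mu_runs[i] for i in kept], w)
```

The run with Δ = 7.5 is dropped before averaging, so a poorly converged fit does not pull the averaged mean away from the well-supported ones.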

Table 1
Statistical parameters of the MM components and percentage of data belonging to each cluster for k = 8.

                       Component nr.
Year  Parameter        1      2      3      4      5      6      7      8
1961  μ [MPa]          12.5   20.6   30.0   38.1   45.4   56.4   67.9   76.8
      COV              0.214  0.183  0.114  0.077  0.087  0.074  0.041  0.149
      % of data        7.2    20.4   22.7   14.5   11.6   9.1    2.0    12.5
1963  μ [MPa]          13.7   22.2   30.8   40.8   52.2   61.6   70.1   70.2
      COV              0.303  0.160  0.134  0.094  0.089  0.057  0.026  0.143
      % of data        5.9    11.1   19.7   24.2   21.5   10.3   2.0    5.2
1965  μ [MPa]          9.5    15.3   23.7   34.0   46.6   59.7   68.4   86.2
      COV              0.160  0.230  0.164  0.134  0.114  0.072  0.090  0.009
      % of data        0.9    9.4    22.1   32.0   26.0   6.6    2.9    0.2
1967  μ [MPa]          5.8    16.3   25.2   35.1   45.7   56.2   64.2   73.7
      COV              0.179  0.248  0.172  0.125  0.105  0.092  0.071  0.090
      % of data        0.3    15.2   24.1   26.3   20.7   8.0    2.8    2.7
1969  μ [MPa]          13.1   22.3   31.8   41.8   51.7   60.5   71.3   79.9
      COV              0.330  0.187  0.129  0.095  0.092  0.072  0.061  0.090
      % of data        5.7    16.8   24.5   22.0   15.0   10.7   4.8    0.6

$d: M \times X \to \mathbb{R}^+$, like the Euclidean distance, allowing clusters and data to be compared directly, is defined, the location of each cluster can be easily derived, provided that the number of clusters and the data belonging to each cluster are known.
Since the data belonging to each cluster are generally not known, it is necessary to set up a proper method to recognize them.
Let $p_{i|j} \in \{0, 1\}$ be a binary membership matrix that expresses whether the datum $x_j \in D$ belongs to the ith cluster $C_i$; the k-means clustering starts with some initial guesses of the prototypes, and successive updating is performed, alternating the adjustment of the membership matrix with the adjustment of the prototypes.
The updating can be carried out according to various algorithms. The k-means algorithm defined before optimizes the following objective function $J_{kM}$ during this iterative refinement:

$$J_{kM} = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - p_i \rVert^2, \qquad (17)$$

where $p_i$ is the mean value of the data in $C_i$. Further details regarding other possible algorithms are given in [61].
The application of the k-means algorithm to the case study required the number of clusters to be defined.
By looking at the scatter plot, it clearly emerges that at least 2 roughly hyper-spherical clusters could be identified for μ < 40 MPa and COV > 0.1; on the other hand, it is much more difficult to identify groups in the remaining data, but at least three or four clusters could be hypothesized. The algorithm was run again considering six or seven clusters, establishing the initial conditions similarly to the cluster analysis based on GMM described before, namely considering that the clusters' prototypes should be located at regular intervals of around 10 MPa. Anyhow, running the algorithm with different initial conditions led to almost the same results, probably as a consequence of the limited number of data.
By comparing the results, it turns out that the most reasonable partition is obtained considering seven clusters. Clusters representing groups of data are highlighted in Fig. 14 with different colors, together with their centroids.
Each cluster can reasonably be interpreted as a concrete class, whose average statistical parameters, corresponding to the coordinates of the centroid of each cluster, are reported in Table 2; cluster nr. 1, characterized by a very high COV, has been disregarded.
The results, in terms of mean value and standard deviation of concrete resistance of each cluster and in terms of COVs, are in line with those previously determined. The agreement is even more

Fig. 14. Cluster assignments and centroids.
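The alternation between membership update and prototype update that produces assignments and centroids like those in Fig. 14 can be sketched as follows. This is a plain Lloyd-type k-means written for illustration only: the nine (μ, COV) points are copied from Table 1 (components 2, 4 and 6 for 1961, 1963 and 1969), while the initial prototypes and the use of an unscaled Euclidean distance are assumptions, since the text does not state how μ and COV were scaled.

```python
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, prototypes, n_iter=100):
    """Alternate membership and prototype updates until the labels stop changing."""
    protos = [tuple(p) for p in prototypes]
    labels = None
    for _ in range(n_iter):
        # Membership update: each point goes to the nearest prototype
        new_labels = [
            min(range(len(protos)), key=lambda i: squared_distance(x, protos[i]))
            for x in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Prototype update: each prototype becomes the mean of its members
        for i in range(len(protos)):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous prototype
                protos[i] = tuple(sum(c) / len(members) for c in zip(*members))
    # Objective function of Eq. (17)
    objective = sum(squared_distance(x, protos[lab]) for x, lab in zip(points, labels))
    return protos, labels, objective


# (mean [MPa], COV) pairs taken from Table 1
points = [
    (20.6, 0.183), (22.2, 0.160), (22.3, 0.187),
    (38.1, 0.077), (40.8, 0.094), (41.8, 0.095),
    (56.4, 0.074), (61.6, 0.057), (60.5, 0.072),
]
initial = [(20.0, 0.15), (40.0, 0.10), (60.0, 0.08)]  # hypothetical starting prototypes
protos, labels, objective = kmeans(points, initial)
```

With unscaled coordinates the distance is dominated by the mean strength, so COV has almost no influence on the partition; rescaling both coordinates before clustering would be needed for COV to matter.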

Table 2
Average statistical parameters of concrete classes obtained by k-means algorithm (EM + k-means results, annual data).

i    μ [MPa]   σ [MPa]   COV
1    15.8      3.8       0.239
2    22.8      4.0       0.174
3    32.4      4.1       0.127
4    40.2      3.6       0.089
5    48.3      4.7       0.097
6    58.9      4.3       0.073
7    67.9      5.0       0.074

significant considering that, as anticipated, rather than the identification of individual concrete classes, which could be assessed by means of simpler methods, the main scope of the analysis is to ascertain standard deviation and COV and their dependence on the concrete strength.

3.6. Evaluation of the model uncertainty affecting the statistical parameters

In principle, starting from the partition derived by applying the k-means algorithm, it would also be possible to assess the uncertainty affecting the statistical parameters characterizing individual resistance classes, which could be summarily represented by the COV of each of them. But in the present case study the number of realizations, namely the number of years considered in the database, is too low for defining a class-related uncertainty.
Therefore, an overall uncertainty has been defined, which has been estimated considering all the realizations of the statistical parameters as a whole, without referring to any particular class, so trying to estimate how the fluctuation of the production over time, and consequently the variation of the fraction defectives, could influence the statistical parameters.
In order to assess the uncertainty, for each nth year taken into account, the values of the parameters $x_{i,n}$ of the ith cluster identified by the k-means algorithm have been firstly normalized to $x'_{i,n}$ with respect to the prototype $p_i$ of the cluster itself,

$$x'_{i,n} = \frac{x_{i,n}}{p_i} \qquad (18)$$

and, subsequently, all the normalized $x'_{i,n}$ parameters have been grouped in a unique set $X'$ including all the years.
Once the median $\tilde{x}$ of the elements belonging to $X'$ has been determined, a further normalization has been carried out with respect to $\tilde{x}$,

$$x''_{i,n} = \frac{x'_{i,n}}{\tilde{x}} \qquad (19)$$

so that a new set $X''$ has been obtained, whose elements $x''_{i,n}$ are characterized by a median equal to 1.0.

Fig. 15. Histogram of the normalized mean value, μ″, of the concrete resistance.

Fig. 16. Histogram of the normalized standard deviation, σ″, of the concrete resistance.

The histograms of mean value, μ″, standard deviation, σ″, and coefficient of variation, COV″, of concrete resistance, as derived

from the analysis of the case study data and normalized according to Eqs. (18) and (19), are illustrated in Figs. 15–17, respectively.

Fig. 17. Histogram of the normalized COV, COV″, of the concrete resistance.

Table 3
Statistical parameters of normalized mean value, μ″, standard deviation, σ″, and COV, COV″, of concrete resistance.

       μ″      σ″      COV″
μ      1.005   1.010   1.011
σ      0.056   0.113   0.113
COV    0.056   0.112   0.112

The statistical elaboration of these data made it possible to evaluate the statistical parameters μ, σ and COV, concisely reported in Table 3, clearly showing that the uncertainty of the mean value is represented by a COV around 0.056, while the model uncertainties of both the standard deviation and the COV are represented by a COV around 0.112, which could then be adopted as reference values in reliability analysis.

3.7. Further discussion and validation of the results

In order to further discuss and validate the results, a cluster analysis based on GMM has also been performed considering the global histogram, previously illustrated in Fig. 6.
As underlined before, the histogram appears much smoother compared to those of the annual results, and sub-models could not be identified simply looking at the plot. Besides, the histogram is markedly skewed, with a significant right tail.
The EM algorithm was run again considering k = 8 components and random initial conditions, sampled according to the procedure explained in §3.4, but considering 5000 simulations.
AIC and BIC values were computed for each fitted GMM, and an average model was found according to Eqs. (15) and (16). The mixture of Gaussian models so obtained is shown in Fig. 18, where the normal distribution (in blue) and the lognormal distribution (in green) best fitting the global histogram are also shown.
The normal distribution is characterized by a mean value μ = 37.86 MPa, a standard deviation σ = 16.29 MPa and a COV = 0.429, while the lognormal distribution is characterized by a mean value μ = 38.25 MPa, a standard deviation σ = 19.12 MPa and a COV = 0.5, so both show an unrealistically high COV. This result is not surprising, because it confirms once more that the analysis of data cannot be performed disregarding the identification of homogeneous populations of data.
The statistical parameters characterizing each pdf of the mixture model and the percentage of data belonging to each cluster are reported in Table 4. As previously said, clusters nr. 1 and nr. 8 are not representative.
The results match once more those previously obtained, both considering annual data (§3.4) and the k-means algorithm (§3.5); in fact, the standard deviation is practically independent of the resistance, being mostly in the range 4.0–4.5 MPa, while the COV decreases with the resistance in the range 0.08–0.18, except for concrete characterized by a mean value of cubic strength less than 20.0 MPa.

4. Overview of the method

The main steps of the method, as outlined in the flowchart illustrated in Fig. 19, can be summarized as follows:

– test results, not necessarily associated with some population or resistance class, are collected, even coming from several sources;
– data covering a suitable time interval are suitably split into individual databases;
– cluster analyses based on GMM are performed for each database, so that homogeneous groups of data and their relevant statistical parameters are properly identified;
– if needed, new data are collected and the analyses are repeated;

Fig. 18. Histogram of global test results, MMs for k = 8 (solid red lines) and best fitting Normal and Lognormal pdfs.
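Since Fig. 18 compares normal and lognormal fits, it may help to check numerically the earlier remark that, for COV below 20%, the 5% fractiles of the two distributions are close. The sketch below does this for a single homogeneous class; the mean of 30 MPa and the list of COV values are illustrative assumptions, and the lognormal parameters are obtained from the mean and COV by the standard moment relations.

```python
import math
from statistics import NormalDist


def fractile_normal(mean, cov, p=0.05):
    """p-fractile of a normal distribution with the given mean and COV."""
    return mean * (1.0 + NormalDist().inv_cdf(p) * cov)


def fractile_lognormal(mean, cov, p=0.05):
    """p-fractile of a lognormal distribution with the same mean and COV."""
    s2 = math.log(1.0 + cov ** 2)   # variance of ln(X)
    m = math.log(mean) - 0.5 * s2   # mean of ln(X)
    return math.exp(m + NormalDist().inv_cdf(p) * math.sqrt(s2))


# Relative difference between the two 5% fractiles for increasing COV
for cov in (0.05, 0.10, 0.15, 0.20):
    fn = fractile_normal(30.0, cov)
    fl = fractile_lognormal(30.0, cov)
    print(f"COV={cov:.2f}  normal={fn:.2f}  lognormal={fl:.2f}  rel.diff={(fl - fn) / fn:.3f}")
```

In this range the normal fractile is the lower of the two, which is consistent with the statement that the normal assumption errs on the safe side.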

Table 4
Average statistical parameters of concrete classes obtained by GMM on the global histogram (EM for the global histogram).

i    μ [MPa]   σ [MPa]   COV     % of data belonging to the cluster
1    14.7      4.0       0.271   10.07
2    22.9      3.9       0.171   17.79
3    31.8      4.2       0.132   22.95
4    40.6      4.2       0.103   20.05
5    49.9      4.3       0.087   14.09
6    59.9      4.6       0.076   9.27
7    70.2      5.5       0.078   3.40
8    79.5      10.2      0.128   2.38
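The link between Table 4 and the single normal fit quoted in the text can be checked with the law of total variance: pooling the eight components gives the mean and standard deviation of the whole mixture. The sketch below uses only the values of Table 4; the function name is a hypothetical choice.

```python
import math

# (share of data in %, mean [MPa], standard deviation [MPa]) copied from Table 4
table4 = [
    (10.07, 14.7, 4.0), (17.79, 22.9, 3.9), (22.95, 31.8, 4.2), (20.05, 40.6, 4.2),
    (14.09, 49.9, 4.3), (9.27, 59.9, 4.6), (3.40, 70.2, 5.5), (2.38, 79.5, 10.2),
]


def mixture_moments(components):
    """Mean and standard deviation of a mixture from its component parameters."""
    total = sum(share for share, _, _ in components)
    mean = sum(share * mu for share, mu, _ in components) / total
    # E[X^2] of the mixture: weighted sum of (sigma_i^2 + mu_i^2)
    second = sum(share * (sd ** 2 + mu ** 2) for share, mu, sd in components) / total
    return mean, math.sqrt(second - mean ** 2)


mean, sd = mixture_moments(table4)
cov = sd / mean
```

The pooled mean and standard deviation should land close to the 37.86 MPa and 16.29 MPa reported for the best fitting normal distribution, which makes explicit why ignoring the class structure yields a COV above 0.4 even though each class has a standard deviation of only 4–5 MPa.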

– cluster analysis based on k-means is carried out, focusing on the statistical parameters obtained by fitting the Gaussian distributions, to define some reference classes or to achieve additional information;
– uncertainties affecting the statistical parameters are defined, comparing results obtained for each individual database, suitably considering the variations of the statistical parameters from year to year;
– results are further discussed and validated performing a new cluster analysis, based on GMM on the whole database, and final conclusions are drawn.
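The uncertainty step in the list above relies on the two normalizations of Eqs. (18) and (19), which can be sketched as follows. The prototypes are the class means of Table 2 and the yearly means are taken from Table 1, but the assignment of yearly components to classes shown here is an assumption made for illustration, as the paper's actual k-means partition is not listed in the text.

```python
from statistics import mean, median, stdev


def normalize(realizations, prototypes):
    """Apply Eq. (18) per class, pool all years, then apply Eq. (19)."""
    x_prime = [
        x / prototypes[i]                  # Eq. (18): divide by the class prototype
        for i, values in realizations.items()
        for x in values
    ]
    x_tilde = median(x_prime)              # median of the pooled set
    return [x / x_tilde for x in x_prime]  # Eq. (19): pooled set with median 1.0


# Yearly mean strengths [MPa] from Table 1, grouped by an assumed class assignment
realizations = {
    3: [30.0, 30.8, 34.0, 35.1, 31.8],
    4: [38.1, 40.8, 41.8],
}
prototypes = {3: 32.4, 4: 40.2}  # class means from Table 2

x_second = normalize(realizations, prototypes)
cov_of_mean = stdev(x_second) / mean(x_second)
```

The COV of the pooled, normalized values is the kind of quantity reported in Table 3; with only two classes included here, the number obtained is an illustration of the procedure rather than a reproduction of the published value.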

5. Conclusion

An innovative method has been presented for the statistical analysis of massive sets of secondary test data, where data regarding several distinct resistance classes are mixed together arbitrarily and even collected from several sources. The method makes it possible to partition the data, identifying the clusters of homogeneous populations to which they belong, so that the statistical parameters of each cluster can be assessed.
The proposed methodology is extremely general: it can be used to process secondary experimental data regarding the mechanical properties of any building material characterized by a certain number of distinct resistance classes, such as concrete, steel, timber and masonry, in particular when the analysis concerns reference values of the standard deviation and of the COV.
Since the identification of homogeneous populations is performed blindly, the procedure is objective and does not require subjective information, like pre-classification of data, that could be vague or wrong and, anyhow, frequently misleading.
The method is very helpful in reliability assessments of existing buildings, where estimating the COV of relevant material properties is very difficult because direct information concerning the examined structure, or primary experimental data derived from acceptance tests or from in-situ investigations, is missing or so limited as to hinder dependable statistical analysis.
To demonstrate its practical application, the procedure has been applied to a relevant case study, concerning the evaluation of statistical parameters of the cubic strength of the Italian concrete production during the 1960s.
Fig. 19. Flowchart of the proposed procedure.

The procedure made it possible to recognize homogeneous concrete classes and to estimate their statistical properties. Results reveal that in the 1960s, the Italian concrete production was characterized by a standard deviation of the cubic strength of about 4.0–4.5 MPa, practically independent of the strength; the COV thus decreases with the strength and varies in the range 0.25–0.06, which shrinks to 0.17–0.06 for concretes characterized by a mean value of the cubic strength greater than 20.0 MPa. Those results are in good agreement with the most pertinent conclusions of the relevant literature on the topic.
The method also made it possible to evaluate the model uncertainties for the mean value of resistance, which is characterized by a COV of about 0.056, and for the standard deviation and the COV, which are both characterized by a COV of about 0.112.

Since the results obtained with different approaches agree satisfactorily, it can be concluded that the proposed procedure is not only suitable for the intended applications as long as experimental data are available, but it is also "robust" enough.
Finally, as the proposed method was able to identify homogeneous populations and statistical parameters independently of subjective information regarding assignment of data to a particular population or class, it can be applied to analyze any arbitrary mixture of several populations, even if the percentage of each of them is unknown, provided that their pdfs can be approximated by normal distributions.
Further improvement could address the implementation of the proposed method in a Bayesian Network, which is a promising approach for solving the problem of handling vague and uncertain knowledge [62].
It is interesting to stress that fitting a GMM leads automatically, by its own nature, to the definition of a Bayesian Network (BN). In the present case the network is composed of two nodes: one node is represented by the concrete class, while the other node is represented by the concrete compressive strength. The first node has m possible states, corresponding to concrete classes, with which discrete probabilities of occurrence, the so-called probability components, are associated; the second node is characterized by continuous distributions.
BNs have already been successfully applied in civil engineering for the definition of material mechanical characteristics [63,64]. However, none of this research has explored the applicability of this technique for assessing material strength in existing structures. In order to make the BN able to serve our purpose, further steps are required, for example to extend the network with additional variables representing the results of semi-destructive or non-destructive investigations, in order to draw suitable inference on the concrete compressive strength. It is the intention of the authors to carry out further studies in this direction.

References

[1] R. Caspeele, M. Sykora, L. Taerwe, Influence of quality control of concrete on structural reliability: assessment using a Bayesian approach, Mater. Struct. 47 (1–2) (2014) 105–116.
[2] M.J. McGinnis, S. Pessiki, Experimental study of the core-drilling method for evaluating in situ stresses in concrete structures, J. Mater. Civ. Eng. 28 (2) (2016).
[3] X. Ruan, Y. Zhang, In-situ stress identification of bridge concrete components using core-drilling method, J. Struct. Infrastruct. Eng. 11 (2) (2015).
[4] J. Hoła, J. Bień, Ł. Sadowski, K. Schabowicz, Non-destructive and semi-destructive diagnostics of concrete structures in assessment of their durability, Bull. Polish Acad. Sci. Tech. Sci. 63 (2015) 87–96.
[5] D. Breysse, Nondestructive evaluation of concrete strength: an historical review and a new perspective by combining NDT methods, Constr. Build. Mater. 33 (2012) 139–163.
[6] L. Rojas-Henao, J. Fernández-Gómez, J.C. López-Agüí, Rebound hammer, pulse velocity, and core tests in self-consolidating concrete, ACI Mater. J. 109 (2012) 235–243.
[7] Q. Huang, P. Gardoni, S. Hurlebaus, Predicting concrete compressive strength using ultrasonic pulse velocity and Rebound number, ACI Mater. J. 108 (2012) 403–412.
[8] S. Hannachi, M.N. Guetteche, Application of the combined method for evaluating the compressive strength of concrete on site, Open J. Civ. Eng. 2 (2012) 16–21.
[9] R. Pucinotti, Reinforced concrete structure: non destructive in situ strength assessment of concrete, Constr. Build Mater (2015).
[10] M.K. Lim, H. Cao, Combining multiple NDT methods to improve testing effectiveness, Constr. Build. Mater. 38 (2013) 1310–1315.
[11] A. Jain, A. Kathuria, A. Kumar, Y. Verma, K. Murari, Combined use of non-destructive tests for assessment of strength of concrete in structure, Proc. Eng. 54 (2013) 241–251.
[12] M. Vona, D. Nigro, Evaluation of the predictive ability of the in situ concrete strength through core drilling and its effects on the capacity of the RC columns, Mater. Struct. 48 (4) (2015) 1043–1059.
[13] A.C. Mpalaskas, I. Vasilakos, T.E. Matikas, H.K. Chai, D.G. Aggelis, Monitoring of the fracture mechanisms induced by pull-out and compression in concrete, Eng. Fract. Mech. 128 (2014) 219–230.
[14] R. Melchers, Structural reliability analysis and prediction, Chichester, 1999.
[15] D. Diamantidis, M. Holicky, Innovative Methods for the Assessment of Existing Structures, Czech Technical University in Prague, Klokner Institute, 2012.
[16] D.K. Kim, J.J. Lee, J.H. Lee, Seong Kyu Chang, Application of probabilistic neural networks for prediction of concrete strength, ICCES 2 (1) (2007) 29–34.
[17] M. Nikoo, P. Zarfam, H. Sayahpour, Determination of compressive strength of concrete using Self Organization Feature Map (SOFM), Eng. Comput. 31 (1) (2015) 113–121.
[18] D.C. Teychenni, Recommendations for the treatment of the variations of concrete strength in codes of practice, Mat. Constr. 6 (1973).
[19] H.C. Erntroy, The variation of works concrete test cubes, IABSE congress report,
[20] G.M. Verderame, G. Manfredi, Mechanical properties of concrete used in reinforced concrete structures in 1960s. Proc. 10th Italian congress on seismic engineering, 2001. (in Italian).
[21] A. Masi, A. Digrisolo, G. Santarsiero, Concrete strength variability in Italian RC buildings: analysis of a large database of core tests, Appl. Mech. Mater. 597 (2014) 283–290.
[22] F. Al-Neshawy, J. Piironen, E. Sistonen, M. Ferreira, Development of database for the in-service inspection of the concrete structures of the Finnish nuclear power plants, IABSE Symposium Report, 2013.
[23] N. Augenti, F. Parisi, E. Acconcia, MADA: online experimental database for mechanical modelling of existing masonry assemblages. Proc., 15th World Conference on Earthquake Engineering, Lisbon, Portugal (CD-ROM), 2012.
[24] G.M. Verderame, A. Stella, E. Cosenza, Mechanical properties of the Rebar Steel implemented in the Reinforced Concrete Structures built during the 1960s, Proc. 10th Italian congress on seismic engineering, 2001. (in Italian).
[25] M.H. Hubler, R. Wendner, Z.P. Bažan, Comprehensive database for concrete creep and shrinkage: analysis and recommendations for testing and recording, ACI Mater. J. 112 (4) (2015) 547–558.
[26] G. MacLachlan, D. Peel, Finite Mixture Models, John Wiley and Sons, New York (NY), 2000.
[27] W. Seidel, K. Mosler, M. Alker, A cautionary note on likelihood ratio tests in Gaussian Mixture Model, Ann. Inst. Stat. Math. 52 (3) (2000) 481–487.
[28] W. Seidel, K. Mosler, M. Alker, Likelihood ratio tests based on Subglobal Optimization: a power comparison in exponential mixture models, Stat. Pap. 41 (2000) 85–98.
[29] L. Breiman, Statistical modeling: the two cultures, Stat. Sci. 16 (2001) 199–215.
[30] K. Aho, D. Derryberry, T. Peterson, Model selection for ecologists: the worldviews of AIC and BIC, Ecology 95 (3) (2014) 631–636.
[31] R.J. Steele, A.E. Raftery, Performance of Bayesian Model Selection Criteria for Gaussian Mixture Models, University of Washington, Department of Statistics, 2009.
[32] K.P. Burnham, D.R. Anderson, Model Selection and Multimodel Inference, Springer, New York (NY), 2002.
[33] A.E. Raftery, Bayesian model selection in social research, Sociol. Methodol. 25 (1995) 111–163.
[34] K.P. Burnham, D.R. Anderson, Multimodel inference, understanding AIC and BIC in model selection, Sociol. Methods Res. 33 (2004) 261–304.
[35] A.M. Neville, Properties of Concrete, Pearson Education Limited, Essex, 2004.
[36] L. Santarella, Reinforced Concrete, Design and Behavior, Hoepli, Milano, 1969 (in Italian).
[37] EN197-1, Cement. Composition, specifications and conformity criteria for common cements, 2011.
[38] DIN 1045, Bestimmung des Deutschen Ausschusses für Stahlbeton; Teil A-Bestimmungen für Ausführung von Bauwerken aus Stahlbeton, 1969.
[39] ISO 2394. General principles on reliability for structures, 1998.
[40] S. Mirza, M. Hatzinikolas, J. MacGregor, Statistical descriptions of strength of concrete, J. Struct. Div. 105 (1979) 1021–1037.
[41] B. Ellingwood, Development of a Probability Based Load Criterion for American National Standard A58: Building Code Requirements for Minimum Design Loads in Buildings and Other Structures, 1980.
[42] D.P. McNicholl, B. Wong, Investigation appraisal and repair of large reinforced concrete buildings in Hong Kong, Deterioration and Repair of Reinforced concrete in Arabian Gulf, 1987.
[43] J.E. Cook, 10,000 psi concrete, Concr. Int. 11 (10) (1989) 67–75.
[44] ACI 363.2R-11, Guide to Quality Control and Assurance of High-Strength Concrete, 2011.
[45] M. Holicky, Introduction to Probability and Statistics for Engineers, 2014.
[46] Swiss Standard SIA 162/1, Betonbauten – Materialpruefung, Schweizerischer Ingenieur- und Architektenverein, 1989.
[47] L.J. Mudrock, The control of concrete quality, Proc. Inst. Civ. Eng. (1953).
[48] ACI 214-11, Guide to Evaluation of Strength Test Results of Concrete, 2011.
[49] H. Ruesch, Die Streuung der Eigenschaften von Schwebebeton, IABSE Reports of the Working Commissions, 1969.
[50] H.H. Newlon, Variability of portland cement concrete, National Conf. on Statistical Quality Control Methodology in Highway and Airfield Construction (1966), 259–284.
[51] I. Soroka, On compressive strength variation in concrete, Mat. Constr. 4 (3) (1971) 155–161.
[52] S. Walker, D. Bloem, Variations in portland cement, J. Eng. Mech. 111–121 (1974) 1–16.
[53] EN1992-1-1. Eurocode 2: Design of concrete structure. Part 1-1: General rules and rules for buildings. CEN, Brussels, Belgium, 2004.
[54] fib Model Code for Concrete Structures 2010, Ernst & Sohn, 2013.
[55] NTC2008. Italian Code for Buildings Construction. Rome, 2008. (in Italian).
[56] C. Pesenti, The development of the cement production in Italy in the last fifty years. Recalling my life of work at Italcementi, L’Industria Italiana del Cemento 9 (1980) 543–568 (in Italian).
[57] L. Cussino, R. Marotta, G. Tognon, The evolution in the field of concrete, L’Industria Italiana del Cemento 9 (1980) (in Italian).
[58] Royal Decree 16/11/1939 nr. 2229, Code for the execution of concrete and reinforced concrete structures, Rome, 1939. (in Italian).
[59] Ministry of Public Works, D.M.L.L. 30/5/1972, Code for the execution of reinforced concrete, prestressed concrete and steel structures, Rome, 1972. (in Italian).
[60] F. Marsili, P. Croce, F. Klawonn, F. Landi, A Bayesian network for the definition of probability models for compressive strength of concrete homogeneous population. In Proceedings of the 14th International Probabilistic Workshop, Ghent, December 2016, Springer International Publisher AG, 2016, pp. 269–284.
[61] M.R. Berthold, C. Borgelt, F. Hoeppner, F. Klawonn, Guide to Intelligent Data Analysis, Springer-Verlag, London, 2010.
[62] R. Kruse, C. Borgelt, F. Klawonn, C. Moewes, M. Steinbrecher, P. Held, Computational Intelligence: A Methodological Introduction, Springer-Verlag, London, 2013.
[63] M. Deublein, M. Schlosser, M.H. Faber, Hierarchical modeling of structural timber material properties by means of bayesian probabilistic networks, Proc. of ICASP 11th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP, Zurich, Switzerland, 2011.
[64] H.S. Sousa, J.S. Machado, J.M. Branco, P.B. Lourenço, On site assessment of structural timber by means of hierarchical models and probabilistic methods, Constr. Build. Mater. 101 (2) (2015) 1188–1196.
