See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/353878768
META-ANALYSIS OF PREVALENCE STUDIES USING R Sujata Suvarnapathaki*
Article · August 2021
Sujata Suvarnapathki
Ramnarain Ruia College
All content following this page was uploaded by Sujata Suvarnapathki on 13 August 2021.

ORIGINAL RESEARCH PAPER Volume - 10 | Issue - 08 | August - 2021 | PRINT ISSN No. 2277 - 8179 | DOI : 10.36106/ijsr
INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH
META-ANALYSIS OF PREVALENCE STUDIES USING R
Statistics
Sujata Suvarnapathaki*, Ph.D., Assistant Professor, Department of Statistics, Ramnarain Ruia Autonomous College, Matunga, Mumbai, India. *Corresponding Author
ABSTRACT
In recent times, the use of meta-analysis in prevalence studies in epidemiology and in clinical research has increased to a great extent. Meta-analysis combines the results of multiple scientific studies dealing with the same research question. The individual studies that are included report measurements that may differ on a numerical scale. Meta-analysis involves the use of statistics to derive a pooled estimate of the parameter under consideration by minimizing the error in the estimates. The objectives of meta-analysis are to condense and integrate results from a number of individual studies, to evaluate the differences in the results among studies, to increase precision in estimating effects, and to decide whether further investigation through new studies is required. Meta-analysis is used in areas of research where study sizes are very small (experiments where humans are subjects) and especially when no large-scale, high-quality trials are available. It provides an unbiased analysis of the available evidence. This research paper demonstrates how to perform meta-analysis using the freely available statistical software R, using a working example (dummy data) from prevalence studies. It gives a brief introduction to the meta-analysis of prevalence studies and offers directions to more advanced meta-analysis methods available in R. R is a potent and flexible tool for conducting meta-analyses. This paper also briefly mentions the potential use of meta-analysis in the current Covid-19 pandemic. Globally, clinical trials are being conducted to establish the efficacy of various drug products in helping patients suffering from Covid-19 to recover. Meta-analysis can be used to arrive at a pooled estimate of the 'Recovery rate' of patients suffering from Covid-19 across the globe.
KEYWORDS
Meta-analysis, Prevalence studies, Fixed Effect Model, Random Effect Model, R
INTRODUCTION
The objectives of meta-analysis are to condense and integrate results from a number of individual studies, to evaluate the differences in the results among studies, to increase precision in estimating effects, and to decide whether further investigation through new studies is required. In recent times, the use of meta-analysis in prevalence studies in epidemiology and in clinical research has increased to a great extent. It involves the use of statistics to derive a pooled estimate of the parameter under consideration by minimizing the error in the estimates.

This research paper demonstrates how to perform meta-analysis using the freely available statistical software R, using a working example (dummy data) from prevalence studies.

To understand the causes of diseases, the preferred approach in epidemiology is usually to conduct incidence studies. However, these are likely to consume large resources and involve lengthy periods of follow-up. For some diseases, such as asthma and diabetes, it may be difficult to measure incidence without rigorous follow-up. In such cases, a more practical option is to study the 'prevalence' of the disease at a particular point in time.

Prevalence, sometimes also referred to as prevalence rate, is the proportion of persons in a population who have a specific characteristic or attribute at a specified point in time or over a specified period of time. Prevalence includes new as well as pre-existing cases in the population at the specified time.

Prevalence studies can be a very important research process. The prevalence of a characteristic, e.g. occurrence of a disease or recovery from a certain disease, may differ from one group to another with respect to factors such as age group or duration of the disease. It is, however, very difficult to assess from prevalence data whether an exposure increases the disease incidence.

A meta-analysis combines the results of several scientific studies addressing the same question, although the individual study measurements that are reported may have some degree of error. It is simply an analysis of analyses that correspond to a shared topic. There are two different approaches: meta-analysis using aggregate data (AD) and meta-analysis using individual participant data (IPD). In the AD approach, summary data for the same outcome from each study, e.g. the prevalence rate of a disease (a proportion), are pooled for the statistical analysis. The use of the AD approach requires selecting a model for combining findings from different studies.

Meta-analysis is used in areas of research where study sizes are very small (experiments where humans are subjects) and especially when no large-scale, high-quality trials are available.

Meta-analysis uses approaches from statistics to derive a pooled estimate of the attribute under consideration, e.g. recovery rate or death rate. Generally, a study collects data from individual subjects, i.e. the number of data points is the same as the number of subjects. A meta-analysis collects data from individual studies (e.g. 50 studies = 50 data points).

The four main objectives of meta-analysis are a) to summarize and integrate results from a number of individual studies, b) to analyze differences in the results among studies, c) to increase precision in estimating effects, and d) to determine whether new studies are needed to further investigate an issue. The logical phases to be followed to perform a meta-analysis are:
i) Identification phase, where a literature search is done to identify studies that have addressed the same research question.
ii) Selection phase, in which the criteria for selecting the studies to be included are applied. The criteria depend on the research question and on any specific methodological issues. It is imperative to formulate a precise set of criteria that is applied throughout.
iii) Calculation of the effect sizes, where effect sizes are expressed in the same way for each study.
The meta-analysis is then carried out by obtaining the combined effect and other overall results using an appropriate method. The homogeneity of the effect sizes is checked, and confidence intervals for the effect sizes are calculated. In meta-analysis, the effect size estimates themselves are not a function of the sample size. In contrast, the values of traditional parametric test statistics such as t and F are partly functions of the sample size, so for different sample sizes the values of the statistic can differ widely even when the (treatment - control) differences are equivalent.

METHODOLOGY
This research paper demonstrates the use of R for the meta-analysis of prevalence studies, using dummy data from prevalence studies of a certain disease.

Statistical Concepts
A few important aspects of meta-analysis are: a) Pooling criteria (inclusion-exclusion criteria): a study should meet the criteria for inclusion, e.g. selecting only randomized controlled trials. b) Study design: studies with similar study designs are pooled, e.g. studies reporting the prevalence rate of diseases such as diabetes, cancer or, in the current pandemic, Covid-19. c) Treatment-by-study interaction: the treatment-by-study interaction is expected to be insignificant to ensure meaningful pooling. When the studies are not properly pooled, the estimates obtained will not be reliable. There could be two issues, viz. a) Publication bias: this could be due to the exclusion of unpublished studies. Funnel plots are used in
general to detect publication bias, and b) Varying individual study quality: this results from including poorly designed studies, or studies that are not of the same quality (e.g. precision, number of subjects).

The power of statistical analyses can be increased with a meta-analysis that pools the results of all available studies/trials. It is of great importance to pool the results properly in order to get a meaningful combined estimate that describes the set of studies well. Randomization may, however, influence the individual estimates of treatment effect.

The interest lies in finding whether there is more variation than would be expected by chance alone. This is called heterogeneity. There are two approaches to deal with it: i) Fixed-Effect Model and ii) Random-Effect Model.

i) Fixed-Effect Model:
It is assumed that all the included studies share a common effect size, $\mu$. The observed effects will be distributed about $\mu$, with a variance $\sigma^2$ that depends primarily on the sample size of each study.

In general, for any observed effect $T_i$ in the $i$th study, $T_i = \mu + \varepsilon_i$, where $\varepsilon_i$ is the within-study error for study $i$ and $\mu$ is the common effect size. This model is preferred when all factors that could influence the effect size are the same in all the study populations. The goal is to assign more weight to the studies that carry more information, dealing with only one source of sampling error, the error within studies. The weight is $w_i = 1/v_i$, where $v_i$ is the within-study variance for the $i$th study.

ii) Random-Effect Model:
Rather than assuming that there is one true effect, it is assumed that there is a distribution of true effect sizes. Hence, the combined effect does not represent one common effect. It represents the mean of the population of true effects.

In general, for any observed effect $T_i$ in the $i$th study, $T_i = \Theta_i + \varepsilon_i = \mu + \xi_i + \varepsilon_i$, where $\Theta_i$ is the true effect, $\xi_i$ is the between-study error, $\varepsilon_i$ is the within-study error for study $i$, and $\mu$ is the mean of all true effects.

This model is better when one wants to estimate the effects in a range of populations and does not want the overall estimate to be overly influenced by any one population. The goal is to decompose the observed variance into its two component parts, within-studies and between-studies, and then use both parts when assigning the weights, $w_i^{*} = 1/v_i^{*}$, where $v_i^{*} = v_i + \tau^2$ is the sum of the within-study variance $v_i$ for the $i$th study and the between-studies variance $\tau^2$.

In simple words, the Fixed Effect Model considers only the 'within study variance', whereas the Random Effect Model considers the sum of the 'within study variance' and the 'between study variance' to estimate the weight of the $i$th study.

Table 1: Fixed Effect Model
Combined effect (weighted mean):
$$T_{\cdot} = \frac{\sum_{i=1}^{k} w_i T_i}{\sum_{i=1}^{k} w_i}$$
Variance of the combined effect:
$$V_{\cdot} = \frac{1}{\sum_{i=1}^{k} w_i}$$
95% Confidence Interval for the combined effect:
$$\text{Lower Limit} = T_{\cdot} - 1.96 \times SE(T_{\cdot}), \qquad \text{Upper Limit} = T_{\cdot} + 1.96 \times SE(T_{\cdot}), \qquad SE(T_{\cdot}) = \sqrt{V_{\cdot}}$$

Table 2: Random Effect Model
Combined effect (weighted mean):
$$T_{\cdot}^{*} = \frac{\sum_{i=1}^{k} w_i^{*} T_i}{\sum_{i=1}^{k} w_i^{*}}$$
Variance of the combined effect:
$$V_{\cdot}^{*} = \frac{1}{\sum_{i=1}^{k} w_i^{*}}$$
95% Confidence Interval for the combined effect:
$$\text{Lower Limit} = T_{\cdot}^{*} - 1.96 \times SE(T_{\cdot}^{*}), \qquad \text{Upper Limit} = T_{\cdot}^{*} + 1.96 \times SE(T_{\cdot}^{*}), \qquad SE(T_{\cdot}^{*}) = \sqrt{V_{\cdot}^{*}}$$

USING R FOR META-ANALYSIS
This section discusses the R code for the various steps in meta-analysis, such as data preparation, pooling the effect sizes, detection of outliers, if any, and graphical representation or visualization, along with the interpretation of the output, e.g. the decision between the fixed-effect model and the random-effect model and the inference thereafter. Dummy data on prevalence studies are used to explain the steps involved. The standard meta-analysis is conducted using the R package "meta".

Data Preparation In Excel As Per R Syntax:
The data for all the studies are arranged in a single sheet, depending on the data type and the summary measure required. In the code below, the sheet is expected to contain the columns Study_Place (study label), n (number of cases) and Total (number of subjects in the study), since these are the names used in the "metaprop" call. There are different R functions for different data types:
i) Continuous outcome data: "metacont"
ii) Binary outcome data: "metabin"
iii) Single correlations: "metacor"
iv) Single means: "metamean"
v) Incidence rates: "metainc"
vi) Pre-calculated effect size data: "metagen"
vii) Single incidence rates: "metarate"
viii) Single proportions (prevalence/prevalence rate): "metaprop"

Pooling The Effect Sizes:
R code to obtain the pooled effect size (prevalence rate) using the Fixed Effect Model is as follows.

# To install the package "meta" in R (needed only once)
install.packages("meta")
library(meta)

# To import the data file as a csv file
# (file.choose() opens a file dialog on any platform)
meta <- read.csv(file.choose(), header = TRUE)

Note: There could be multiple options to import the data file.

# To obtain the pooled effect size using Fixed Effect Model
# (newer releases of "meta" rename comb.fixed/comb.random; check ?metaprop)
MetaFixed <- metaprop(n, Total, studlab = paste(Study_Place), data = meta,
                      method = "Inverse", sm = "PRAW",
                      comb.fixed = TRUE, comb.random = FALSE, prediction = TRUE)

# To view the output
MetaFixed

Table 3: R output for Fixed Effect model
Sr. No  Study Place  Proportion  95% CI            %w (Fixed)
1       USA          0.0185      [0.0113; 0.0284]  10.7
2       Korea        0.0217      [0.0104; 0.0395]   3.9
3       China        0.0484      [0.0252; 0.0830]   1.0
4       Spain        0.0247      [0.0107; 0.0481]   2.4
5       China        0.0511      [0.0208; 0.1024]   0.5
6       East Asia    0.0279      [0.0153; 0.0463]   3.3
7       Italy        0.0421      [0.0288; 0.0592]   3.3
8       China        0.0191      [0.0132; 0.0267]  16.5
9       East Asia    0.0189      [0.0052; 0.0476]   2.0
10      China        0.0249      [0.0120; 0.0453]   3.0
11      Korea        0.0413      [0.0190; 0.0769]   1.0
12      China        0.0331      [0.0250; 0.0429]   9.3
13      Korea        0.0407      [0.0165; 0.0821]   0.8
14      Asia         0.0130      [0.0042; 0.0300]   5.4
15      Italy        0.0153      [0.0062; 0.0313]   5.4
16      Germany      0.0166      [0.0072; 0.0324]   5.3
17      USA          0.0201      [0.0125; 0.0306]   9.5
18      Spain        0.0435      [0.0210; 0.0785]   1.0
19      Japan        0.0267      [0.0139; 0.0461]   3.1
20      USA          0.0160      [0.0095; 0.0252]  12.8

Table 4: Additional Output (R output for Fixed Effect model)
Number of studies combined: k = 20

                     Proportion  95%-CI
Fixed effect model   0.0221      [0.0195; 0.0247]
Prediction interval              [0.0108; 0.0368]

Combined Effect under Fixed Effect Model = 0.0221.

Quantifying heterogeneity:
tau^2 < 0.0001 [0.0000; 0.0002];
tau = 0.0058 [0.0000; 0.0136];
I^2 = 47.8% [12.1%; 68.9%]; H = 1.38 [1.07; 1.79]

Test of heterogeneity:
Q      d.f.  p-value
36.37  19    0.0095

Details on meta-analytical method:
- Inverse variance method
- DerSimonian-Laird estimator for tau^2
- Jackson method for confidence interval of tau^2 and tau
- Untransformed proportions
- Clopper-Pearson confidence interval for individual studies

The classic meta-analysis model based on the 'inverse variance' method is used in this research paper. In the R code, this method is specified using method="Inverse", and sm="PRAW" requests untransformed proportions. The 'Clopper-Pearson' confidence interval, also known as the exact binomial interval, is used to calculate the confidence interval for the individual study results. The estimate of the 'between study variability' is obtained using the 'DerSimonian-Laird' method. The confidence intervals for the estimates of the 'between study variability', tau and tau^2, are obtained using the 'Jackson' method.

Interpretation of the output for the Fixed Effect Model is as follows. 1) The combined effect under the Fixed Effect Model is 0.0221, so the pooled estimate of the prevalence rate of the disease for the above data is 0.0221. 2) 47.8% of the observed variation is attributed to heterogeneity. The test for heterogeneity is statistically significant (p-value = 0.0095).

Since the test for heterogeneity is statistically significant, the Random Effect Model has to be used for pooling the estimate of prevalence.

R code for the Random Effect Model:

# To obtain the pooled effect size using Random Effect Model
MetaRandom <- metaprop(n, Total, studlab = paste(Study_Place), data = meta,
                       method = "Inverse", sm = "PRAW",
                       comb.fixed = FALSE, comb.random = TRUE, prediction = TRUE)

# To view the output
MetaRandom

Table 5: R Output For Random Effect Model
Sr. No  Study Place  Proportion  95% CI            %w (Random)
1       USA          0.0185      [0.0113; 0.0284]  8.2
2       Korea        0.0217      [0.0104; 0.0395]  5.2
3       China        0.0484      [0.0252; 0.0830]  1.9
4       Spain        0.0247      [0.0107; 0.0481]  3.9
5       China        0.0511      [0.0208; 0.1024]  1.1
6       East Asia    0.0279      [0.0153; 0.0463]  4.7
7       Italy        0.0421      [0.0288; 0.0592]  4.7
8       China        0.0191      [0.0132; 0.0267]  9.3
9       East Asia    0.0189      [0.0052; 0.0476]  3.4
10      China        0.0249      [0.0120; 0.0453]  4.4
11      Korea        0.0413      [0.0190; 0.0769]  1.9
12      China        0.0331      [0.0250; 0.0429]  7.8
13      Korea        0.0407      [0.0165; 0.0821]  1.6
14      Asia         0.0130      [0.0042; 0.0300]  6.2
15      Italy        0.0153      [0.0062; 0.0313]  6.2
16      Germany      0.0166      [0.0072; 0.0324]  6.2
17      USA          0.0201      [0.0125; 0.0306]  7.9
18      Spain        0.0435      [0.0210; 0.0785]  1.9
19      Japan        0.0267      [0.0139; 0.0461]  4.6
20      USA          0.0160      [0.0095; 0.0252]  8.7

Table 6: Additional Output (R output for Random Effect model)
Number of studies combined: k = 20

                      Proportion  95%-CI
Random effects model  0.0238      [0.0198; 0.0278]
Prediction interval               [0.0108; 0.0368]

Quantifying heterogeneity:
tau^2 < 0.0001 [0.0000; 0.0002];
tau = 0.0058 [0.0000; 0.0136];
I^2 = 47.8% [12.1%; 68.9%]; H = 1.38 [1.07; 1.79]

Test of heterogeneity:
Q      d.f.  p-value
36.37  19    0.0095

Details on meta-analytical method:
- Inverse variance method
- DerSimonian-Laird estimator for tau^2
- Jackson method for confidence interval of tau^2 and tau
- Untransformed proportions
- Clopper-Pearson confidence interval for individual studies

The output for the Random Effect Model is interpreted in the same way as that for the Fixed Effect Model.
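
The heterogeneity figures reported in Tables 4 and 6 can be cross-checked from Q and its degrees of freedom alone. The sketch below is an illustrative check in base R, not part of the paper's own workflow. It assumes the usual definitions I^2 = (Q - d.f.)/Q and H = sqrt(Q/d.f.), with the p-value taken from a chi-square distribution with k - 1 degrees of freedom. The variable names are chosen here for illustration.

```r
# Values copied from the "Test of heterogeneity" output in Tables 4 and 6
Q  <- 36.37
df <- 19   # k - 1, with k = 20 studies

# Chi-square test of homogeneity (upper tail probability)
p_value <- pchisq(Q, df = df, lower.tail = FALSE)

# Share of the observed variation beyond what chance alone would explain
I2 <- max(0, (Q - df) / Q)

# Ratio of observed to expected variation, on the standard deviation scale
H <- sqrt(Q / df)

print(round(c(p_value = p_value, I2_percent = 100 * I2, H = H), 4))
```

Up to rounding, the results should agree with the reported I^2 = 47.8%, H = 1.38 and p-value = 0.0095.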
Under the Random Effect Model, the combined effect is 0.0238, so the pooled estimate of the prevalence rate of the disease for the above data is 0.0238. The 95% Confidence Interval for the pooled estimate of prevalence is (0.0198; 0.0278).

Detection of Outliers:
Outliers in the data are identified next. One or more studies with extreme effect sizes, which do not quite fit in with the other study effect sizes, may distort the pooled effect estimate.

A study is identified as an outlier if
1) the upper bound of the 95% confidence interval for the ith study is lower than the lower bound of the pooled effect confidence interval (i.e., extremely small effects), or
2) the lower bound of the 95% confidence interval for the ith study is higher than the upper bound of the pooled effect confidence interval (i.e., extremely large effects).

R Code for detecting outliers in the data under consideration:

# To detect outliers
if (!require("devtools")) { install.packages("devtools") }
devtools::install_github("MathiasHarrer/dmetar")
library(dmetar)
out <- find.outliers(MetaRandom)
out

It is important to note that the "find.outliers" function automatically reruns the initial analysis, excluding the identified outliers.
"No Outliers" were detected in the dummy data used.

Graphical Representation/Visualization:
Forest Plot and Funnel Plot

Forest Plot:
A forest plot graphically displays the estimated results from the multiple studies in a meta-analysis, along with the overall results. It demonstrates the degree to which data from multiple studies observing the same effect overlap with one another. Results that fail to overlap well are termed heterogeneous, and this is attributed to the heterogeneity of the data. In a forest plot, the confidence interval for the effect size of every study is plotted together with the corresponding effect size.

R Code for Forest Plot:

# To draw a Forest Plot
forest(MetaRandom, xlim = c(0, 0.1), comb.fixed = TRUE, comb.random = TRUE,
       digits = 2, squaresize = 1.2,
       col.diamond.fixed = "blue", col.diamond.random = "green")

Figure 1: Forest Plot

In the above Forest Plot, the following things are observed.
1. The left-hand column lists the names of the studies.
2. The area of each square is proportional to the study's weight in the meta-analysis.
3. The horizontal lines represent the Confidence Intervals for each study effect.
4. The midpoint is the point effect estimate for each study.
5. The diamonds show the overall effect estimate.

Another important aspect of meta-analysis, as mentioned earlier, is to detect biases, if any, while doing the literature search in the Identification Phase. There could be two types of biases.
Publication Bias: 'Positive' studies (usually in favor of a new treatment or against a well-established one) are more likely to be published. It is necessary to identify the unpublished studies with the help of communication between researchers.
Search Bias: Even when there is no publication bias, a faulty search can miss some of the relevant studies. In searching databases, much care should be taken to ensure that the set of key words used for searching is as complete as possible.

Funnel Plot:
The possible bias in the identification phase and selection phase can be detected using a graphical method, the Funnel Plot. The effect size in each study is plotted on the horizontal axis against the standard error or sample size on the vertical axis. When there are no biases, the graph shows a symmetrical funnel shape centered on the average effect of the studies. Lack of symmetry is observed when the negative studies are missing.

The drawbacks of the Funnel Plot are: 1) Lack of symmetry in a funnel plot can also be caused by heterogeneity in the studies. 2) Funnel plots are difficult to interpret when the number of studies is small.

R Code For Funnel Plot:

# To draw a Funnel plot:
funnel(MetaRandom, studlab = TRUE)

Figure 2: Funnel Plot

CONCLUSION
Researchers may be interested in reviewing multiple prevalence studies with the same research objective. In such a scenario, meta-analysis can be a very robust option for arriving at a single pooled estimate of the prevalence. Meta-analysis is a disciplined way of summing up research findings. Compared with conventional review methods, meta-analysis is a more refined approach to representing the outcomes from multiple prevalence studies. There could be multiple similar studies masked in different approaches, and meta-analysis is capable of finding the relationships across such studies. It protects against over-interpreting differences across studies. Meta-analysis can handle a larger number of studies than traditional review methods. There are certain drawbacks of meta-analysis too. A lot of effort is invested in the literature/publication search. Meta-analysis is also criticized for 'adding apples to oranges' (combining studies that are different in nature).
R represents a potent and flexible tool to conduct meta-analyses. This research work gives a brief introduction to the topic.

DISCUSSION
Meta-analysis of prevalence studies may give new insight into the Covid-19 pandemic scenario, where different studies could measure the population burden of the disease. In the current pandemic, there is a huge burden of morbidity on health services across the globe. In many countries the death rate is relatively very high.

Globally, Phase II/III clinical trials are being conducted to test the efficacy of various drug products, such as Remdesivir and Favipiravir, for the treatment of Covid-19. The 'Rate of Recovery' may be one of the important end-points in these trials.

A pooled estimate of the recovery rate across the global trials can be obtained using meta-analysis. This will give important insight for future studies of similar drug products to be used as a treatment for Covid-19.

Meta-analysis may also be applied to study the efficacy rate of various vaccines and the rate of Covid-19 infections after the administration of different vaccines across different countries.
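
A pooled recovery rate of the kind described in the Discussion could be computed along the following lines. This is a minimal sketch in base R that applies the weighted-mean formulas of Tables 1 and 2 by hand, with the DerSimonian-Laird estimate of tau^2. The trial names and counts are hypothetical placeholders, not real trial results. The function name pool_proportions is introduced here for illustration, and the within-study variance p(1 - p)/n for an untransformed proportion is an assumption of this sketch. In practice the "metaprop" function shown earlier would be used.

```r
# Hypothetical trial-level counts (placeholders, not real trial results)
trials <- data.frame(
  trial     = c("Trial A", "Trial B", "Trial C", "Trial D"),
  recovered = c(45, 130, 62, 88),   # patients who recovered
  total     = c(60, 200, 80, 100)   # patients enrolled
)

# Illustrative helper: inverse variance pooling of untransformed proportions
pool_proportions <- function(events, n) {
  p <- events / n              # observed recovery rate T_i
  v <- p * (1 - p) / n         # assumed within-study variance v_i
  w <- 1 / v                   # fixed effect weights w_i = 1 / v_i

  fixed    <- sum(w * p) / sum(w)   # combined effect (Table 1)
  se_fixed <- sqrt(1 / sum(w))      # square root of its variance

  # DerSimonian-Laird estimate of the between-study variance tau^2
  q    <- sum(w * (p - fixed)^2)    # Cochran's Q
  df   <- length(p) - 1
  c_dl <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - df) / c_dl)

  w_star    <- 1 / (v + tau2)       # random effect weights w*_i = 1 / v*_i
  random    <- sum(w_star * p) / sum(w_star)   # combined effect (Table 2)
  se_random <- sqrt(1 / sum(w_star))

  list(
    fixed     = fixed,
    ci_fixed  = fixed + c(-1.96, 1.96) * se_fixed,
    random    = random,
    ci_random = random + c(-1.96, 1.96) * se_random,
    Q         = q,
    tau2      = tau2
  )
}

res <- pool_proportions(trials$recovered, trials$total)
print(res)
```

When tau^2 is estimated as zero the two models give the same pooled estimate. Otherwise the random effect weights are more even across trials, as in the comparison of Tables 3 and 5.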
REFERENCES
[1] Pearce N. Classification of epidemiological study designs. International Journal of Epidemiology, Volume 41, Issue 2, April 2012, Pages 393–397.
[2] Sara Balduzzi, Gerta Rücker, Guido Schwarzer. How to perform a meta-analysis with R: a practical tutorial. BMJ Journals, Volume 22, Issue 4.
[3] Stephen B. Thacker, Donna F. Stroup. Meta-analysis, Statistics. Encyclopaedia Britannica.
[4] George A Kelley, Kristi S Kelley. Statistical models for meta-analysis: A brief tutorial. World Journal of Methodology, 2012 Aug 26; 2(4): 27–32.
[5] Haidich A B. Meta-analysis in medical research. Hippokratia. 2010 Dec; 14(Suppl 1): 29–37.
[6] Guido Schwarzer, James R. Carpenter, Gerta Rücker. Meta-Analysis with R. Springer. 08-Oct-2015.