General Notes / Setup

The Handbook of Research Synthesis and Meta-Analysis by Cooper et al. (2019), now in its third edition, has been one of the quintessential texts on meta-analysis and the research synthesis process as a whole. In this document, I provide the R code to reproduce the worked examples and analyses from various chapters. Emphasis will be on using the metafor package, but several other packages will also be used. To read more about the metafor package, see the package website and the package documentation.

Note that the 'devel' version of metafor needs to be installed as some of the datasets used below are not currently in the official release of the package on CRAN. The 'devel' version of the package can be installed with:

install.packages("remotes")
remotes::install_github("wviechtb/metafor")

This step will become obsolete once a new release of the metafor package is published on CRAN.

Once the package is installed, we can load it with:

library(metafor)

A few additional notes:

  1. Results are only reproduced for chapters containing worked examples.
  2. Occasionally, there are some minor discrepancies between the results shown in the book and those obtained below. These can result from using different software packages that implement methods in slightly different ways, from intermediate rounding or the use of a different rounding scheme, or from chance when the analyses involve some stochastic process (e.g., when using MCMC sampling in the chapter on Bayesian methods). Minor discrepancies will (usually) not be commented on. However, where discrepancies are more substantial, they will be noted (together with the reasons for them).
  3. The results are generally given without discussion or context. The code below is not a substitute for reading the book, but is meant to be used together with it. In other words, readers of the book interested in replicating the results with R can see here how this is possible.

Finally, let's create a little helper function for formatting some results later on (essentially like round(), but this one does not drop trailing zeros).

fc <- function(x, digits=4, ...)
   formatC(x, format="f", digits=digits, ...)
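
For example, fc() keeps the trailing zeros that are dropped when the output of round() is printed:

fc(2.5)
## [1] "2.5000"
round(2.5, 4)
## [1] 2.5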

1) Research Synthesis as a Scientific Process

To recreate Figure 1.1 (showing the number of citations to articles including the terms 'research synthesis', 'systematic review', or 'meta-analysis' in their titles), I redid the search in the Web of Science Core Collection, which yielded the following data:

dat <- read.table(header=TRUE, text = "
year hits
2020 15973
2019 32120
2018 26293
2017 23885
2016 21318
2015 18487
2014 14781
2013 12357
2012 9802
2011 7528
2010 6120
2009 5121
2008 4006
2007 3553
2006 2771
2005 2336
2004 1911
2003 1526
2002 1309
2001 1005
2000 891
1999 832
1998 729
1997 580
1996 466
1995 70
1994 25
1993 11
1992 5
1991 9
1990 20
1989 127
1988 104
")
dat <- dat[-1,]               # remove current year (not complete)
dat <- dat[dat$year >= 1995,] # keep only 1995 or later
dat <- dat[nrow(dat):1,]      # reverse order

We can then create a bar chart based on these data:

# Figure 1.1

par(mar=c(4,4,2,2))

# draw the bar chart, add horizontal gridlines, and then redraw the bars on
# top of the gridlines
barplot(dat$hits, names.arg=dat$year, las=2, space=0.4, col="#6c9ece", border=NA)
abline(h=seq(0, 30000, by=5000), col="gray")
barplot(dat$hits, space=0.4, col="#6c9ece", border=NA, add=TRUE, axes=FALSE)


10) Evaluating Coding Decisions

Part of the code for this chapter (by Jack Vevea, Nicole Zelinsky, and Robert Orwin) is adapted from the chapter itself (see section 10.5). Below, we will make use of several additional packages that need to be installed (if they are not already). So let's do this first.

# install the 'irr' package (if it is not already installed) and load it

if (!require(irr)) {
   install.packages("irr")
   library(irr)
}

# install the 'psych' package (if it is not already installed) and load it

if (!require(psych)) {
   install.packages("psych")
   library(psych)
}

# install the 'vcd' package (if it is not already installed) and load it

if (!require(vcd)) {
   install.packages("vcd")
   library(vcd)
}

# install the 'lme4' package (if it is not already installed) and load it

if (!require(lme4)) {
   install.packages("lme4")
   library(lme4)
}

# Table 10.1

dat <- read.table(header=TRUE, text = "
study c1 c2 c3
1  3 2 3
2  3 1 1
3  2 2 2
4  3 2 3
5  1 1 1
6  3 1 3
7  2 2 1
8  1 1 1
9  2 2 1
10 2 1 3
11 2 2 2
12 3 3 3
13 3 1 2
14 2 1 1
15 1 1 1
16 1 1 2
17 3 3 1
18 2 2 2
19 2 2 2
20 3 1 1
21 2 1 2
22 1 1 3
23 3 2 2
24 3 3 3
25 2 2 3")
# put ratings for coders 1, 2, and 3 into separate vectors

c1 <- dat$c1
c2 <- dat$c2
c3 <- dat$c3

# combine ratings for coders 1 and 2 into a matrix

c1c2 <- cbind(c1, c2)

# combine ratings from all three coders into a matrix

all3 <- cbind(c1, c2, c3)

# cell counts and marginal totals for coders 1 and 2

addmargins(table(c2, c1))
##      c1
## c2     1  2  3 Sum
##   1    5  3  4  12
##   2    0  7  3  10
##   3    0  0  3   3
##   Sum  5 10 10  25
# note: first variable is for rows, second is for columns, so to reproduce
# panel A of Table 10.2, we have to use table(c2, c1)
# agreement rate for coders 1 and 2

mean(c1 == c2)
## [1] 0.6
# agreement rate for all three coders

mean(c1 == c2 & c1 == c3)
## [1] 0.36
# agreement rate (in %) between coders 1 and 2

irr::agree(c1c2)
##  Percentage agreement (Tolerance=0)
## 
##  Subjects = 25 
##    Raters = 2 
##   %-agree = 60
# note: agree(c1c2) would have been sufficient, but due to the large number of
# additional packages being used, I will make it clear by using the :: operator
# which package a function belongs to (unless this is clear from the context)
# agreement rate (in %) between all three coders

irr::agree(all3)
##  Percentage agreement (Tolerance=0)
## 
##  Subjects = 25 
##    Raters = 3 
##   %-agree = 36
# unweighted Cohen's kappa for coders 1 and 2

irr::kappa2(c1c2)
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 25 
##    Raters = 2 
##     Kappa = 0.425 
## 
##         z = 3.51 
##   p-value = 0.000455
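
As a quick check, we can also compute the unweighted kappa manually from its definition, kappa = (po - pe) / (1 - pe), where po is the observed agreement rate and pe is the agreement rate expected by chance (just a sketch of the computation that irr::kappa2() does for us above):

tab <- table(c1, c2)
po  <- sum(diag(tab)) / sum(tab)                      # observed agreement
pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance-expected agreement
(po - pe) / (1 - pe)
## [1] 0.4252874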
# kappa for all three coders (Fleiss' kappa, since Cohen's kappa is defined
# for two raters)

irr::kappam.fleiss(all3)
##  Fleiss' Kappa for m Raters
## 
##  Subjects = 25 
##    Raters = 3 
##     Kappa = 0.315 
## 
##         z = 3.85 
##   p-value = 0.00012
# weighted Cohen's kappa for coders 1 and 2

irr::kappa2(c1c2, weight=0:2)
##  Cohen's Kappa for 2 Raters (Weights: 0,1,2)
## 
##  Subjects = 25 
##    Raters = 2 
##     Kappa = 0.386 
## 
##         z = 3.22 
##   p-value = 0.00126
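
Analogously, we can check the weighted kappa manually: with disagreement weights w_ij = |i - j|, the weighted kappa is 1 minus the ratio of the observed to the chance-expected sum of weighted disagreements (again just a sketch):

tab <- table(c1, c2)
W   <- outer(1:3, 1:3, function(x,y) abs(x-y))       # disagreement weights
E   <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # chance-expected counts
1 - sum(W * tab) / sum(W * E)
## [1] 0.3859649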

We can also use the psych package to compute Cohen's kappa; this package additionally provides corresponding confidence intervals.

# unweighted and weighted Cohen's kappa for coders 1 and 2

W <- outer(1:3, 1:3, FUN = function(x,y) abs(x-y)) # create weight matrix
W
##      [,1] [,2] [,3]
## [1,]    0    1    2
## [2,]    1    0    1
## [3,]    2    1    0
res <- psych::cohen.kappa(c1c2, w=W)
print(res, digits=3)
## Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)
## 
## Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries 
##                  lower estimate upper
## unweighted kappa 0.185    0.425 0.666
## weighted kappa   0.386    0.386 0.386
## 
##  Number of subjects = 25

Note that the CI for weighted kappa is not correct! Using the vcd package, we can also compute Cohen's kappa and obtain the correct CI for weighted kappa.

print(vcd::Kappa(table(c1,c2)), digits=3, CI=TRUE)
##            value   ASE    z Pr(>|z|) lower upper
## Unweighted 0.425 0.123 3.46 0.000532 0.185 0.666
## Weighted   0.386 0.128 3.02 0.002558 0.135 0.637
# note: the (default) weighting scheme that the function uses for computing
# weighted kappa is the one described in the chapter
# Krippendorff's alpha for coders 1 and 2 when treating the data as ratings
# on a nominal, an ordinal, or a ratio scale (note: kripp.alpha() expects
# the raters to be in the rows of the data, hence the use of t() below)

irr::kripp.alpha(t(c1c2))
##  Krippendorff's alpha
## 
##  Subjects = 25 
##    Raters = 2 
##     alpha = 0.403
irr::kripp.alpha(t(c1c2), method="ordinal")
##  Krippendorff's alpha
## 
##  Subjects = 25 
##    Raters = 2 
##     alpha = 0.278
irr::kripp.alpha(t(c1c2), method="ratio")
##  Krippendorff's alpha
## 
##  Subjects = 25 
##    Raters = 2 
##     alpha = 0.311
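
To sketch how the nominal-scale alpha arises (assuming the standard formulas for two coders and no missing ratings): alpha = 1 - Do/De, where Do is the observed disagreement rate within studies and De is the disagreement rate expected by chance given the value counts across both coders.

v  <- c(c1, c2)   # all 50 ratings
nc <- table(v)    # value counts across both coders
De <- 1 - sum(nc * (nc - 1)) / (length(v) * (length(v) - 1))
Do <- mean(c1 != c2)  # observed disagreement rate
1 - Do / De
## [1] 0.4031669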
# correlation between coders 1 and 2

cor(c1, c2)
## [1] 0.4520228
# note: the cor() function is part of the 'stats' package, which comes with R
# mean correlation between all pairs of coders

irr::meancor(all3)
##  Mean of bivariate correlations R
## 
##  Subjects = 25 
##    Raters = 3 
##         R = 0.331 
## 
##         z = 1.55 
##   p-value = 0.121
# intraclass correlation coefficient for coders 1 and 2

psych::ICC(c1c2)
## Call: psych::ICC(x = c1c2)
## 
## Intraclass correlation coefficients 
##                          type  ICC   F df1 df2     p lower bound upper bound
## Single_raters_absolute   ICC1 0.28 1.8  24  25 0.082      -0.052        0.55
## Single_random_raters     ICC2 0.35 2.6  24  24 0.010       0.034        0.61
## Single_fixed_raters      ICC3 0.45 2.6  24  24 0.010       0.142        0.68
## Average_raters_absolute ICC1k 0.43 1.8  24  25 0.082      -0.110        0.71
## Average_random_raters   ICC2k 0.52 2.6  24  24 0.010       0.066        0.76
## Average_fixed_raters    ICC3k 0.62 2.6  24  24 0.010       0.248        0.81
## 
##  Number of subjects = 25     Number of Judges =  2
# note: this function computes 6 different types of ICCs; the first three are
# discussed in the chapter and correspond to the three different designs
# described on page 187

Using the lmer() function from the lme4 package, we can also do these calculations manually.

# restructure data into 'long' format

dat <- data.frame(study = 1:25, # study numbers are recycled across raters
                  rater = rep(1:2, each=25),
                  rating = c(c1,c2))

# absolute agreement based on one-way random-effects model

res <- lmer(rating ~ (1 | study), data = dat)
vcs <- data.frame(VarCorr(res))
vcs$vcov[1] / (vcs$vcov[1] + vcs$vcov[2])
## [1] 0.2777018
# absolute agreement based on two-way random-effects model

res <- lmer(rating ~ (1 | study) + (1 | rater), data = dat)
vcs <- data.frame(VarCorr(res))
vcs$vcov[1] / (vcs$vcov[1] + vcs$vcov[2] + vcs$vcov[3])
## [1] 0.3545258
# absolute agreement based on two-way mixed-effects model

res <- lmer(rating ~ rater + (1 | study), data = dat)
vcs <- data.frame(VarCorr(res))
vcs$vcov[1] / (vcs$vcov[1] + vcs$vcov[2])
## [1] 0.4503106
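
Note that these three values match ICC1, ICC2, and ICC3 as obtained earlier with psych::ICC().
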
# example data from page 199

dat <- data.frame(
   study = 1:25,
   rater = rep(1:3, each=25),
   rating = c(3,3,2,3,NA,3,2,1,2,2,NA,3,3,2,1,1,3,2,2,3,2,1,3,NA,2,
              2,1,NA,2,1,1,2,1,2,1,2,3,1,1,NA,1,3,2,2,1,1,1,2,3,2,
              3,1,2,3,1,3,1,1,NA,3,2,3,2,1,1,2,1,2,2,1,2,3,2,3,3))
dat[c(1:4, 71:75),]
##    study rater rating
## 1      1     1      3
## 2      2     1      3
## 3      3     1      2
## 4      4     1      3
## 71    21     3      2
## 72    22     3      3
## 73    23     3      2
## 74    24     3      3
## 75    25     3      3
# absolute agreement for all three raters (based on one-way random-effects model)

res <- lmer(rating ~ (1 | study), data = dat)
vcs <- data.frame(VarCorr(res))
vcs$vcov[1] / (vcs$vcov[1] + vcs$vcov[2])
## [1] 0.1794355

11) Effect Sizes for Meta-Analysis

# data for Figure 11.1

dat <- read.table(header=TRUE, text = "
study    md   n   var    se  pval
A     0.400  60 0.067 0.258 0.121
B     0.200 600 0.007 0.082 0.014
C     0.300 100 0.040 0.201 0.134
D     0.400 200 0.020 0.141 0.005
E     0.300 400 0.010 0.100 0.003
F    -0.200 200 0.020 0.141 0.157")
dat
##   study   md   n   var    se  pval
## 1     A  0.4  60 0.067 0.258 0.121
## 2     B  0.2 600 0.007 0.082 0.014
## 3     C  0.3 100 0.040 0.201 0.134
## 4     D  0.4 200 0.020 0.141 0.005
## 5     E  0.3 400 0.010 0.100 0.003
## 6     F -0.2 200 0.020 0.141 0.157
# Figure 11.1

# fit a fixed-effects model
res <- rma(md, var, data=dat, method="FE", slab=study)

# remove the study column and format the variances and standard errors
tmp <- dat[-1]
tmp$se  <- fc(tmp$se,  3)
tmp$var <- fc(tmp$var, 3)

# scale the point sizes relative to the model weights
size <- sqrt(weights(res))
size <- 2.5 * size / max(size)

par(mar=c(4,4,2,2))

forest(res, xlim=c(-6.5,1.4), psize=size, header=TRUE, mlab="Combined",
       efac=c(0,1,2), annotate=FALSE, xlab="Standardized Mean Difference",
       ilab=tmp, ilab.xpos=c(-5.0, -4.1, -3.2, -2.3, -1.4))
# add headers for the annotation columns
text(-5.0, 8, "Mean\nDifference", font=2)
text(-4.1, 8, "Sample\nSize", font=2)
text(-3.2, 8, "Variance", font=2)
text(-2.3, 8, "Standard\nError", font=2)
text(-1.4, 8, "p-Value", font=2)
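
The point sizes in the forest plot reflect how much weight the studies receive in the fixed-effects model, with weights proportional to the inverse of the sampling variances. The weights (given in %) can be inspected with:

weights(res)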