conv.wald.Rd
Function to convert Wald-type confidence intervals (CIs) and test statistics (or the corresponding p-values) to sampling variances.
conv.wald(out, ci.lb, ci.ub, zval, pval, n, data, include,
level=95, transf, check=TRUE, var.names, append=TRUE, replace="ifna", ...)
out: vector with the observed effect sizes or outcomes.
ci.lb: vector with the lower bounds of the corresponding Wald-type CIs.
ci.ub: vector with the upper bounds of the corresponding Wald-type CIs.
zval: vector with the Wald-type test statistics.
pval: vector with the p-values of the Wald-type tests.
n: vector with the total sample sizes of the studies.
data: optional data frame containing the variables given to the arguments above.
include: optional (logical or numeric) vector to specify the subset of studies for which the conversion should be carried out.
level: numeric value (or vector) to specify the confidence interval level(s) (the default is 95).
transf: optional argument to specify a function to transform out, ci.lb, and ci.ub (e.g., transf=log). If unspecified, no transformation is used.
check: logical to specify whether the function should carry out a check to examine if the point estimates fall (approximately) halfway between the CI bounds (the default is TRUE).
var.names: character vector with two elements to specify the name of the variable for the observed effect sizes or outcomes and the name of the variable for the corresponding sampling variances (if data is an object of class "escalc", the var.names are taken from the object; otherwise the defaults are "yi" and "vi").
append: logical to specify whether the data frame provided via the data argument should be returned together with the estimated values (the default is TRUE).
replace: character string or logical to specify how values in var.names should be replaced (only relevant when using the data argument and if variables in var.names already exist in the data frame). See the ‘Value’ section for more details.
...: other arguments.
The escalc function can be used to compute a wide variety of effect sizes or ‘outcome measures’. However, the inputs required to compute certain measures with this function may not be reported for all of the studies. Under certain circumstances, other information (such as point estimates and corresponding confidence intervals and/or test statistics) may be available that can be converted into the appropriate format needed for a meta-analysis. The purpose of the present function is to facilitate this process.
The function typically takes a data frame created with the escalc function as input via the data argument. This object should contain variables yi and vi (unless argument var.names was used to adjust these variable names when the "escalc" object was created) for the observed effect sizes or outcomes and the corresponding sampling variances, respectively. For some studies, the values for these variables may be missing.
In some studies, the effect size estimate or observed outcome may already be reported. If so, such values can be supplied via the out argument and are then substituted for missing yi values. At times, it may be necessary to transform the reported values (e.g., reported odds ratios to log odds ratios). Via argument transf, an appropriate transformation function can be specified (e.g., transf=log), in which case \(y_i = f(\text{out})\) where \(f(\cdot)\) is the function specified via transf.
Moreover, a confidence interval (CI) may have been reported together with the estimate. The bounds of the CI can be supplied via arguments ci.lb and ci.ub, which are also transformed if a function is specified via transf. Assume that the bounds were obtained from a Wald-type CI of the form \(y_i \pm z_{crit} \sqrt{v_i}\) (on the transformed scale if transf is specified), where \(v_i\) is the sampling variance corresponding to the effect size estimate or observed outcome (so that \(\sqrt{v_i}\) is the corresponding standard error) and \(z_{crit}\) is the appropriate critical value from a standard normal distribution (e.g., \(1.96\) for a 95% CI). Then \[v_i = \left(\frac{\text{ci.ub} - \text{ci.lb}}{2 \times z_{crit}}\right)^2\] is used to back-calculate the sampling variances of the (transformed) effect size estimates or observed outcomes and these values are then substituted for missing vi values in the dataset.
For example, consider the following dataset of three RCTs used as input for a meta-analysis of log odds ratios:
dat <- data.frame(study = 1:3,
cases.trt = c(23, NA, 4), n.trt = c(194, 183, 46),
cases.plc = c(38, NA, 7), n.plc = c(201, 188, 44),
oddsratio = c(NA, 0.64, NA), lower = c(NA, 0.33, NA), upper = c(NA, 1.22, NA))
dat <- escalc(measure="OR", ai=cases.trt, n1i=n.trt, ci=cases.plc, n2i=n.plc, data=dat)
dat
# study cases.trt n.trt cases.plc n.plc oddsratio lower upper yi vi
# 1 1 23 194 38 201 NA NA NA -0.5500 0.0818
# 2 2 NA 183 NA 188 0.64 0.33 1.22 NA NA
# 3 3 4 46 7 44 NA NA NA -0.6864 0.4437
where variable yi contains the log odds ratios and vi the corresponding sampling variances as computed from the counts and group sizes by escalc().
Study 2 does not report the counts (or sufficient information to reconstruct them), but does report the odds ratio and a corresponding 95% confidence interval (CI) directly, as given by variables oddsratio, lower, and upper. Suppose the CI is a standard Wald-type CI that was computed on the log scale (and whose bounds were then exponentiated). Then the present function can be used as follows:
dat <- conv.wald(out=oddsratio, ci.lb=lower, ci.ub=upper, data=dat, transf=log)
dat
# study cases.trt n.trt cases.plc n.plc oddsratio lower upper yi vi
# 1 1 23 194 38 201 NA NA NA -0.5500 0.0818
# 2 2 NA 183 NA 188 0.64 0.33 1.22 -0.4463 0.1113
# 3 3 4 46 7 44 NA NA NA -0.6864 0.4437
Now variables yi and vi in the dataset are complete.
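The values filled in for study 2 can be reproduced by hand with base R, which may help to see what the conversion does. The following is a minimal sketch of the formula given above, not the internal code of the function; the object names are chosen for illustration only.

```r
# Reported values for study 2 (odds ratio scale)
or    <- 0.64
lower <- 0.33
upper <- 1.22

# Critical value for a 95% Wald-type CI
zcrit <- qnorm(0.975)

# Transform to the log scale and back-calculate the sampling variance
yi <- log(or)
vi <- ((log(upper) - log(lower)) / (2 * zcrit))^2

# Rounded to four decimals, these agree with the yi and vi shown for study 2
round(c(yi = yi, vi = vi), 4)
```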
If the CI was not a 95% CI, then one can specify the appropriate level via the level argument. This can also be an entire vector in case different studies used different levels.
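To illustrate why the level matters, the sketch below converts several CI levels into the corresponding two-sided critical values and applies each to the same pair of bounds. This uses the standard relationship between a CI level and a normal quantile; it is an illustration in base R and is not taken from the function's source.

```r
# Convert CI levels (in percent) into two-sided standard normal critical values
level <- c(90, 95, 99)
zcrit <- qnorm(1 - (1 - level / 100) / 2)
round(zcrit, 4)
# [1] 1.6449 1.9600 2.5758

# The same bounds imply a different sampling variance under each level:
# the higher the level, the smaller the back-calculated variance
lower <- 0.33
upper <- 1.22
vi <- ((log(upper) - log(lower)) / (2 * zcrit))^2
round(vi, 4)
```

Misstating the level therefore biases the back-calculated variance, so it is worth checking in each study which level was actually used.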
By default (i.e., when check=TRUE), the function carries out a rough check to examine if the point estimate falls (approximately) halfway between the CI bounds (on the transformed scale) for each study for which the conversion was carried out. A warning is issued if there are studies where this is not the case. This may indicate that a particular CI was not a Wald-type CI or was computed on a different scale (in which case the back-calculation above would be inappropriate), but can also arise due to rounding of the reported values (in which case the back-calculation would still be appropriate, albeit possibly a bit inaccurate). Care should be taken when using such back-calculated values in a meta-analysis.
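The idea behind such a check can be sketched in base R. The helper rel_offset() below is hypothetical and only illustrates the principle; the exact criterion and tolerance used by the function are not stated here, so no cutoff is assumed.

```r
# Reported values for study 2 from the example above
or    <- 0.64
lower <- 0.33
upper <- 1.22

# Hypothetical helper: distance of the estimate from the CI midpoint,
# expressed as a proportion of the CI width
rel_offset <- function(est, lb, ub) abs(est - (lb + ub) / 2) / (ub - lb)

off_log <- rel_offset(log(or), log(lower), log(upper))  # scale the CI was computed on
off_raw <- rel_offset(or, lower, upper)                 # wrong (untransformed) scale

# roughly 0.007 on the log scale versus roughly 0.152 on the raw scale
round(c(log = off_log, raw = off_raw), 3)
```

On the log scale the estimate sits almost exactly at the midpoint (the small offset being due to rounding), whereas on the odds ratio scale it clearly does not, which is the kind of discrepancy the check is meant to flag.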
Similarly, study authors may report the test statistic and/or p-value from a Wald-type test of the form \(\text{zval} = y_i / \sqrt{v_i}\) (on the transformed scale if transf is specified), with the corresponding two-sided p-value given by \(\text{pval} = 2(1 - \Phi(|\text{zval}|))\), where \(\Phi(\cdot)\) denotes the cumulative distribution function of a standard normal distribution (i.e., pnorm). Test statistics and/or corresponding p-values of this form can be supplied via arguments zval and pval.
A given p-value can be back-transformed into the corresponding test statistic (if it is not already available) with \(\text{zval} = \Phi^{-1}(1 - \text{pval}/2)\), where \(\Phi^{-1}(\cdot)\) denotes the quantile function (i.e., the inverse of the cumulative distribution function) of a standard normal distribution (i.e., qnorm). Then \[v_i = \left(\frac{y_i}{\text{zval}}\right)^2\] is used to back-calculate a missing vi value in the dataset. The back-transformation recovers only the absolute value of the test statistic, but its sign is irrelevant here because the ratio is squared.
Note that the conversion of a p-value to the corresponding test statistic (which is then converted into a sampling variance) as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g., \(p < .01\)), or only state that the test was significant (which typically implies \(p < .05\)), then a common approach is to use the value of the cutoff (e.g., if \(p < .01\) is reported, then assume \(p = .01\)). This is conservative, since the actual p-value was below the assumed value by some unknown amount, so that the back-calculated sampling variance is too large. The conversion will therefore tend to be much less accurate.
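The direction of this inaccuracy can be seen in a small base R sketch. The numbers are hypothetical: a study reports an odds ratio of 0.64 with \(p < .01\), while the true (unreported) p-value is taken to be .003 for the sake of illustration.

```r
# Hypothetical study: log odds ratio with only "p < .01" reported
yi <- log(0.64)

p_assumed <- 0.01    # cutoff used in place of the exact p-value
p_true    <- 0.003   # hypothetical exact p-value (unknown in practice)

# Back-calculate the sampling variance from a two-sided p-value
vi_from_p <- function(yi, pval) (yi / qnorm(1 - pval / 2))^2

vi_assumed <- vi_from_p(yi, p_assumed)
vi_true    <- vi_from_p(yi, p_true)

# The variance based on the cutoff is larger than the one based on the exact p-value
round(c(assumed = vi_assumed, true = vi_true), 4)
```

Using the cutoff thus gives the study less weight than it would receive if the exact p-value were known.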
Using the earlier example, suppose that only the odds ratio and the corresponding two-sided p-value from a Wald-type test (of whether the log odds ratio differs significantly from zero) are reported for study 2.
dat <- data.frame(study = 1:3,
cases.trt = c(23, NA, 4), n.trt = c(194, 183, 46),
cases.plc = c(38, NA, 7), n.plc = c(201, 188, 44),
oddsratio = c(NA, 0.64, NA), pval = c(NA, 0.17, NA))
dat <- escalc(measure="OR", ai=cases.trt, n1i=n.trt, ci=cases.plc, n2i=n.plc, data=dat)
dat
# study cases.trt n.trt cases.plc n.plc oddsratio pval yi vi
# 1 1 23 194 38 201 NA NA -0.5500 0.0818
# 2 2 NA 183 NA 188 0.64 0.17 NA NA
# 3 3 4 46 7 44 NA NA -0.6864 0.4437
Then the function can be used as follows:
dat <- conv.wald(out=oddsratio, pval=pval, data=dat, transf=log)
dat
# study cases.trt n.trt cases.plc n.plc oddsratio pval yi vi
# 1 1 23 194 38 201 NA NA -0.5500 0.0818
# 2 2 NA 183 NA 188 0.64 0.17 -0.4463 0.1058
# 3 3 4 46 7 44 NA NA -0.6864 0.4437
Note that the back-calculated sampling variance for study 2 is not identical in these two examples, because the CI bounds and p-value are rounded to two decimal places, which introduces some inaccuracies. Also, if both (ci.lb, ci.ub) and either zval or pval are available for a study, then the back-calculation of \(v_i\) via the confidence interval is preferred.
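The p-value-based result for study 2 can likewise be reproduced by hand. This is a minimal base R sketch of the two formulas given above, with object names chosen for illustration; it is not the function's internal code.

```r
# Reported values for study 2
or   <- 0.64
pval <- 0.17

# Back-transform the two-sided p-value into the (absolute) test statistic
zval <- qnorm(1 - pval / 2)

# Back-calculate the sampling variance of the log odds ratio
yi <- log(or)
vi <- (yi / zval)^2

# Rounded to four decimals, these agree with the yi and vi shown for study 2
round(c(yi = yi, vi = vi), 4)
```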
Optionally, one can use the n argument to supply the total sample sizes of the studies. This has no relevance for the calculations done by the present function, but some other functions may use this information (e.g., when drawing a funnel plot with the funnel function and one adjusts the yaxis argument to one of the options that puts the sample sizes or some transformation thereof on the y-axis).
If the data argument was not specified or append=FALSE, the function returns a data frame of class c("escalc","data.frame") with two variables called var.names[1] (by default "yi") and var.names[2] (by default "vi"), which contain the (transformed) observed effect sizes or outcomes and the corresponding sampling variances (computed as described above).
If data was specified and append=TRUE, then the original data frame is returned. If var.names[1] is a variable in data and replace="ifna" (or replace=FALSE), then only missing values in this variable are replaced with the (possibly transformed) observed effect sizes or outcomes from out (where possible); if var.names[1] is not a variable in data, a new variable called var.names[1] is added to the data frame. Similarly, if var.names[2] is a variable in data and replace="ifna" (or replace=FALSE), then only missing values in this variable are replaced with the sampling variances back-calculated as described above (where possible); if var.names[2] is not a variable in data, a new variable called var.names[2] is added to the data frame.
If replace="all" (or replace=TRUE), then all values in var.names[1] and var.names[2] are replaced, even for cases where the values in var.names[1] and var.names[2] are not missing.
A word of caution: Except for the check on the CI bounds, there is no way to determine whether the back-calculations done by the function are appropriate in a given context. They are only appropriate when the CI bounds and test statistics (or p-values) arose from Wald-type CIs / tests as described above. Using the same back-calculations for other purposes is likely to yield nonsensical values.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
escalc for a function to compute various effect size measures.
### a very simple example
dat <- data.frame(or=c(1.37,1.89), or.lb=c(1.03,1.60), or.ub=c(1.82,2.23))
dat
#> or or.lb or.ub
#> 1 1.37 1.03 1.82
#> 2 1.89 1.60 2.23
### convert the odds ratios and CIs into log odds ratios with corresponding sampling variances
dat <- conv.wald(out=or, ci.lb=or.lb, ci.ub=or.ub, data=dat, transf=log)
dat
#>
#> or or.lb or.ub yi vi
#> 1 1.37 1.03 1.82 0.3148 0.0211
#> 2 1.89 1.60 2.23 0.6366 0.0072
#>
############################################################################
### a more elaborate example based on the BCG vaccine dataset
dat <- dat.bcg[,c(2:7)]
dat
#> author year tpos tneg cpos cneg
#> 1 Aronson 1948 4 119 11 128
#> 2 Ferguson & Simes 1949 6 300 29 274
#> 3 Rosenthal et al 1960 3 228 11 209
#> 4 Hart & Sutherland 1977 62 13536 248 12619
#> 5 Frimodt-Moller et al 1973 33 5036 47 5761
#> 6 Stein & Aronson 1953 180 1361 372 1079
#> 7 Vandiviere et al 1973 8 2537 10 619
#> 8 TPT Madras 1980 505 87886 499 87892
#> 9 Coetzee & Berjak 1968 29 7470 45 7232
#> 10 Rosenthal et al 1961 17 1699 65 1600
#> 11 Comstock et al 1974 186 50448 141 27197
#> 12 Comstock & Webster 1969 5 2493 3 2338
#> 13 Comstock et al 1976 27 16886 29 17825
### with complete data, we can use escalc() in the usual way
dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)
dat1
#>
#> author year tpos tneg cpos cneg yi vi
#> 1 Aronson 1948 4 119 11 128 -0.9387 0.3571
#> 2 Ferguson & Simes 1949 6 300 29 274 -1.6662 0.2081
#> 3 Rosenthal et al 1960 3 228 11 209 -1.3863 0.4334
#> 4 Hart & Sutherland 1977 62 13536 248 12619 -1.4564 0.0203
#> 5 Frimodt-Moller et al 1973 33 5036 47 5761 -0.2191 0.0520
#> 6 Stein & Aronson 1953 180 1361 372 1079 -0.9581 0.0099
#> 7 Vandiviere et al 1973 8 2537 10 619 -1.6338 0.2270
#> 8 TPT Madras 1980 505 87886 499 87892 0.0120 0.0040
#> 9 Coetzee & Berjak 1968 29 7470 45 7232 -0.4717 0.0570
#> 10 Rosenthal et al 1961 17 1699 65 1600 -1.4012 0.0754
#> 11 Comstock et al 1974 186 50448 141 27197 -0.3408 0.0125
#> 12 Comstock & Webster 1969 5 2493 3 2338 0.4466 0.5342
#> 13 Comstock et al 1976 27 16886 29 17825 -0.0173 0.0716
#>
### random-effects model fitted to these data
res1 <- rma(yi, vi, data=dat1)
res1
#>
#> Random-Effects Model (k = 13; tau^2 estimator: REML)
#>
#> tau^2 (estimated amount of total heterogeneity): 0.3378 (SE = 0.1784)
#> tau (square root of estimated tau^2 value): 0.5812
#> I^2 (total heterogeneity / total variability): 92.07%
#> H^2 (total variability / sampling variability): 12.61
#>
#> Test for Heterogeneity:
#> Q(df = 12) = 163.1649, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> -0.7452 0.1860 -4.0057 <.0001 -1.1098 -0.3806 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### now suppose that the 2x2 table data are not reported in all studies, but that the
### following dataset could be assembled based on information reported in the studies
dat2 <- data.frame(summary(dat1))
dat2[c("yi", "ci.lb", "ci.ub")] <- data.frame(summary(dat1, transf=exp))[c("yi", "ci.lb", "ci.ub")]
names(dat2)[which(names(dat2) == "yi")] <- "or"
dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], digits=2)
dat2$vi <- dat2$sei <- dat2$zi <- NULL
dat2$ntot <- with(dat2, tpos + tneg + cpos + cneg)
dat2[c(1,12),c(3:6,9:10)] <- NA
dat2[c(4,9), c(3:6,8)] <- NA
dat2[c(2:3,5:8,10:11,13),c(7:10)] <- NA
dat2$ntot[!is.na(dat2$tpos)] <- NA
dat2
#> author year tpos tneg cpos cneg or pval ci.lb ci.ub ntot
#> 1 Aronson 1948 NA NA NA NA 0.39 0.12 NA NA 262
#> 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA NA
#> 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA NA
#> 4 Hart & Sutherland 1977 NA NA NA NA 0.23 NA 0.18 0.31 26465
#> 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA NA
#> 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA NA
#> 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA NA
#> 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA NA
#> 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 NA 0.39 1.00 14776
#> 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA NA
#> 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA NA
#> 12 Comstock & Webster 1969 NA NA NA NA 1.56 0.54 NA NA 4839
#> 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA NA
### in studies 1 and 12, authors reported only the odds ratio and the corresponding p-value
### in studies 4 and 9, authors reported only the odds ratio and the corresponding 95% CI
### use escalc() first
dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2)
dat2
#>
#> author year tpos tneg cpos cneg or pval ci.lb ci.ub ntot yi vi
#> 1 Aronson 1948 NA NA NA NA 0.39 0.12 NA NA 262 NA NA
#> 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA NA -1.6662 0.2081
#> 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA NA -1.3863 0.4334
#> 4 Hart & Sutherland 1977 NA NA NA NA 0.23 NA 0.18 0.31 26465 NA NA
#> 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA NA -0.2191 0.0520
#> 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA NA -0.9581 0.0099
#> 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA NA -1.6338 0.2270
#> 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA NA 0.0120 0.0040
#> 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 NA 0.39 1.00 14776 NA NA
#> 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA NA -1.4012 0.0754
#> 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA NA -0.3408 0.0125
#> 12 Comstock & Webster 1969 NA NA NA NA 1.56 0.54 NA NA 4839 NA NA
#> 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA NA -0.0173 0.0716
#>
### fill in the missing log odds ratios and sampling variances
dat2 <- conv.wald(out=or, ci.lb=ci.lb, ci.ub=ci.ub, pval=pval, n=ntot, data=dat2, transf=log)
dat2
#>
#> author year tpos tneg cpos cneg or pval ci.lb ci.ub ntot yi vi
#> 1 Aronson 1948 NA NA NA NA 0.39 0.12 NA NA 262 -0.9416 0.3668
#> 2 Ferguson & Simes 1949 6 300 29 274 NA NA NA NA NA -1.6662 0.2081
#> 3 Rosenthal et al 1960 3 228 11 209 NA NA NA NA NA -1.3863 0.4334
#> 4 Hart & Sutherland 1977 NA NA NA NA 0.23 NA 0.18 0.31 26465 -1.4697 0.0192
#> 5 Frimodt-Moller et al 1973 33 5036 47 5761 NA NA NA NA NA -0.2191 0.0520
#> 6 Stein & Aronson 1953 180 1361 372 1079 NA NA NA NA NA -0.9581 0.0099
#> 7 Vandiviere et al 1973 8 2537 10 619 NA NA NA NA NA -1.6338 0.2270
#> 8 TPT Madras 1980 505 87886 499 87892 NA NA NA NA NA 0.0120 0.0040
#> 9 Coetzee & Berjak 1968 NA NA NA NA 0.62 NA 0.39 1.00 14776 -0.4780 0.0577
#> 10 Rosenthal et al 1961 17 1699 65 1600 NA NA NA NA NA -1.4012 0.0754
#> 11 Comstock et al 1974 186 50448 141 27197 NA NA NA NA NA -0.3408 0.0125
#> 12 Comstock & Webster 1969 NA NA NA NA 1.56 0.54 NA NA 4839 0.4447 0.5266
#> 13 Comstock et al 1976 27 16886 29 17825 NA NA NA NA NA -0.0173 0.0716
#>
### random-effects model fitted to these data
res2 <- rma(yi, vi, data=dat2)
res2
#>
#> Random-Effects Model (k = 13; tau^2 estimator: REML)
#>
#> tau^2 (estimated amount of total heterogeneity): 0.3408 (SE = 0.1798)
#> tau (square root of estimated tau^2 value): 0.5838
#> I^2 (total heterogeneity / total variability): 92.18%
#> H^2 (total variability / sampling variability): 12.80
#>
#> Test for Heterogeneity:
#> Q(df = 12) = 167.4513, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> -0.7472 0.1867 -4.0015 <.0001 -1.1132 -0.3812 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### any differences between res1 and res2 are a result of or, ci.lb, ci.ub, and pval being
### rounded in dat2 to two decimal places; without rounding, the results would be identical