Function to convert Wald-type confidence intervals (CIs) and test statistics (or the corresponding p-values) to sampling variances.

conv.wald(out, ci.lb, ci.ub, zval, pval, n, data, include,
          level=95, transf, check=TRUE, var.names, append=TRUE, replace="ifna", ...)

Arguments

out

vector with the observed effect sizes or outcomes.

ci.lb

vector with the lower bounds of the corresponding Wald-type CIs.

ci.ub

vector with the upper bounds of the corresponding Wald-type CIs.

zval

vector with the Wald-type test statistics.

pval

vector with the p-values of the Wald-type tests.

n

vector with the total sample sizes of the studies.

data

optional data frame containing the variables given to the arguments above.

include

optional (logical or numeric) vector to specify the subset of studies for which the conversion should be carried out.

level

numeric value (or vector) to specify the confidence interval level(s) (the default is 95).

transf

optional argument to specify a function to transform out, ci.lb, and ci.ub (e.g., transf=log). If unspecified, no transformation is used.

check

logical to specify whether the function should carry out a check to examine if the point estimates fall (approximately) halfway between the CI bounds (the default is TRUE).

var.names

character vector with two elements to specify the name of the variable for the observed effect sizes or outcomes and the name of the variable for the corresponding sampling variances (if data is an object of class "escalc", the var.names are taken from the object; otherwise the defaults are "yi" and "vi").

append

logical to specify whether the data frame provided via the data argument should be returned together with the estimated values (the default is TRUE).

replace

character string or logical to specify how values in var.names should be replaced (only relevant when using the data argument and if variables in var.names already exist in the data frame). See the ‘Value’ section for more details.

...

other arguments.

Details

The escalc function can be used to compute a wide variety of effect sizes or ‘outcome measures’. However, the inputs required to compute certain measures with this function may not be reported for all of the studies. Under certain circumstances, other information (such as point estimates and corresponding confidence intervals and/or test statistics) may be available that can be converted into the appropriate format needed for a meta-analysis. The purpose of the present function is to facilitate this process.

The function typically takes a data frame created with the escalc function as input via the data argument. This object should contain variables yi and vi (unless argument var.names was used to adjust these variable names when the "escalc" object was created) for the observed effect sizes or outcomes and the corresponding sampling variances, respectively. For some studies, the values for these variables may be missing.

Converting Point Estimates and Confidence Intervals

In some studies, the effect size estimate or observed outcome may already be reported. If so, such values can be supplied via the out argument and are then substituted for missing yi values. At times, it may be necessary to transform the reported values (e.g., reported odds ratios to log odds ratios). Via argument transf, an appropriate transformation function can be specified (e.g., transf=log), in which case \(y_i = f(\textrm{out})\) where \(f(\cdot)\) is the function specified via transf.

Moreover, a confidence interval (CI) may have been reported together with the estimate. The bounds of the CI can be supplied via arguments ci.lb and ci.ub, which are also transformed if a function is specified via transf. Assume that the bounds were obtained from a Wald-type CI of the form \(y_i \pm z_{crit} \sqrt{v_i}\) (on the transformed scale if transf is specified), where \(v_i\) is the sampling variance corresponding to the effect size estimate or observed outcome (so that \(\sqrt{v_i}\) is the corresponding standard error) and \(z_{crit}\) is the appropriate critical value from a standard normal distribution (e.g., \(1.96\) for a 95% CI). Then \[v_i = \left(\frac{\textrm{ci.ub} - \textrm{ci.lb}}{2 \times z_{crit}}\right)^2\] is used to back-calculate the sampling variances of the (transformed) effect size estimates or observed outcomes and these values are then substituted for missing vi values in the dataset.
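As a quick numeric sketch of this back-calculation (base R only; the odds ratio of 0.64 with 95% CI [0.33, 1.22] is taken from the example that follows):

```r
# back-calculate the sampling variance of a log odds ratio from a
# reported odds ratio of 0.64 with 95% CI [0.33, 1.22]; all
# computations are done on the log scale (i.e., transf=log)
ci.lb <- log(0.33)       # transformed lower bound
ci.ub <- log(1.22)       # transformed upper bound
zcrit <- qnorm(0.975)    # critical value for a 95% CI (~1.96)
vi <- ((ci.ub - ci.lb) / (2 * zcrit))^2
round(vi, 4)
# [1] 0.1113
```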

For example, consider the following dataset of three RCTs used as input for a meta-analysis of log odds ratios:


dat <- data.frame(study = 1:3,
                  cases.trt = c(23, NA, 4), n.trt = c(194, 183, 46),
                  cases.plc = c(38, NA, 7), n.plc = c(201, 188, 44),
                  oddsratio = c(NA, 0.64, NA), lower = c(NA, 0.33, NA), upper = c(NA, 1.22, NA))
dat <- escalc(measure="OR", ai=cases.trt, n1i=n.trt, ci=cases.plc, n2i=n.plc, data=dat)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio lower upper      yi     vi
# 1     1        23   194        38   201        NA    NA    NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64  0.33  1.22      NA     NA
# 3     3         4    46         7    44        NA    NA    NA -0.6864 0.4437

where variable yi contains the log odds ratios and vi the corresponding sampling variances as computed from the counts and group sizes by escalc().

Study 2 does not report the counts (or sufficient information to reconstruct them), but it does report the odds ratio and a corresponding 95% confidence interval (CI) directly, as given by variables oddsratio, lower, and upper. The CI is a standard Wald-type CI that was computed on the log scale (and whose bounds were then exponentiated). The present function can then be used as follows:


dat <- conv.wald(out=oddsratio, ci.lb=lower, ci.ub=upper, data=dat, transf=log)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio lower upper      yi     vi
# 1     1        23   194        38   201        NA    NA    NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64  0.33  1.22 -0.4463 0.1113
# 3     3         4    46         7    44        NA    NA    NA -0.6864 0.4437

Now variables yi and vi in the dataset are complete.

If the CI was not a 95% CI, then one can specify the appropriate level via the level argument. This can also be an entire vector in case different studies used different levels.

By default (i.e., when check=TRUE), the function carries out a rough check to examine if the point estimate falls (approximately) halfway between the CI bounds (on the transformed scale) for each study for which the conversion was carried out. A warning is issued if there are studies where this is not the case. This may indicate that a particular CI was not a Wald-type CI or was computed on a different scale (in which case the back-calculation above would be inappropriate), but can also arise due to rounding of the reported values (in which case the back-calculation would still be appropriate, albeit possibly a bit inaccurate). Care should be taken when using such back-calculated values in a meta-analysis.
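The check described above amounts to something along these lines (a simplified sketch in base R, not the function's actual implementation, using the values from the example):

```r
# does the point estimate fall roughly halfway between the CI bounds
# on the log scale? compare log(0.64) with the midpoint of the
# (log-transformed) CI bounds
yi  <- log(0.64)
mid <- (log(0.33) + log(1.22)) / 2
round(yi, 4)    # -0.4463
round(mid, 4)   # -0.4549
```

The small discrepancy between the two values here stems from the rounding of the reported odds ratio and CI bounds.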

Converting Test Statistics and P-Values

Similarly, study authors may report the test statistic and/or p-value from a Wald-type test of the form \(\textrm{zval} = y_i / \sqrt{v_i}\) (on the transformed scale if transf is specified), with the corresponding two-sided p-value given by \(\textrm{pval} = 2(1 - \Phi(\textrm{|zval|}))\), where \(\Phi(\cdot)\) denotes the cumulative distribution function of a standard normal distribution (i.e., pnorm). Test statistics and/or corresponding p-values of this form can be supplied via arguments zval and pval.

A given p-value can be back-transformed into the corresponding test statistic (if it is not already available) with \(\textrm{zval} = \Phi^{-1}(1 - \textrm{pval}/2)\), where \(\Phi^{-1}(\cdot)\) denotes the quantile function (i.e., the inverse of the cumulative distribution function) of a standard normal distribution (i.e., qnorm). Then \[v_i = \left(\frac{y_i}{\textrm{zval}}\right)^2\] is used to back-calculate a missing vi value in the dataset.
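Applied to the example values (an odds ratio of 0.64 with a two-sided p-value of 0.17), this back-calculation can be sketched in base R as:

```r
# back-calculate the sampling variance of log(0.64) from the reported
# two-sided p-value of 0.17 of the corresponding Wald-type test
yi   <- log(0.64)
zval <- qnorm(1 - 0.17/2)   # test statistic implied by the p-value
vi   <- (yi / zval)^2       # sign of yi is irrelevant after squaring
round(vi, 4)
# [1] 0.1058
```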

Note that the conversion of a p-value to the corresponding test statistic (which is then converted into a sampling variance) as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g., \(p < .01\)) or only state that the test was significant (which typically implies \(p < .05\)), then a common approach is to use the value of the cutoff reported (e.g., if \(p < .01\) is reported, then assume \(p = .01\)). This is conservative, since the actual p-value was below that assumed value by some unknown amount, but the conversion will then tend to be much less accurate.
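The direction of this conservatism can be illustrated with a hypothetical study (the effect size of -0.5 and the p-values below are made-up values for illustration):

```r
# suppose the actual p-value was .003, but only "p < .01" was reported
yi <- -0.5                                 # hypothetical log odds ratio
vi.exact  <- (yi / qnorm(1 - 0.003/2))^2   # using the exact p-value
vi.cutoff <- (yi / qnorm(1 - 0.01/2))^2    # assuming p = .01 (the cutoff)
vi.cutoff > vi.exact
# [1] TRUE
```

The back-calculated sampling variance based on the cutoff is larger than the one based on the exact p-value, hence conservative.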

Using the earlier example, suppose that only the odds ratio and the corresponding two-sided p-value from a Wald-type test (of whether the log odds ratio differs significantly from zero) are reported for study 2.


dat <- data.frame(study = 1:3,
                  cases.trt = c(23, NA, 4), n.trt = c(194, 183, 46),
                  cases.plc = c(38, NA, 7), n.plc = c(201, 188, 44),
                  oddsratio = c(NA, 0.64, NA), pval = c(NA, 0.17, NA))
dat <- escalc(measure="OR", ai=cases.trt, n1i=n.trt, ci=cases.plc, n2i=n.plc, data=dat)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio pval      yi     vi
# 1     1        23   194        38   201        NA   NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64 0.17      NA     NA
# 3     3         4    46         7    44        NA   NA -0.6864 0.4437

Then the function can be used as follows:


dat <- conv.wald(out=oddsratio, pval=pval, data=dat, transf=log)
dat

#   study cases.trt n.trt cases.plc n.plc oddsratio pval      yi     vi
# 1     1        23   194        38   201        NA   NA -0.5500 0.0818
# 2     2        NA   183        NA   188      0.64 0.17 -0.4463 0.1058
# 3     3         4    46         7    44        NA   NA -0.6864 0.4437

Note that the back-calculated sampling variance for study 2 is not identical in these two examples, because the CI bounds and p-value are rounded to two decimal places, which introduces some inaccuracies. Also, if both the CI bounds (ci.lb, ci.ub) and either zval or pval are available for a study, then the back-calculation of \(v_i\) via the confidence interval is preferred.

Optionally, one can use the n argument to supply the total sample sizes of the studies. This has no relevance for the calculations done by the present function, but some other functions may use this information (e.g., when drawing a funnel plot with the funnel function and one adjusts the yaxis argument to one of the options that puts the sample sizes or some transformation thereof on the y-axis).

Value

If the data argument was not specified or append=FALSE, a data frame of class c("escalc","data.frame") with two variables called var.names[1] (by default "yi") and var.names[2] (by default "vi") with the (transformed) observed effect sizes or outcomes and the corresponding sampling variances (computed as described above).

If data was specified and append=TRUE, then the original data frame is returned. If var.names[1] is a variable in data and replace="ifna" (or replace=FALSE), then only missing values in this variable are replaced with the (possibly transformed) observed effect sizes or outcomes from out (where possible) and otherwise a new variable called var.names[1] is added to the data frame. Similarly, if var.names[2] is a variable in data and replace="ifna" (or replace=FALSE), then only missing values in this variable are replaced with the sampling variances back-calculated as described above (where possible) and otherwise a new variable called var.names[2] is added to the data frame.

If replace="all" (or replace=TRUE), then all values in var.names[1] and var.names[2] are replaced, even for cases where the value in var.names[1] and var.names[2] is not missing.

Note

A word of caution: Except for the check on the CI bounds, there is no way for the function to determine whether the back-calculations it carries out are appropriate in a given context. They are only appropriate when the CI bounds and test statistics (or p-values) arose from Wald-type CIs / tests as described above. Using the same back-calculations for other purposes is likely to yield nonsensical values.

References

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03

See also

escalc for a function to compute various effect size measures.

Examples

### a very simple example
dat <- data.frame(or=c(1.37,1.89), or.lb=c(1.03,1.60), or.ub=c(1.82,2.23))
dat
#>     or or.lb or.ub
#> 1 1.37  1.03  1.82
#> 2 1.89  1.60  2.23

### convert the odds ratios and CIs into log odds ratios with corresponding sampling variances
dat <- conv.wald(out=or, ci.lb=or.lb, ci.ub=or.ub, data=dat, transf=log)
dat
#> 
#>     or or.lb or.ub     yi     vi 
#> 1 1.37  1.03  1.82 0.3148 0.0211 
#> 2 1.89  1.60  2.23 0.6366 0.0072 
#> 

############################################################################

### a more elaborate example based on the BCG vaccine dataset
dat <- dat.bcg[,c(2:7)]
dat
#>                  author year tpos  tneg cpos  cneg
#> 1               Aronson 1948    4   119   11   128
#> 2      Ferguson & Simes 1949    6   300   29   274
#> 3       Rosenthal et al 1960    3   228   11   209
#> 4     Hart & Sutherland 1977   62 13536  248 12619
#> 5  Frimodt-Moller et al 1973   33  5036   47  5761
#> 6       Stein & Aronson 1953  180  1361  372  1079
#> 7      Vandiviere et al 1973    8  2537   10   619
#> 8            TPT Madras 1980  505 87886  499 87892
#> 9      Coetzee & Berjak 1968   29  7470   45  7232
#> 10      Rosenthal et al 1961   17  1699   65  1600
#> 11       Comstock et al 1974  186 50448  141 27197
#> 12   Comstock & Webster 1969    5  2493    3  2338
#> 13       Comstock et al 1976   27 16886   29 17825

### with complete data, we can use escalc() in the usual way
dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)
dat1
#> 
#>                  author year tpos  tneg cpos  cneg      yi     vi 
#> 1               Aronson 1948    4   119   11   128 -0.9387 0.3571 
#> 2      Ferguson & Simes 1949    6   300   29   274 -1.6662 0.2081 
#> 3       Rosenthal et al 1960    3   228   11   209 -1.3863 0.4334 
#> 4     Hart & Sutherland 1977   62 13536  248 12619 -1.4564 0.0203 
#> 5  Frimodt-Moller et al 1973   33  5036   47  5761 -0.2191 0.0520 
#> 6       Stein & Aronson 1953  180  1361  372  1079 -0.9581 0.0099 
#> 7      Vandiviere et al 1973    8  2537   10   619 -1.6338 0.2270 
#> 8            TPT Madras 1980  505 87886  499 87892  0.0120 0.0040 
#> 9      Coetzee & Berjak 1968   29  7470   45  7232 -0.4717 0.0570 
#> 10      Rosenthal et al 1961   17  1699   65  1600 -1.4012 0.0754 
#> 11       Comstock et al 1974  186 50448  141 27197 -0.3408 0.0125 
#> 12   Comstock & Webster 1969    5  2493    3  2338  0.4466 0.5342 
#> 13       Comstock et al 1976   27 16886   29 17825 -0.0173 0.0716 
#> 

### random-effects model fitted to these data
res1 <- rma(yi, vi, data=dat1)
res1
#> 
#> Random-Effects Model (k = 13; tau^2 estimator: REML)
#> 
#> tau^2 (estimated amount of total heterogeneity): 0.3378 (SE = 0.1784)
#> tau (square root of estimated tau^2 value):      0.5812
#> I^2 (total heterogeneity / total variability):   92.07%
#> H^2 (total variability / sampling variability):  12.61
#> 
#> Test for Heterogeneity:
#> Q(df = 12) = 163.1649, p-val < .0001
#> 
#> Model Results:
#> 
#> estimate      se     zval    pval    ci.lb    ci.ub      
#>  -0.7452  0.1860  -4.0057  <.0001  -1.1098  -0.3806  *** 
#> 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 

### now suppose that the 2x2 table data are not reported in all studies, but that the
### following dataset could be assembled based on information reported in the studies
dat2 <- data.frame(summary(dat1))
dat2[c("yi", "ci.lb", "ci.ub")] <- data.frame(summary(dat1, transf=exp))[c("yi", "ci.lb", "ci.ub")]
names(dat2)[which(names(dat2) == "yi")] <- "or"
dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], digits=2)
dat2$vi <- dat2$sei <- dat2$zi <- NULL
dat2$ntot <- with(dat2, tpos + tneg + cpos + cneg)
dat2[c(1,12),c(3:6,9:10)] <- NA
dat2[c(4,9), c(3:6,8)] <- NA
dat2[c(2:3,5:8,10:11,13),c(7:10)] <- NA
dat2$ntot[!is.na(dat2$tpos)] <- NA
dat2
#>                  author year tpos  tneg cpos  cneg   or pval ci.lb ci.ub  ntot
#> 1               Aronson 1948   NA    NA   NA    NA 0.39 0.12    NA    NA   262
#> 2      Ferguson & Simes 1949    6   300   29   274   NA   NA    NA    NA    NA
#> 3       Rosenthal et al 1960    3   228   11   209   NA   NA    NA    NA    NA
#> 4     Hart & Sutherland 1977   NA    NA   NA    NA 0.23   NA  0.18  0.31 26465
#> 5  Frimodt-Moller et al 1973   33  5036   47  5761   NA   NA    NA    NA    NA
#> 6       Stein & Aronson 1953  180  1361  372  1079   NA   NA    NA    NA    NA
#> 7      Vandiviere et al 1973    8  2537   10   619   NA   NA    NA    NA    NA
#> 8            TPT Madras 1980  505 87886  499 87892   NA   NA    NA    NA    NA
#> 9      Coetzee & Berjak 1968   NA    NA   NA    NA 0.62   NA  0.39  1.00 14776
#> 10      Rosenthal et al 1961   17  1699   65  1600   NA   NA    NA    NA    NA
#> 11       Comstock et al 1974  186 50448  141 27197   NA   NA    NA    NA    NA
#> 12   Comstock & Webster 1969   NA    NA   NA    NA 1.56 0.54    NA    NA  4839
#> 13       Comstock et al 1976   27 16886   29 17825   NA   NA    NA    NA    NA

### in studies 1 and 12, authors reported only the odds ratio and the corresponding p-value
### in studies 4 and 9, authors reported only the odds ratio and the corresponding 95% CI

### use escalc() first
dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2)
dat2
#> 
#>                  author year tpos  tneg cpos  cneg   or pval ci.lb ci.ub  ntot      yi     vi 
#> 1               Aronson 1948   NA    NA   NA    NA 0.39 0.12    NA    NA   262      NA     NA 
#> 2      Ferguson & Simes 1949    6   300   29   274   NA   NA    NA    NA    NA -1.6662 0.2081 
#> 3       Rosenthal et al 1960    3   228   11   209   NA   NA    NA    NA    NA -1.3863 0.4334 
#> 4     Hart & Sutherland 1977   NA    NA   NA    NA 0.23   NA  0.18  0.31 26465      NA     NA 
#> 5  Frimodt-Moller et al 1973   33  5036   47  5761   NA   NA    NA    NA    NA -0.2191 0.0520 
#> 6       Stein & Aronson 1953  180  1361  372  1079   NA   NA    NA    NA    NA -0.9581 0.0099 
#> 7      Vandiviere et al 1973    8  2537   10   619   NA   NA    NA    NA    NA -1.6338 0.2270 
#> 8            TPT Madras 1980  505 87886  499 87892   NA   NA    NA    NA    NA  0.0120 0.0040 
#> 9      Coetzee & Berjak 1968   NA    NA   NA    NA 0.62   NA  0.39  1.00 14776      NA     NA 
#> 10      Rosenthal et al 1961   17  1699   65  1600   NA   NA    NA    NA    NA -1.4012 0.0754 
#> 11       Comstock et al 1974  186 50448  141 27197   NA   NA    NA    NA    NA -0.3408 0.0125 
#> 12   Comstock & Webster 1969   NA    NA   NA    NA 1.56 0.54    NA    NA  4839      NA     NA 
#> 13       Comstock et al 1976   27 16886   29 17825   NA   NA    NA    NA    NA -0.0173 0.0716 
#> 

### fill in the missing log odds ratios and sampling variances
dat2 <- conv.wald(out=or, ci.lb=ci.lb, ci.ub=ci.ub, pval=pval, n=ntot, data=dat2, transf=log)
dat2
#> 
#>                  author year tpos  tneg cpos  cneg   or pval ci.lb ci.ub  ntot      yi     vi 
#> 1               Aronson 1948   NA    NA   NA    NA 0.39 0.12    NA    NA   262 -0.9416 0.3668 
#> 2      Ferguson & Simes 1949    6   300   29   274   NA   NA    NA    NA    NA -1.6662 0.2081 
#> 3       Rosenthal et al 1960    3   228   11   209   NA   NA    NA    NA    NA -1.3863 0.4334 
#> 4     Hart & Sutherland 1977   NA    NA   NA    NA 0.23   NA  0.18  0.31 26465 -1.4697 0.0192 
#> 5  Frimodt-Moller et al 1973   33  5036   47  5761   NA   NA    NA    NA    NA -0.2191 0.0520 
#> 6       Stein & Aronson 1953  180  1361  372  1079   NA   NA    NA    NA    NA -0.9581 0.0099 
#> 7      Vandiviere et al 1973    8  2537   10   619   NA   NA    NA    NA    NA -1.6338 0.2270 
#> 8            TPT Madras 1980  505 87886  499 87892   NA   NA    NA    NA    NA  0.0120 0.0040 
#> 9      Coetzee & Berjak 1968   NA    NA   NA    NA 0.62   NA  0.39  1.00 14776 -0.4780 0.0577 
#> 10      Rosenthal et al 1961   17  1699   65  1600   NA   NA    NA    NA    NA -1.4012 0.0754 
#> 11       Comstock et al 1974  186 50448  141 27197   NA   NA    NA    NA    NA -0.3408 0.0125 
#> 12   Comstock & Webster 1969   NA    NA   NA    NA 1.56 0.54    NA    NA  4839  0.4447 0.5266 
#> 13       Comstock et al 1976   27 16886   29 17825   NA   NA    NA    NA    NA -0.0173 0.0716 
#> 

### random-effects model fitted to these data
res2 <- rma(yi, vi, data=dat2)
res2
#> 
#> Random-Effects Model (k = 13; tau^2 estimator: REML)
#> 
#> tau^2 (estimated amount of total heterogeneity): 0.3408 (SE = 0.1798)
#> tau (square root of estimated tau^2 value):      0.5838
#> I^2 (total heterogeneity / total variability):   92.18%
#> H^2 (total variability / sampling variability):  12.80
#> 
#> Test for Heterogeneity:
#> Q(df = 12) = 167.4513, p-val < .0001
#> 
#> Model Results:
#> 
#> estimate      se     zval    pval    ci.lb    ci.ub      
#>  -0.7472  0.1867  -4.0015  <.0001  -1.1132  -0.3812  *** 
#> 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 

### any differences between res1 and res2 are a result of or, ci.lb, ci.ub, and pval being
### rounded in dat2 to two decimal places; without rounding, the results would be identical