Calculate the Variance-Covariance of Dependent Correlation Coefficients

Function to calculate the variance-covariance matrix of correlation coefficients that were computed from the same sample of subjects.

rcalc(x, ni, data, rtoz=FALSE, nfun="min", sparse=FALSE, ...)

Arguments

x: a formula of the form ri ~ var1 + var2 | study. Can also be a correlation matrix or list thereof. See ‘Details’.
ni: vector to specify the sample sizes based on which the correlations were computed.
data: data frame containing the variables specified via the formula (and the sample sizes).
rtoz: logical to specify whether to transform the correlations via Fisher's r-to-z transformation (the default is FALSE).
nfun: a character string to specify how the ‘common’ sample size within each study should be computed. Possible options are "min" (for the minimum), "harmonic" (for the harmonic mean), or "mean" (for the arithmetic mean). Can also be a function. See ‘Details’.
sparse: logical to specify whether the variance-covariance matrix should be returned as a sparse matrix (the default is FALSE).
...: other arguments.

Details

A meta-analysis of correlation coefficients may involve multiple correlation coefficients extracted from the same study. When these correlations are computed from the same sample of subjects, they are typically not independent. The rcalc function can be used to create a dataset with the correlation coefficients (possibly transformed with Fisher's r-to-z transformation) and the corresponding variance-covariance matrix. The dataset and variance-covariance matrix can then be meta-analyzed using the rma.mv function.

When computing the covariance between two correlation coefficients, we can distinguish two cases:

In the first case, one of the variables involved in the two correlation coefficients is the same. For example, in \(r_{12}\) and \(r_{13}\), variable 1 is common to both correlation coefficients. This is sometimes called the (partially) ‘overlapping’ case. The covariance between the two correlation coefficients, \(\text{Cov}[r_{12}, r_{13}]\), then depends on the degree of correlation between variables 2 and 3 (i.e., \(r_{23}\)).
In the second case, none of the variables are common to both correlation coefficients. For example, this would be the case if we have correlations \(r_{12}\) and \(r_{34}\) based on 4 variables. This is sometimes called the ‘non-overlapping’ case. The covariance between the two correlation coefficients, \(\text{Cov}[r_{12}, r_{34}]\), then depends on \(r_{13}\), \(r_{14}\), \(r_{23}\), and \(r_{24}\).

Equations to compute these covariances can be found, for example, in Steiger (1980) and Olkin and Finn (1990).
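The two cases can be sketched in base R as follows. This is an illustrative reconstruction of the large-sample equations given in Steiger (1980), not the code used inside rcalc; the helper names cov_overlap and cov_nonoverlap are hypothetical, and the denominator \(n-1\) follows the ‘Note’ section of this page. The input values are the study 1 correlations shown in the ‘Examples’ (with \(n = 142\)).

```r
# Overlapping case: Cov[r_12, r_13], where variable 1 is shared.
# (hypothetical helper; equation as given in Steiger, 1980, with n-1 in the denominator)
cov_overlap <- function(r12, r13, r23, n) {
   psi <- r23 * (1 - r12^2 - r13^2) -
          0.5 * r12 * r13 * (1 - r12^2 - r13^2 - r23^2)
   psi / (n - 1)
}

# Non-overlapping case: Cov[r_12, r_34], where no variable is shared.
# (hypothetical helper; depends on r_13, r_14, r_23, and r_24)
cov_nonoverlap <- function(r12, r34, r13, r14, r23, r24, n) {
   psi <- 0.5 * r12 * r34 * (r13^2 + r14^2 + r23^2 + r24^2) +
          r13 * r24 + r14 * r23 -
          (r12 * r13 * r14 + r12 * r23 * r24 +
           r13 * r23 * r34 + r14 * r24 * r34)
   psi / (n - 1)
}

# study 1: Cov[acog.perf, asom.perf]; 'perf' is the shared variable,
# so r12 = r(perf,acog), r13 = r(perf,asom), r23 = r(acog,asom)
c_over <- cov_overlap(r12 = -0.55, r13 = -0.48, r23 = 0.47, n = 142)
c_over

# study 1: Cov[acog.perf, asom.conf]; with 1 = acog, 2 = perf, 3 = asom, 4 = conf
c_non <- cov_nonoverlap(r12 = -0.55, r34 = -0.46, r13 = 0.47, r14 = -0.38,
                        r23 = -0.48, r24 = 0.66, n = 142)
c_non
```

Under these assumptions, the two results (about 0.0013265 and 0.0009322) agree with the corresponding elements of the variance-covariance matrix for study 1 shown in the ‘Examples’.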

To use the rcalc function, one needs to construct a data frame that contains a study identifier (say study), two variable identifiers (say var1 and var2), the corresponding correlation coefficients (say ri), and the sample sizes based on which the correlation coefficients were computed (say ni). Then the first argument should be a formula of the form ri ~ var1 + var2 | study, argument ni is set equal to the variable name containing the sample sizes, and the data frame containing these variables is specified via the data argument. When using the function for a single study, one can leave out the study identifier from the formula.
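As a minimal sketch of the single-study case, the following constructs a small data frame by hand and omits the study identifier from the formula. The data frame name one is hypothetical, and the three correlations are taken from study 1 of the example data shown in the ‘Examples’; the metafor package is assumed to be installed.

```r
library(metafor)

# hypothetical single-study data: three correlations among acog, asom, and perf
one <- data.frame(
   var1 = c("acog", "asom", "acog"),
   var2 = c("perf", "perf", "asom"),
   ri   = c(-0.55, -0.48, 0.47),
   ni   = 142
)

# no '| study' part in the formula, since there is only a single study
res <- rcalc(ri ~ var1 + var2, ni = ni, data = one)
res$dat   # variable identifiers, pair identifier, correlations, sample size
res$V     # 3 x 3 variance-covariance matrix of the three correlations
```

Since all three correlations among the three variables are supplied, every covariance in res$V can be computed.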

When argument rtoz is set to TRUE, then the correlations are transformed with Fisher's r-to-z transformation (Fisher, 1921) and the variance-covariance matrix is computed for the transformed values.
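The transformation itself can be illustrated in base R, where atanh() computes Fisher's r-to-z transformation. This sketch only shows the transformation and the conventional large-sample variance \(1/(n-3)\) of a transformed coefficient; that rcalc uses exactly this denominator is an assumption here, and the covariances between transformed coefficients are not reproduced.

```r
r <- c(-0.55, -0.48, 0.66)   # three correlations from study 1 in the 'Examples'
n <- 142                     # (common) sample size

# Fisher's r-to-z transformation: z = 0.5 * log((1 + r) / (1 - r))
z <- atanh(r)
z

# conventional large-sample variance of z (assumed, see the text above)
vz <- rep(1 / (n - 3), length(r))
vz

# back-transformation from z to r
r_back <- tanh(z)
r_back
```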

In some cases, the sample size may not be identical within a study (e.g., \(r_{12}\) may have been computed based on 120 subjects while \(r_{13}\) was computed based on 118 subjects due to 2 missing values in variable 3). For constructing the variance-covariance matrix, we need to assume a ‘common’ sample size for all correlation coefficients within the study. Argument nfun provides some options for how the common sample size should be computed. Possible options are "min" (for using the minimum sample size within a study as the common sample size), "harmonic" (for using the harmonic mean), or "mean" (for using the arithmetic mean). The default is "min", which is a conservative choice (i.e., it will overestimate the sampling variances of coefficients that were computed based on a sample size that was actually larger than the minimum sample size). One can also specify a function via the nfun argument (which should take a numeric vector as input and return a single value).
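For the sample sizes mentioned above (120 and 118), the three built-in options amount to the following base R computations. The last function, nfun_floor_harmonic, is a hypothetical example of a user-defined function that takes a numeric vector and returns a single value, as required for the nfun argument.

```r
ni <- c(120, 118)

n_min      <- min(ni)            # "min":      118
n_harmonic <- 1 / mean(1 / ni)   # "harmonic": about 118.99
n_mean     <- mean(ni)           # "mean":     119

# hypothetical user-defined alternative: harmonic mean, rounded down
nfun_floor_harmonic <- function(n) floor(1 / mean(1 / n))
n_custom <- nfun_floor_harmonic(ni)   # 118

c(min = n_min, harmonic = n_harmonic, mean = n_mean, custom = n_custom)
```

Such a function could then be supplied as nfun=nfun_floor_harmonic in the call to rcalc.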

Instead of specifying a formula, one can also pass a correlation matrix to the function via argument x. Argument ni then specifies the (common) sample size based on which the elements in the correlation matrix were computed. One can also pass a list of correlation matrices via argument x, in which case argument ni should be a vector of sample sizes of the same length as x.

Value

A list containing the following components:

dat: a data frame with the study identifier, the two variable identifiers, a variable pair identifier, the correlation coefficients (possibly transformed with Fisher's r-to-z transformation), and the (common) sample sizes.
V: corresponding variance-covariance matrix (given as a sparse matrix when sparse=TRUE).

Note that a particular covariance can only be computed when all of the correlation coefficients involved in the covariance equation are included in the dataset. If one or more coefficients needed for the computation are missing, then the resulting covariance will also be missing (i.e., NA).

Note

For raw correlation coefficients, the variance-covariance matrix is computed with \(n-1\) in the denominator (instead of \(n\) as suggested in Steiger, 1980, and Olkin & Finn, 1990). This is more consistent with the usual equation for computing the sampling variance of a correlation coefficient (which also typically uses \(n-1\) in the denominator).
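The difference between the two denominators can be checked by hand in base R for a single coefficient, using the usual large-sample equation \((1-r^2)^2/(n-1)\) for the sampling variance of a raw correlation. The values are those of the first correlation of study 1 in the ‘Examples’.

```r
r <- -0.55
n <- 142

v_nm1 <- (1 - r^2)^2 / (n - 1)   # n-1 in the denominator: about 0.0034504
v_n   <- (1 - r^2)^2 / n         # n in the denominator:   about 0.0034261

c(v_nm1 = v_nm1, v_n = v_n)
```

The first value agrees with the acog.perf variance shown for study 1 in the ‘Examples’ (0.0034503989).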

For raw and r-to-z transformed coefficients, the variance-covariance matrix will only be computed when the (common) sample size for a study is at least 5.

Author

Wolfgang Viechtbauer (wvb@metafor-project.org, https://www.metafor-project.org).

References

Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample. Metron, 1, 1–32. http://hdl.handle.net/2440/15169

Olkin, I., & Finn, J. D. (1990). Testing correlated correlations. Psychological Bulletin, 108(2), 330–333. https://doi.org/10.1037/0033-2909.108.2.330

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245–251. https://doi.org/10.1037/0033-2909.87.2.245

Examples

############################################################################

### copy data into 'dat' and examine the first 12 rows
dat <- dat.craft2003
head(dat, 12)
#>    study  ni sport    ri var1 var2
#> 1      1 142     I -0.55 acog perf
#> 2      1 142     I -0.48 asom perf
#> 3      1 142     I  0.66 conf perf
#> 4      1 142     I  0.47 acog asom
#> 5      1 142     I -0.38 acog conf
#> 6      1 142     I -0.46 asom conf
#> 7      3  37     I  0.53 acog perf
#> 8      3  37     I -0.12 asom perf
#> 9      3  37     I  0.03 conf perf
#> 10     3  37     I  0.52 acog asom
#> 11     3  37     I -0.48 acog conf
#> 12     3  37     I -0.40 asom conf

### construct dataset and var-cov matrix of the correlations
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat)
V <- tmp$V
dat <- tmp$dat

### examine data and var-cov matrix for study 1
dat[dat$study == 1,]
#>   study var1 var2 var1.var2    yi  ni
#> 1     1 acog perf acog.perf -0.55 142
#> 2     1 asom perf asom.perf -0.48 142
#> 3     1 conf perf conf.perf  0.66 142
#> 4     1 acog asom acog.asom  0.47 142
#> 5     1 acog conf acog.conf -0.38 142
#> 6     1 asom conf asom.conf -0.46 142
blsplit(V, dat$study, round, 4)$`1`
#>           acog.perf asom.perf conf.perf acog.asom acog.conf asom.conf
#> acog.perf    0.0035    0.0013   -0.0006   -0.0014    0.0025    0.0009
#> asom.perf    0.0013    0.0042   -0.0010   -0.0019    0.0013    0.0025
#> conf.perf   -0.0006   -0.0010    0.0023    0.0006   -0.0015   -0.0011
#> acog.asom   -0.0014   -0.0019    0.0006    0.0043   -0.0018   -0.0012
#> acog.conf    0.0025    0.0013   -0.0015   -0.0018    0.0052    0.0019
#> asom.conf    0.0009    0.0025   -0.0011   -0.0012    0.0019    0.0044

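### examine data and var-cov matrix for study 6 (the correlations involving
### 'conf' are missing, so the corresponding rows and columns of V are NA)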
dat[dat$study == 6,]
#>    study var1 var2 var1.var2   yi ni
#> 13     6 acog perf acog.perf 0.44 16
#> 14     6 asom perf asom.perf 0.46 16
#> 15     6 conf perf conf.perf   NA 16
#> 16     6 acog asom acog.asom 0.67 16
#> 17     6 acog conf acog.conf   NA 16
#> 18     6 asom conf asom.conf   NA 16
blsplit(V, dat$study, round, 4)$`6`
#>           acog.perf asom.perf conf.perf acog.asom acog.conf asom.conf
#> acog.perf    0.0434    0.0256        NA    0.0095        NA        NA
#> asom.perf    0.0256    0.0414        NA    0.0085        NA        NA
#> conf.perf        NA        NA        NA        NA        NA        NA
#> acog.asom    0.0095    0.0085        NA    0.0202        NA        NA
#> acog.conf        NA        NA        NA        NA        NA        NA
#> asom.conf        NA        NA        NA        NA        NA        NA

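### examine data and var-cov matrix for study 17 (the variances can be computed,
### but each covariance requires one of the missing correlations and is NA)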
dat[dat$study == 17,]
#>    study var1 var2 var1.var2    yi ni
#> 25    17 acog perf acog.perf  0.10 45
#> 26    17 asom perf asom.perf  0.31 45
#> 27    17 conf perf conf.perf -0.17 45
#> 28    17 acog asom acog.asom    NA 45
#> 29    17 acog conf acog.conf    NA 45
#> 30    17 asom conf asom.conf    NA 45
blsplit(V, dat$study, round, 4)$`17`
#>           acog.perf asom.perf conf.perf acog.asom acog.conf asom.conf
#> acog.perf    0.0223        NA        NA        NA        NA        NA
#> asom.perf        NA    0.0186        NA        NA        NA        NA
#> conf.perf        NA        NA    0.0214        NA        NA        NA
#> acog.asom        NA        NA        NA        NA        NA        NA
#> acog.conf        NA        NA        NA        NA        NA        NA
#> asom.conf        NA        NA        NA        NA        NA        NA

############################################################################

### copy the original data into 'dat' again ('dat' was overwritten above)
dat <- dat.craft2003

### restructure data from study 1 into a correlation matrix
R1 <- diag(4)
R1[lower.tri(R1)] <- dat$ri[dat$study == 1]
R1[upper.tri(R1)] <- t(R1)[upper.tri(R1)]
rownames(R1) <- colnames(R1) <- c("perf", "acog", "asom", "conf")
R1
#>       perf  acog  asom  conf
#> perf  1.00 -0.55 -0.48  0.66
#> acog -0.55  1.00  0.47 -0.38
#> asom -0.48  0.47  1.00 -0.46
#> conf  0.66 -0.38 -0.46  1.00

### restructure data from study 3 into a correlation matrix
R3 <- diag(4)
R3[lower.tri(R3)] <- dat$ri[dat$study == 3]
R3[upper.tri(R3)] <- t(R3)[upper.tri(R3)]
rownames(R3) <- colnames(R3) <- c("perf", "acog", "asom", "conf")
R3
#>       perf  acog  asom  conf
#> perf  1.00  0.53 -0.12  0.03
#> acog  0.53  1.00  0.52 -0.48
#> asom -0.12  0.52  1.00 -0.40
#> conf  0.03 -0.48 -0.40  1.00

### an example where a correlation matrix is passed to rcalc()
rcalc(R1, ni=142)
#> $dat
#>   var1 var2 var1.var2    yi  ni
#> 1 acog perf acog.perf -0.55 142
#> 2 asom perf asom.perf -0.48 142
#> 3 conf perf conf.perf  0.66 142
#> 4 acog asom acog.asom  0.47 142
#> 5 acog conf acog.conf -0.38 142
#> 6 asom conf asom.conf -0.46 142
#> 
#> $V
#>               acog.perf     asom.perf     conf.perf     acog.asom    acog.conf     asom.conf
#> acog.perf  0.0034503989  0.0013265149 -0.0005545798 -0.0013967848  0.002501895  0.0009322372
#> asom.perf  0.0013265149  0.0042005969 -0.0009521407 -0.0019433591  0.001264856  0.0025160783
#> conf.perf -0.0005545798 -0.0009521407  0.0022592011  0.0005791091 -0.001533798 -0.0010692460
#> acog.asom -0.0013967848 -0.0019433591  0.0005791091  0.0043049419 -0.001802689 -0.0012050560
#> acog.conf  0.0025018954  0.0012648562 -0.0015337979 -0.0018026891  0.005191854  0.0018844047
#> asom.conf  0.0009322372  0.0025160783 -0.0010692460 -0.0012050560  0.001884405  0.0044083302
#> 

### an example where a list of correlation matrices is passed to rcalc()
tmp <- rcalc(list("1"=R1,"3"=R3), ni=c(142,37))
V <- tmp$V
dat <- tmp$dat

### examine data and var-cov matrix for study 1
dat[dat$id == 1,]
#>   id var1 var2 var1.var2    yi  ni
#> 1  1 acog perf acog.perf -0.55 142
#> 2  1 asom perf asom.perf -0.48 142
#> 3  1 conf perf conf.perf  0.66 142
#> 4  1 acog asom acog.asom  0.47 142
#> 5  1 acog conf acog.conf -0.38 142
#> 6  1 asom conf asom.conf -0.46 142
blsplit(V, dat$id, round, 4)$`1`
#>           acog.perf asom.perf conf.perf acog.asom acog.conf asom.conf
#> acog.perf    0.0035    0.0013   -0.0006   -0.0014    0.0025    0.0009
#> asom.perf    0.0013    0.0042   -0.0010   -0.0019    0.0013    0.0025
#> conf.perf   -0.0006   -0.0010    0.0023    0.0006   -0.0015   -0.0011
#> acog.asom   -0.0014   -0.0019    0.0006    0.0043   -0.0018   -0.0012
#> acog.conf    0.0025    0.0013   -0.0015   -0.0018    0.0052    0.0019
#> asom.conf    0.0009    0.0025   -0.0011   -0.0012    0.0019    0.0044

### examine data and var-cov matrix for study 3
dat[dat$id == 3,]
#>    id var1 var2 var1.var2    yi ni
#> 7   3 acog perf acog.perf  0.53 37
#> 8   3 asom perf asom.perf -0.12 37
#> 9   3 conf perf conf.perf  0.03 37
#> 10  3 acog asom acog.asom  0.52 37
#> 11  3 acog conf acog.conf -0.48 37
#> 12  3 asom conf asom.conf -0.40 37
blsplit(V, dat$id, round, 4)$`3`
#>           acog.perf asom.perf conf.perf acog.asom acog.conf asom.conf
#> acog.perf    0.0144    0.0106   -0.0097   -0.0032    0.0021    0.0034
#> asom.perf    0.0106    0.0270   -0.0109    0.0109   -0.0020    0.0001
#> conf.perf   -0.0097   -0.0109    0.0277   -0.0013    0.0114   -0.0027
#> acog.asom   -0.0032    0.0109   -0.0013    0.0148   -0.0044   -0.0066
#> acog.conf    0.0021   -0.0020    0.0114   -0.0044    0.0165    0.0079
#> asom.conf    0.0034    0.0001   -0.0027   -0.0066    0.0079    0.0196

############################################################################