Function to fit regression models based on correlation and covariance matrices.

matreg(y, x, R, n, V, cov=FALSE, means, ztor=FALSE,
nearpd=FALSE, level=95, digits, ...)

## Arguments

y

index (or name given as a character string) of the outcome variable.

x

indices (or names given as a character vector) of the predictor variables.

R

correlation or covariance matrix (or only the lower triangular part including the diagonal).

n

sample size on which the elements of the correlation/covariance matrix are based.

V

variance-covariance matrix of the lower triangular elements of the correlation/covariance matrix. Either V or n should be specified, not both. See ‘Details’.

cov

logical to specify whether R is a covariance matrix (the default is FALSE).

means

optional vector to specify the means of the variables (only relevant when cov=TRUE).

ztor

logical to specify whether R is a matrix of r-to-z transformed correlations and hence should be back-transformed to raw correlations (the default is FALSE). See ‘Details’.

nearpd

logical to specify whether the nearPD function from the Matrix package should be used when the $$R_{x,x}$$ matrix cannot be inverted. See ‘Note’.

level

numeric value between 0 and 100 to specify the confidence interval level (the default is 95).

digits

optional integer to specify the number of decimal places to which the printed results should be rounded.

...

other arguments.

## Details

Let $$R$$ be a $$p \times p$$ correlation or covariance matrix. Let $$y$$ denote the row/column of the outcome variable and $$x$$ the row(s)/column(s) of the predictor variable(s) in this matrix. Let $$R_{x,x}$$ and $$R_{x,y}$$ denote the corresponding submatrices of $$R$$. Then $b = R_{x,x}^{-1} R_{x,y}$ yields the standardized or raw regression coefficients (depending on whether $$R$$ is a correlation or covariance matrix, respectively) when regressing the outcome variable on the predictor variable(s).
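
As a rough illustration of the formula above, the following base R sketch computes $$b$$ by hand. The correlations are copied from the 'Model Results' output in the Examples section, rounded to four decimals, so the result should agree only approximately with the coefficients reported there (roughly 0.15, -0.05, and 0.36). The object names `Rxx` and `Rxy` are chosen here for illustration.

```r
# pooled correlations taken from the Examples output (rounded to four decimals);
# variable order: perf, acog, asom, conf
vars <- c("perf", "acog", "asom", "conf")
R <- matrix(c( 1.0000, -0.0600, -0.1423,  0.3167,
              -0.0600,  1.0000,  0.5671, -0.4888,
              -0.1423,  0.5671,  1.0000, -0.4750,
               0.3167, -0.4888, -0.4750,  1.0000),
            nrow=4, byrow=TRUE, dimnames=list(vars, vars))

y <- 1     # outcome: perf
x <- 2:4   # predictors: acog, asom, conf

Rxx <- R[x, x, drop=FALSE]   # predictor-by-predictor submatrix
Rxy <- R[x, y, drop=FALSE]   # predictor-by-outcome submatrix

# solve(A, B) returns A^{-1} B without forming the inverse explicitly
b <- solve(Rxx, Rxy)
round(b, 3)
```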

The $$R$$ matrix may be computed based on a single sample of $$n$$ subjects. In this case, one should specify the sample size via argument n. The variance-covariance matrix of the standardized regression coefficients is then given by $$\mbox{Var}[b] = \mbox{MSE} \times R_{x,x}^{-1}$$, where $$\mbox{MSE} = (1 - b'R_{x,y}) / (n - m)$$ and $$m$$ denotes the number of predictor variables. The standard errors are then given by the square root of the diagonal elements of $$\mbox{Var}[b]$$. Test statistics (in this case, t-statistics) and the corresponding p-values can then be computed as in a regular regression analysis. When $$R$$ is a covariance matrix, one should set cov=TRUE and specify the means of the $$p$$ variables via argument means to obtain raw regression coefficients including the intercept and corresponding standard errors.
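
The single-sample formulas above can be traced step by step in base R. This is a minimal sketch, not the function's own code: the correlation matrix and the sample size `n = 50` are hypothetical values made up for illustration, and the calculation stops at the t-statistics.

```r
# hypothetical correlation matrix for an outcome (y) and two predictors (x1, x2)
vars <- c("y", "x1", "x2")
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.3,
              0.4, 0.3, 1.0),
            nrow=3, byrow=TRUE, dimnames=list(vars, vars))
n <- 50          # hypothetical sample size
y <- 1
x <- 2:3
m <- length(x)   # number of predictor variables

Rxx <- R[x, x, drop=FALSE]
Rxy <- R[x, y, drop=FALSE]

b   <- solve(Rxx, Rxy)                    # standardized coefficients
mse <- (1 - c(t(b) %*% Rxy)) / (n - m)    # MSE = (1 - b'R_xy) / (n - m)
vb  <- mse * solve(Rxx)                   # Var[b] = MSE * R_xx^{-1}
se  <- sqrt(diag(vb))                     # standard errors
tval <- c(b) / se                         # t-statistics

round(cbind(estimate=c(b), se=se, tval=tval), 4)
```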

Alternatively, $$R$$ may be the result of a meta-analysis of correlation coefficients. In this case, the elements in $$R$$ are pooled correlation coefficients and the variance-covariance matrix of these pooled coefficients should be specified via argument V. The order of elements in V should correspond to the order of elements in the lower triangular part of $$R$$ column-wise. For example, if $$R$$ is a $$4 \times 4$$ matrix of the form: $\begin{bmatrix} 1 & & & \\ r_{21} & 1 & & \\ r_{31} & r_{32} & 1 & \\ r_{41} & r_{42} & r_{43} & 1 \end{bmatrix}$ then the elements are $$r_{21}$$, $$r_{31}$$, $$r_{41}$$, $$r_{32}$$, $$r_{42}$$, and $$r_{43}$$ and hence V should be a $$6 \times 6$$ variance-covariance matrix of these elements in this order. The variance-covariance matrix of the standardized regression coefficients (i.e., $$\mbox{Var}[b]$$) is then computed as a function of V as described in Becker (1992) using the multivariate delta method. The standard errors are then again given by the square root of the diagonal elements of $$\mbox{Var}[b]$$. Test statistics (in this case, z-statistics) and the corresponding p-values can then be computed in the usual manner.
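
The column-wise ordering described above is the order in which base R traverses the lower triangle of a matrix. The short sketch below generates the element labels for a $$4 \times 4$$ matrix; the label format (`r21`, `r31`, ...) is only for illustration.

```r
p <- 4
# positions of the strictly lower triangular elements, in column-major order
idx <- which(lower.tri(matrix(NA, p, p)), arr.ind=TRUE)
labels <- paste0("r", idx[, "row"], idx[, "col"])
labels

# V must therefore have p*(p-1)/2 rows and columns
length(labels)
```

Comparing such labels against the row and column names of V is a quick way to confirm that the two orderings match before calling the function.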

In case $$R$$ is the result of a meta-analysis of Fisher r-to-z transformed correlation coefficients (and hence V is the corresponding variance-covariance matrix of these pooled transformed coefficients), one should set argument ztor=TRUE so that the appropriate back-transformation is applied to R (and V) within the function.
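
To give an idea of what such a back-transformation involves, the sketch below applies the inverse Fisher transformation, $$r = \tanh(z)$$, and a first-order delta-method approximation for the variance-covariance matrix. This is one standard approach and is shown under that assumption; the exact computation inside the function may differ. The values in `z` and `Vz` are hypothetical.

```r
# hypothetical pooled r-to-z transformed correlations and their var-cov matrix
z  <- atanh(c(0.3, 0.5))
Vz <- diag(c(0.01, 0.02))

# back-transform the estimates to raw correlations
r <- tanh(z)

# delta method: d tanh(z) / dz = 1 - tanh(z)^2 = 1 - r^2
D  <- diag(1 - r^2, nrow=length(r))
Vr <- D %*% Vz %*% D

r
round(Vr, 6)
```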

Finally, $$R$$ may be a covariance matrix based on a meta-analysis (e.g., the estimated variance-covariance matrix of the random effects in a multivariate model). In this case, one should set cov=TRUE and V should again be the variance-covariance matrix of the elements in $$R$$, but now including the diagonal. Hence, if $$R$$ is a $$4 \times 4$$ matrix of the form: $\begin{bmatrix} \tau_1^2 & & & \\ \tau_{21} & \tau_2^2 & & \\ \tau_{31} & \tau_{32} & \tau_3^2 & \\ \tau_{41} & \tau_{42} & \tau_{43} & \tau_4^2 \end{bmatrix}$ then the elements are $$\tau^2_1$$, $$\tau_{21}$$, $$\tau_{31}$$, $$\tau_{41}$$, $$\tau^2_2$$, $$\tau_{32}$$, $$\tau_{42}$$, $$\tau^2_3$$, $$\tau_{43}$$, and $$\tau^2_4$$, and hence V should be a $$10 \times 10$$ variance-covariance matrix of these elements in this order. Argument means can then again be used to specify the means of the variables.
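
When the diagonal is included, the same column-wise traversal applies. The following sketch lists the elements for a $$4 \times 4$$ covariance matrix; the label format is again only illustrative.

```r
p <- 4
# positions of the lower triangular elements including the diagonal,
# in column-major order
idx <- which(lower.tri(matrix(NA, p, p), diag=TRUE), arr.ind=TRUE)
labels <- ifelse(idx[, "row"] == idx[, "col"],
                 paste0("tau^2_", idx[, "row"]),
                 paste0("tau_", idx[, "row"], idx[, "col"]))
labels

# V must therefore have p*(p+1)/2 rows and columns
length(labels)
```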

## Value

An object of class "matreg". The object is a list containing the following components:

tab

a data frame with the estimated model coefficients, standard errors, test statistics, degrees of freedom (only for t-tests), p-values, and lower/upper confidence interval bounds.

vb

the variance-covariance matrix of the estimated model coefficients.

...

The results are formatted and printed with the print function.

## Note

Only the lower triangular part of R (and V if it is specified) is used in the computations.

If $$R_{x,x}$$ is not invertible, an error will be issued. In this case, one can set argument nearpd=TRUE, in which case the nearPD function from the Matrix package will be used to find the nearest positive definite matrix, which should be invertible. The results should be treated with caution when this is done.
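
To see what such an adjustment does to a matrix, nearPD can also be called directly. The sketch below uses a hypothetical matrix that is not positive definite; the choice of `corr=TRUE` is an assumption made for this illustration and need not match the settings used inside the function.

```r
# hypothetical 'correlation' matrix that is not positive definite
Rxx <- matrix(c( 1.0,  0.9,  0.9,
                 0.9,  1.0, -0.9,
                 0.9, -0.9,  1.0),
              nrow=3, byrow=TRUE)

# smallest eigenvalue is negative, so Rxx is not positive definite
min(eigen(Rxx, symmetric=TRUE)$values)

# find a nearby matrix that is positive definite (unit diagonal kept via corr=TRUE)
npd <- Matrix::nearPD(Rxx, corr=TRUE)
Rxx.pd <- as.matrix(npd$mat)

# smallest eigenvalue after the adjustment
min(eigen(Rxx.pd, symmetric=TRUE)$values)

# largest absolute change in any element
max(abs(Rxx.pd - Rxx))
```

Inspecting the size of the change is one way to judge how much caution the adjusted results deserve.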

When $$R$$ is a covariance matrix with V and means specified, the means are treated as known constants when estimating the standard error of the intercept.

## Author

Wolfgang Viechtbauer wvb@metafor-project.org https://www.metafor-project.org

## References

Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17(4), 341–362. https://doi.org/10.3102/10769986017004341

Becker, B. J. (1995). Corrections to "Using results from replicated studies to estimate linear models". Journal of Educational and Behavioral Statistics, 20(1), 100–102. https://doi.org/10.3102/10769986020001100

Becker, B. J., & Aloe, A. (2019). Model-based meta-analysis and related approaches. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (3rd ed., pp. 339–363). New York: Russell Sage Foundation.

## See Also

rma.mv for a function to meta-analyze multiple correlation coefficients that can be used to construct an $$R$$ matrix.

rcalc for a function to construct the variance-covariance matrix of dependent correlation coefficients.

## Examples

### copy data into 'dat'
dat <- dat.craft2003

### construct dataset and var-cov matrix of the correlations
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat)
V <- tmp$V
dat <- tmp$dat

### turn var1.var2 into a factor with the desired order of levels
dat$var1.var2 <- factor(dat$var1.var2,
levels=c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", "asom.conf"))

### multivariate random-effects model
res <- rma.mv(yi, V, mods = ~ var1.var2 - 1, random = ~ var1.var2 | study, struct="UN", data=dat)
#> Warning: Rows with NAs omitted from model fitting.
res
#>
#> Multivariate Meta-Analysis Model (k = 51; method: REML)
#>
#> Variance Components:
#>
#> outer factor: study     (nlvls = 9)
#> inner factor: var1.var2 (nlvls = 6)
#>
#>             estim    sqrt  k.lvl  fixed      level
#> tau^2.1    0.1611  0.4014      9     no  acog.perf
#> tau^2.2    0.0604  0.2459      9     no  asom.perf
#> tau^2.3    0.0468  0.2163      8     no  conf.perf
#> tau^2.4    0.0047  0.0683      9     no  acog.asom
#> tau^2.5    0.0125  0.1119      8     no  acog.conf
#> tau^2.6    0.0111  0.1052      8     no  asom.conf
#>
#>            rho.acg.p  rho.asm.p  rho.cnf.  rho.acg.s  rho.acg.c  rho.asm.c    acg.p  asm.p  cnf.
#> acog.perf          1                                                              -      9     8
#> asom.perf     0.9497          1                                                  no      -     8
#> conf.perf    -0.6178    -0.5969         1                                        no     no     -
#> acog.asom     0.5491     0.4604   -0.9345          1                             no     no    no
#> acog.conf     0.0432    -0.0495    0.7023    -0.6961          1                  no     no    no
#> asom.conf     0.3532     0.2688   -0.1311    -0.0891     0.4193          1       no     no    no
#>            acg.s  acg.c  asm.c
#> acog.perf      9      8      8
#> asom.perf      9      8      8
#> conf.perf      8      8      8
#> acog.asom      -      8      8
#> acog.conf     no      -      8
#> asom.conf     no     no      -
#>
#> Test for Residual Heterogeneity:
#> QE(df = 45) = 334.8358, p-val < .0001
#>
#> Test of Moderators (coefficients 1:6):
#> QM(df = 6) = 596.7711, p-val < .0001
#>
#> Model Results:
#>
#>                     estimate      se     zval    pval    ci.lb    ci.ub
#> var1.var2acog.perf   -0.0600  0.1408  -0.4264  0.6698  -0.3359   0.2159
#> var1.var2asom.perf   -0.1423  0.0917  -1.5527  0.1205  -0.3220   0.0373
#> var1.var2conf.perf    0.3167  0.0847   3.7393  0.0002   0.1507   0.4827  ***
#> var1.var2acog.asom    0.5671  0.0367  15.4640  <.0001   0.4953   0.6390  ***
#> var1.var2acog.conf   -0.4888  0.0509  -9.6048  <.0001  -0.5886  -0.3891  ***
#> var1.var2asom.conf   -0.4750  0.0506  -9.3901  <.0001  -0.5741  -0.3758  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>

### restructure estimated mean correlations into a 4x4 matrix
R <- vec2mat(coef(res))
rownames(R) <- colnames(R) <- c("perf", "acog", "asom", "conf")
round(R, digits=3)
#>        perf   acog   asom   conf
#> perf  1.000 -0.060 -0.142  0.317
#> acog -0.060  1.000  0.567 -0.489
#> asom -0.142  0.567  1.000 -0.475
#> conf  0.317 -0.489 -0.475  1.000

### check that order in vcov(res) corresponds to order in R
round(vcov(res), digits=4)
#>                    var1.var2acog.perf var1.var2asom.perf var1.var2conf.perf var1.var2acog.asom
#> var1.var2acog.perf             0.0198             0.0115            -0.0069             0.0017
#> var1.var2asom.perf             0.0115             0.0084            -0.0043             0.0009
#> var1.var2conf.perf            -0.0069            -0.0043             0.0072            -0.0017
#> var1.var2acog.asom             0.0017             0.0009            -0.0017             0.0013
#> var1.var2acog.conf             0.0004            -0.0002             0.0023            -0.0009
#> var1.var2asom.conf             0.0018             0.0010            -0.0004            -0.0004
#>                    var1.var2acog.conf var1.var2asom.conf
#> var1.var2acog.perf             0.0004             0.0018
#> var1.var2asom.perf            -0.0002             0.0010
#> var1.var2conf.perf             0.0023            -0.0004
#> var1.var2acog.asom            -0.0009            -0.0004
#> var1.var2acog.conf             0.0026             0.0011
#> var1.var2asom.conf             0.0011             0.0026

### fit regression model with 'perf' as outcome and 'acog', 'asom', and 'conf' as predictors
matreg(1, 2:4, R=R, V=vcov(res))
#>
#>       estimate      se     zval    pval    ci.lb   ci.ub
#> acog    0.1482  0.1566   0.9465  0.3439  -0.1587  0.4550
#> asom   -0.0536  0.0768  -0.6979  0.4852  -0.2043  0.0970
#> conf    0.3637  0.0910   3.9985  <.0001   0.1854  0.5419  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>

### can also specify variable names
matreg("perf", c("acog","asom","conf"), R=R, V=vcov(res))
#>
#>       estimate      se     zval    pval    ci.lb   ci.ub
#> acog    0.1482  0.1566   0.9465  0.3439  -0.1587  0.4550
#> asom   -0.0536  0.0768  -0.6979  0.4852  -0.2043  0.0970
#> conf    0.3637  0.0910   3.9985  <.0001   0.1854  0.5419  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>

# \dontrun{
### repeat the above but with r-to-z transformed correlations
dat <- dat.craft2003
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat, rtoz=TRUE)
V <- tmp$V
dat <- tmp$dat
dat$var1.var2 <- factor(dat$var1.var2,
levels=c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", "asom.conf"))
res <- rma.mv(yi, V, mods = ~ var1.var2 - 1, random = ~ var1.var2 | study, struct="UN", data=dat)
#> Warning: Rows with NAs omitted from model fitting.
R <- vec2mat(coef(res))
rownames(R) <- colnames(R) <- c("perf", "acog", "asom", "conf")
matreg(1, 2:4, R=R, V=vcov(res), ztor=TRUE)
#>
#>       estimate      se     zval    pval    ci.lb   ci.ub
#> acog    0.1362  0.1697   0.8023  0.4224  -0.1965  0.4688
#> asom   -0.0678  0.0761  -0.8900  0.3735  -0.2170  0.0814
#> conf    0.3666  0.0934   3.9248  <.0001   0.1835  0.5496  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
# }

############################################################################

### a different example based on van Houwelingen et al. (2002)

### create dataset in long format
dat.long <- to.long(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.colditz1994)
dat.long <- escalc(measure="PLO", xi=out1, mi=out2, data=dat.long)
dat.long$tpos <- dat.long$tneg <- dat.long$cpos <- dat.long$cneg <- NULL
levels(dat.long$group) <- c("CON", "EXP")

### fit bivariate model
res <- rma.mv(yi, vi, mods = ~ group - 1, random = ~ group | trial, struct="UN", data=dat.long, method="ML")
res
#>
#> Multivariate Meta-Analysis Model (k = 26; method: ML)
#>
#> Variance Components:
#>
#> outer factor: trial (nlvls = 13)
#> inner factor: group (nlvls = 2)
#>
#>             estim    sqrt  k.lvl  fixed  level
#> tau^2.1    2.4073  1.5516     13     no    CON
#> tau^2.2    1.4314  1.1964     13     no    EXP
#>
#>      rho.CON  rho.EXP    CON  EXP
#> CON        1               -   13
#> EXP   0.9467        1     no    -
#>
#> Test for Residual Heterogeneity:
#> QE(df = 24) = 5270.3863, p-val < .0001
#>
#> Test of Moderators (coefficients 1:2):
#> QM(df = 2) = 292.4633, p-val < .0001
#>
#> Model Results:
#>
#>           estimate      se      zval    pval    ci.lb    ci.ub
#> groupCON   -4.0960  0.4347   -9.4226  <.0001  -4.9480  -3.2440  ***
#> groupEXP   -4.8337  0.3396  -14.2329  <.0001  -5.4994  -4.1681  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>

### regression of log(odds)_EXP on log(odds)_CON
matreg(y=2, x=1, R=res$G, cov=TRUE, means=coef(res), n=res$g.levels.comb.k)
#>
#>          estimate      se     tval  df    pval    ci.lb    ci.ub
#> intrcpt   -1.8437  0.3265  -5.6477  11  0.0001  -2.5623  -1.1252  ***
#> CON        0.7300  0.0749   9.7467  11  <.0001   0.5651   0.8948  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>

### but the SE of the CON coefficient is not computed correctly, since above we treat res$G as if
### it was a var-cov matrix computed from raw data based on res$g.levels.comb.k (= 13) data points

### fit bivariate model and get the var-cov matrix of the estimates in res$G
res <- rma.mv(yi, vi, mods = ~ group - 1, random = ~ group | trial, struct="UN",
data=dat.long, method="ML", cvvc="varcov", control=list(nearpd=TRUE))

### now use res$vvc as the var-cov matrix of the estimates in res$G
matreg(y=2, x=1, R=res$G, cov=TRUE, means=coef(res), V=res$vvc)
#>
#>          estimate      se     zval    pval    ci.lb    ci.ub
#> intrcpt   -1.8437  0.3548  -5.1967  <.0001  -2.5391  -1.1484  ***
#> CON        0.7300  0.0866   8.4276  <.0001   0.5602   0.8998  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>