matreg.Rd
Function to fit regression models based on a correlation and covariance matrix.

Usage:

matreg(y, x, R, n, V, cov=FALSE, means, ztor=FALSE,
       nearpd=FALSE, level=95, digits, ...)
Arguments:

y       index (or name given as a character string) of the outcome variable.
x       indices (or names given as a character vector) of the predictor variables.
R       correlation or covariance matrix (or only the lower triangular part including the diagonal).
n       sample size based on which the elements in the correlation/covariance matrix were computed.
V       variance-covariance matrix of the lower triangular elements of the correlation/covariance matrix. Either V or n should be specified, not both. See ‘Details’.
cov     logical to specify whether R is a covariance matrix (the default is FALSE).
means   optional vector to specify the means of the variables (only relevant when cov=TRUE).
ztor    logical to specify whether R is a matrix of r-to-z transformed correlations and hence should be back-transformed to raw correlations (the default is FALSE). See ‘Details’.
nearpd  logical to specify whether the nearPD function from the Matrix package should be used when the \(R_{x,x}\) matrix cannot be inverted. See ‘Note’.
level   numeric value between 0 and 100 to specify the confidence interval level (the default is 95).
digits  optional integer to specify the number of decimal places to which the printed results should be rounded.
...     other arguments.
Details:
Let \(R\) be a \(p \times p\) correlation or covariance matrix. Let \(y\) denote the row/column of the outcome variable and \(x\) the row(s)/column(s) of the predictor variable(s) in this matrix. Let \(R_{x,x}\) and \(R_{x,y}\) denote the corresponding submatrices of \(R\). Then \[b = R_{x,x}^{-1} R_{x,y}\] yields the standardized or raw regression coefficients (depending on whether \(R\) is a correlation or covariance matrix, respectively) when regressing the outcome variable on the predictor variable(s).
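To make this concrete, here is the computation in base R (a minimal sketch using the 'mtcars' data, which the examples below also use; this is not the matreg() function itself):

```r
## correlation matrix of the mtcars data
R <- cor(mtcars)
y <- "mpg"                   # outcome variable
x <- c("hp", "wt", "am")     # predictor variables
## b = R_{x,x}^{-1} R_{x,y} yields the standardized regression coefficients
b <- solve(R[x, x], R[x, y])
round(b, 4)
#>      hp      wt      am
#> -0.4264 -0.4673  0.1725
```

These match the coefficients obtained from matreg() in the examples below.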
The \(R\) matrix may be computed based on a single sample of \(n\) subjects. In this case, one should specify the sample size via argument n. The variance-covariance matrix of the standardized regression coefficients is then given by \(\text{Var}[b] = \text{MSE} \times R_{x,x}^{-1}\), where \(\text{MSE} = (1 - b'R_{x,y}) / (n - m)\) and \(m\) denotes the number of predictor variables. The standard errors are then given by the square root of the diagonal elements of \(\text{Var}[b]\). Test statistics (in this case, t-statistics) and the corresponding p-values can then be computed as in a regular regression analysis. When \(R\) is a covariance matrix, one should set cov=TRUE and specify the means of the \(p\) variables via argument means to obtain raw regression coefficients including the intercept and corresponding standard errors.
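The standard errors and t-tests follow directly from these formulas; a base R sketch (again using the 'mtcars' correlation matrix, which reproduces the matreg() results shown further below):

```r
R <- cor(mtcars)
n <- nrow(mtcars)                          # sample size
x <- c("hp", "wt", "am")
m <- length(x)                             # number of predictors
b <- solve(R[x, x], R[x, "mpg"])           # standardized coefficients
## MSE = (1 - b'R_{x,y}) / (n - m)
mse <- (1 - sum(b * R[x, "mpg"])) / (n - m)
vb  <- mse * solve(R[x, x])                # Var[b] = MSE * R_{x,x}^{-1}
se  <- sqrt(diag(vb))
tval <- b / se
pval <- 2 * pt(abs(tval), df = n - m, lower.tail = FALSE)
round(data.frame(b, se, tval, pval), 4)
#>          b     se    tval   pval
#> hp -0.4264 0.1074 -3.9709 0.0004
#> wt -0.4673 0.1444 -3.2372 0.0030
#> am  0.1725 0.1120  1.5407 0.1342
```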
Alternatively, \(R\) may be the result of a meta-analysis of correlation coefficients. In this case, the elements in \(R\) are pooled correlation coefficients and the variance-covariance matrix of these pooled coefficients should be specified via argument V. The order of elements in V should correspond to the order of elements in the lower triangular part of \(R\) column-wise. For example, if \(R\) is a \(4 \times 4\) matrix of the form: \[\begin{bmatrix} 1 & & & \\ r_{21} & 1 & & \\ r_{31} & r_{32} & 1 & \\ r_{41} & r_{42} & r_{43} & 1 \end{bmatrix}\] then the elements are \(r_{21}\), \(r_{31}\), \(r_{41}\), \(r_{32}\), \(r_{42}\), and \(r_{43}\) and hence V should be a \(6 \times 6\) variance-covariance matrix of these elements in this order. The variance-covariance matrix of the standardized regression coefficients (i.e., \(\text{Var}[b]\)) is then computed as a function of V as described in Becker (1992) using the multivariate delta method. The standard errors are then again given by the square root of the diagonal elements of \(\text{Var}[b]\). Test statistics (in this case, z-statistics) and the corresponding p-values can then be computed in the usual manner.
In case \(R\) is the result of a meta-analysis of Fisher r-to-z transformed correlation coefficients (and hence V is then the corresponding variance-covariance matrix of these pooled transformed coefficients), one should set argument ztor=TRUE, so that the appropriate back-transformation is then applied to R (and V) within the function.
Finally, \(R\) may be a covariance matrix based on a meta-analysis (e.g., the estimated variance-covariance matrix of the random effects in a multivariate model). In this case, one should set cov=TRUE and V should again be the variance-covariance matrix of the elements in \(R\), but now including the diagonal. Hence, if \(R\) is a \(4 \times 4\) matrix of the form: \[\begin{bmatrix} \tau_1^2 & & & \\ \tau_{21} & \tau_2^2 & & \\ \tau_{31} & \tau_{32} & \tau_3^2 & \\ \tau_{41} & \tau_{42} & \tau_{43} & \tau_4^2 \end{bmatrix}\] then the elements are \(\tau^2_1\), \(\tau_{21}\), \(\tau_{31}\), \(\tau_{41}\), \(\tau^2_2\), \(\tau_{32}\), \(\tau_{42}\), \(\tau^2_3\), \(\tau_{43}\), and \(\tau^2_4\), and hence V should be a \(10 \times 10\) variance-covariance matrix of these elements in this order. Argument means can then again be used to specify the means of the variables.
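The same sketch with the diagonal included (via lower.tri(..., diag=TRUE)) illustrates this ordering:

```r
## label diagonal cells tau2_i and off-diagonal cells tau_ij
elems <- outer(1:4, 1:4, function(i, j)
  ifelse(i == j, paste0("tau2_", i), paste0("tau", i, j)))
## column-wise lower triangle including the diagonal, i.e.:
## tau2_1, tau21, tau31, tau41, tau2_2, tau32, tau42, tau2_3, tau43, tau2_4
elems[lower.tri(elems, diag=TRUE)]
```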
Value:

An object of class "matreg". The object is a list containing the following components:
a data frame with the estimated model coefficients, standard errors, test statistics, degrees of freedom (only for t-tests), p-values, and lower/upper confidence interval bounds.
the variance-covariance matrix of the estimated model coefficients.
some additional elements/values.
The results are formatted and printed with the print function. Extractor functions include coef and vcov.
Note:

Only the lower triangular part of R (and V if it is specified) is used in the computations.
If \(R_{x,x}\) is not invertible, an error will be issued. In this case, one can set argument nearpd=TRUE, in which case the nearPD function from the Matrix package will be used to find the nearest positive semi-definite matrix, which should be invertible. The results should be treated with caution when this is done.
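For illustration, a hypothetical sketch of what nearPD() does (the matrix below is deliberately not positive definite; assumes the Matrix package is installed):

```r
## a 'correlation' matrix that is not positive definite
Rxx <- matrix(c(   1,  0.9, -0.9,
                 0.9,    1,  0.9,
                -0.9,  0.9,    1), nrow=3)
min(eigen(Rxx)$values)                    # negative: Rxx is not positive definite
## find the nearest positive (semi-)definite correlation matrix
Rxx.pd <- as.matrix(Matrix::nearPD(Rxx, corr=TRUE)$mat)
min(eigen(Rxx.pd)$values)                 # now essentially non-negative
```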
When \(R\) is a covariance matrix with V and means specified, the means are treated as known constants when estimating the standard error of the intercept.
References:
Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17(4), 341–362. https://doi.org/10.3102/10769986017004341
Becker, B. J. (1995). Corrections to "Using results from replicated studies to estimate linear models". Journal of Educational and Behavioral Statistics, 20(1), 100–102. https://doi.org/10.3102/10769986020001100
Becker, B. J., & Aloe, A. (2019). Model-based meta-analysis and related approaches. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (3rd ed., pp. 339–363). New York: Russell Sage Foundation.
############################################################################
### first an example unrelated to meta-analysis, simply demonstrating that
### one can obtain the same results from lm() and matreg()
### fit a regression model with lm() to the 'mtcars' dataset
res <- lm(mpg ~ hp + wt + am, data=mtcars)
summary(res)
#>
#> Call:
#> lm(formula = mpg ~ hp + wt + am, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.4221 -1.7924 -0.3788 1.2249 5.5317
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
#> hp -0.037479 0.009605 -3.902 0.000546 ***
#> wt -2.878575 0.904971 -3.181 0.003574 **
#> am 2.083710 1.376420 1.514 0.141268
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 2.538 on 28 degrees of freedom
#> Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
#> F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
#>
### covariance matrix of the dataset
S <- cov(mtcars)
### fit the same regression model using matreg()
res <- matreg(y="mpg", x=c("hp","wt","am"), R=S, cov=TRUE,
              means=colMeans(mtcars), n=nrow(mtcars))
summary(res)
#>
#> estimate se tval df pval ci.lb ci.ub
#> intrcpt 34.0029 2.6427 12.8669 28 <.0001 28.5896 39.4161 ***
#> hp -0.0375 0.0096 -3.9018 28 0.0005 -0.0572 -0.0178 ***
#> wt -2.8786 0.9050 -3.1808 28 0.0036 -4.7323 -1.0248 **
#> am 2.0837 1.3764 1.5139 28 0.1413 -0.7358 4.9032
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 2.5375 on 28 degrees of freedom
#> Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
#> F-statistic: 48.9600 on 3 and 28 DF, p-value: <.0001
#>
### copy the 'mtcars' dataset to 'dat' and standardize all variables
dat <- mtcars
dat[] <- scale(dat)
### fit a regression model with lm() to obtain standardized regression coefficients ('betas')
res <- lm(mpg ~ 0 + hp + wt + am, data=dat)
summary(res)
#>
#> Call:
#> lm(formula = mpg ~ 0 + hp + wt + am, data = dat)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.56779 -0.29739 -0.06285 0.20324 0.91783
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> hp -0.4264 0.1074 -3.971 0.000433 ***
#> wt -0.4673 0.1444 -3.237 0.003017 **
#> am 0.1725 0.1120 1.541 0.134242
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.4137 on 29 degrees of freedom
#> Multiple R-squared: 0.8399, Adjusted R-squared: 0.8233
#> F-statistic: 50.71 on 3 and 29 DF, p-value: 1.183e-11
#>
### correlation matrix of the dataset
R <- cor(mtcars)
### fit the same regression model using matreg()
res <- matreg(y="mpg", x=c("hp","wt","am"), R=R, n=nrow(mtcars))
summary(res)
#>
#> estimate se tval df pval ci.lb ci.ub
#> hp -0.4264 0.1074 -3.9709 29 0.0004 -0.6460 -0.2068 ***
#> wt -0.4673 0.1444 -3.2372 29 0.0030 -0.7626 -0.1721 **
#> am 0.1725 0.1120 1.5407 29 0.1342 -0.0565 0.4015
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.4137 on 29 degrees of freedom
#> Multiple R-squared: 0.8399, Adjusted R-squared: 0.8233
#> F-statistic: 50.7086 on 3 and 29 DF, p-value: <.0001
#>
### note: the standard errors of the betas should not be used to construct CIs
### as they assume that the null hypothesis (H0: beta_j = 0) is true
### construct the var-cov matrix of correlations in R
V <- rcalc(R, ni=nrow(mtcars))$V
### fit the same regression model using matreg() but now supply V
res <- matreg(y="mpg", x=c("hp","wt","am"), R=R, V=V)
summary(res)
#>
#> estimate se zval pval ci.lb ci.ub
#> hp -0.4264 0.1060 -4.0218 <.0001 -0.6341 -0.2186 ***
#> wt -0.4673 0.1378 -3.3920 0.0007 -0.7374 -0.1973 ***
#> am 0.1725 0.1097 1.5724 0.1158 -0.0425 0.3876
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> R^2: 0.8399, QM(df = 3) = 758.2210, p-val < .0001
#>
### the standard errors computed in this way can now be used to construct
### CIs for the betas (here, the difference is relatively small)
############################################################################
### copy data into 'dat'
dat <- dat.craft2003
### construct dataset and var-cov matrix of the correlations
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat)
V <- tmp$V
dat <- tmp$dat
### turn var1.var2 into a factor with the desired order of levels
dat$var1.var2 <- factor(dat$var1.var2,
   levels=c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", "asom.conf"))
### multivariate random-effects model
res <- rma.mv(yi, V, mods = ~ 0 + var1.var2, random = ~ var1.var2 | study, struct="UN", data=dat)
#> Warning: 9 rows with NAs omitted from model fitting.
res
#>
#> Multivariate Meta-Analysis Model (k = 51; method: REML)
#>
#> Variance Components:
#>
#> outer factor: study (nlvls = 9)
#> inner factor: var1.var2 (nlvls = 6)
#>
#> estim sqrt k.lvl fixed level
#> tau^2.1 0.1611 0.4014 9 no acog.perf
#> tau^2.2 0.0604 0.2459 9 no asom.perf
#> tau^2.3 0.0468 0.2163 8 no conf.perf
#> tau^2.4 0.0047 0.0683 9 no acog.asom
#> tau^2.5 0.0125 0.1119 8 no acog.conf
#> tau^2.6 0.0111 0.1052 8 no asom.conf
#>
#> rho.acg.p rho.asm.p rho.cnf. rho.acg.s rho.acg.c rho.asm.c acg.p asm.p cnf.
#> acog.perf 1 - 9 8
#> asom.perf 0.9497 1 no - 8
#> conf.perf -0.6178 -0.5969 1 no no -
#> acog.asom 0.5491 0.4604 -0.9345 1 no no no
#> acog.conf 0.0432 -0.0495 0.7023 -0.6961 1 no no no
#> asom.conf 0.3532 0.2688 -0.1311 -0.0891 0.4193 1 no no no
#> acg.s acg.c asm.c
#> acog.perf 9 8 8
#> asom.perf 9 8 8
#> conf.perf 8 8 8
#> acog.asom - 8 8
#> acog.conf no - 8
#> asom.conf no no -
#>
#> Test for Residual Heterogeneity:
#> QE(df = 45) = 334.8358, p-val < .0001
#>
#> Test of Moderators (coefficients 1:6):
#> QM(df = 6) = 596.7705, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> var1.var2acog.perf -0.0600 0.1408 -0.4264 0.6698 -0.3359 0.2159
#> var1.var2asom.perf -0.1423 0.0917 -1.5527 0.1205 -0.3220 0.0373
#> var1.var2conf.perf 0.3167 0.0847 3.7393 0.0002 0.1507 0.4827 ***
#> var1.var2acog.asom 0.5671 0.0367 15.4640 <.0001 0.4953 0.6390 ***
#> var1.var2acog.conf -0.4888 0.0509 -9.6048 <.0001 -0.5886 -0.3891 ***
#> var1.var2asom.conf -0.4750 0.0506 -9.3901 <.0001 -0.5741 -0.3758 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### restructure estimated mean correlations into a 4x4 matrix
R <- vec2mat(coef(res))
rownames(R) <- colnames(R) <- c("perf", "acog", "asom", "conf")
round(R, digits=3)
#> perf acog asom conf
#> perf 1.000 -0.060 -0.142 0.317
#> acog -0.060 1.000 0.567 -0.489
#> asom -0.142 0.567 1.000 -0.475
#> conf 0.317 -0.489 -0.475 1.000
### check that order in vcov(res) corresponds to order in R
round(vcov(res), digits=4)
#> var1.var2acog.perf var1.var2asom.perf var1.var2conf.perf var1.var2acog.asom
#> var1.var2acog.perf 0.0198 0.0115 -0.0069 0.0017
#> var1.var2asom.perf 0.0115 0.0084 -0.0043 0.0009
#> var1.var2conf.perf -0.0069 -0.0043 0.0072 -0.0017
#> var1.var2acog.asom 0.0017 0.0009 -0.0017 0.0013
#> var1.var2acog.conf 0.0004 -0.0002 0.0023 -0.0009
#> var1.var2asom.conf 0.0018 0.0010 -0.0004 -0.0004
#> var1.var2acog.conf var1.var2asom.conf
#> var1.var2acog.perf 0.0004 0.0018
#> var1.var2asom.perf -0.0002 0.0010
#> var1.var2conf.perf 0.0023 -0.0004
#> var1.var2acog.asom -0.0009 -0.0004
#> var1.var2acog.conf 0.0026 0.0011
#> var1.var2asom.conf 0.0011 0.0026
### fit regression model with 'perf' as outcome and 'acog', 'asom', and 'conf' as predictors
matreg(1, 2:4, R=R, V=vcov(res))
#>
#> estimate se zval pval ci.lb ci.ub
#> acog 0.1482 0.1566 0.9465 0.3439 -0.1587 0.4550
#> asom -0.0536 0.0768 -0.6979 0.4852 -0.2043 0.0970
#> conf 0.3637 0.0910 3.9985 <.0001 0.1854 0.5419 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### can also specify variable names
matreg("perf", c("acog","asom","conf"), R=R, V=vcov(res))
#>
#> estimate se zval pval ci.lb ci.ub
#> acog 0.1482 0.1566 0.9465 0.3439 -0.1587 0.4550
#> asom -0.0536 0.0768 -0.6979 0.4852 -0.2043 0.0970
#> conf 0.3637 0.0910 3.9985 <.0001 0.1854 0.5419 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### repeat the above but with r-to-z transformed correlations
dat <- dat.craft2003
tmp <- rcalc(ri ~ var1 + var2 | study, ni=ni, data=dat, rtoz=TRUE)
V <- tmp$V
dat <- tmp$dat
dat$var1.var2 <- factor(dat$var1.var2,
   levels=c("acog.perf", "asom.perf", "conf.perf", "acog.asom", "acog.conf", "asom.conf"))
res <- rma.mv(yi, V, mods = ~ 0 + var1.var2, random = ~ var1.var2 | study, struct="UN", data=dat)
#> Warning: 9 rows with NAs omitted from model fitting.
R <- vec2mat(coef(res))
rownames(R) <- colnames(R) <- c("perf", "acog", "asom", "conf")
matreg(1, 2:4, R=R, V=vcov(res), ztor=TRUE)
#>
#> estimate se zval pval ci.lb ci.ub
#> acog 0.1362 0.1697 0.8023 0.4224 -0.1965 0.4688
#> asom -0.0678 0.0761 -0.8900 0.3735 -0.2170 0.0814
#> conf 0.3665 0.0934 3.9248 <.0001 0.1835 0.5496 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
############################################################################
### a different example based on van Houwelingen et al. (2002)
### create dataset in long format
dat.long <- to.long(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg,
                    data=dat.colditz1994, append=FALSE)
dat.long <- escalc(measure="PLO", xi=out1, mi=out2, data=dat.long)
dat.long$group <- factor(dat.long$group, levels=c(2,1), labels=c("con","exp"))
dat.long
#>
#> study group out1 out2 yi vi
#> 1 1 exp 4 119 -3.3928 0.2584
#> 2 1 con 11 128 -2.4541 0.0987
#> 3 2 exp 6 300 -3.9120 0.1700
#> 4 2 con 29 274 -2.2458 0.0381
#> 5 3 exp 3 228 -4.3307 0.3377
#> 6 3 con 11 209 -2.9444 0.0957
#> 7 4 exp 62 13536 -5.3860 0.0162
#> 8 4 con 248 12619 -3.9295 0.0041
#> 9 5 exp 33 5036 -5.0279 0.0305
#> 10 5 con 47 5761 -4.8087 0.0215
#> 11 6 exp 180 1361 -2.0230 0.0063
#> 12 6 con 372 1079 -1.0649 0.0036
#> 13 7 exp 8 2537 -5.7593 0.1254
#> 14 7 con 10 619 -4.1255 0.1016
#> 15 8 exp 505 87886 -5.1592 0.0020
#> 16 8 con 499 87892 -5.1713 0.0020
#> 17 9 exp 29 7470 -5.5514 0.0346
#> 18 9 con 45 7232 -5.0796 0.0224
#> 19 10 exp 17 1699 -4.6046 0.0594
#> 20 10 con 65 1600 -3.2034 0.0160
#> 21 11 exp 186 50448 -5.6030 0.0054
#> 22 11 con 141 27197 -5.2621 0.0071
#> 23 12 exp 5 2493 -6.2118 0.2004
#> 24 12 con 3 2338 -6.6584 0.3338
#> 25 13 exp 27 16886 -6.4384 0.0371
#> 26 13 con 29 17825 -6.4211 0.0345
#>
### fit bivariate model
res <- rma.mv(yi, vi, mods = ~ 0 + group, random = ~ group | study, struct="UN",
              data=dat.long, method="ML")
res
#>
#> Multivariate Meta-Analysis Model (k = 26; method: ML)
#>
#> Variance Components:
#>
#> outer factor: study (nlvls = 13)
#> inner factor: group (nlvls = 2)
#>
#> estim sqrt k.lvl fixed level
#> tau^2.1 2.4073 1.5516 13 no con
#> tau^2.2 1.4314 1.1964 13 no exp
#>
#> rho.con rho.exp con exp
#> con 1 - 13
#> exp 0.9467 1 no -
#>
#> Test for Residual Heterogeneity:
#> QE(df = 24) = 5270.3863, p-val < .0001
#>
#> Test of Moderators (coefficients 1:2):
#> QM(df = 2) = 292.4633, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> groupcon -4.0960 0.4347 -9.4226 <.0001 -4.9480 -3.2440 ***
#> groupexp -4.8337 0.3396 -14.2329 <.0001 -5.4994 -4.1681 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### regression of log(odds)_exp on log(odds)_con
matreg(y=2, x=1, R=res$G, cov=TRUE, means=coef(res), n=res$g.levels.comb.k)
#>
#> estimate se tval df pval ci.lb ci.ub
#> intrcpt -1.8437 0.3265 -5.6477 11 0.0001 -2.5623 -1.1252 ***
#> con 0.7300 0.0749 9.7467 11 <.0001 0.5651 0.8948 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### but the SE of the 'con' coefficient is not computed correctly, since we treat res$G above as if
### it was a var-cov matrix computed from raw data based on res$g.levels.comb.k (= 13) data points
### fit bivariate model and get the var-cov matrix of the estimates in res$G
res <- rma.mv(yi, vi, mods = ~ 0 + group, random = ~ group | study, struct="UN",
              data=dat.long, method="ML", cvvc="varcov", control=list(nearpd=TRUE))
### now use res$vvc as the var-cov matrix of the estimates in res$G
matreg(y=2, x=1, R=res$G, cov=TRUE, means=coef(res), V=res$vvc)
#>
#> estimate se zval pval ci.lb ci.ub
#> intrcpt -1.8437 0.3548 -5.1967 <.0001 -2.5391 -1.1484 ***
#> con 0.7300 0.0866 8.4276 <.0001 0.5602 0.8998 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
############################################################################