Function to create scatter plots / bubble plots based on meta-regression models.

regplot(x, ...)

# S3 method for rma
regplot(x, mod, pred=TRUE, ci=TRUE, pi=FALSE, shade=TRUE,
        xlim, ylim, predlim, olim, xlab, ylab, at, digits=2L,
        transf, atransf, targs, level=x$level,
        pch=21, psize, plim=c(0.5,3), col="black", bg="darkgray",
        grid=FALSE, refline, label=FALSE, offset=c(1,1), labsize=1,
        lcol, lwd, lty, legend=FALSE, xvals, ...)

## Arguments

x

an object of class "rma.uni", "rma.mv", or "rma.glmm" including one or multiple moderators.

mod

either a scalar to specify the position of the moderator variable in the model or a character string to specify the name of the moderator variable.

pred

logical to indicate whether the (marginal) regression line based on the moderator should be added to the plot (the default is TRUE). Can also be an object from predict.rma. See ‘Details’.

ci

logical to indicate whether the corresponding confidence interval bounds should be added to the plot (the default is TRUE).

pi

logical to indicate whether the corresponding prediction interval bounds should be added to the plot (the default is FALSE).

shade

logical to indicate whether the confidence/prediction interval regions should be shaded (the default is TRUE). Can also be a two-element character vector to specify the colors for shading the confidence and prediction interval regions (if shading only the former, a single color can also be specified).

xlim

x-axis limits. If unspecified, the function tries to set the x-axis limits to some sensible values.

ylim

y-axis limits. If unspecified, the function tries to set the y-axis limits to some sensible values.

predlim

optional argument to specify the limits of the (marginal) regression line. If unspecified, the limits are based on the range of the moderator variable.

olim

optional argument to specify observation/outcome limits. If unspecified, no limits are used.

xlab

title for the x-axis. If unspecified, the function tries to set an appropriate axis title.

ylab

title for the y-axis. If unspecified, the function tries to set an appropriate axis title.

at

position of the y-axis tick marks and corresponding labels. If unspecified, the function tries to set the tick mark positions/labels to some sensible values.

digits

integer to specify the number of decimal places to which the tick mark labels of the y-axis should be rounded. When specifying an integer (e.g., 2L), trailing zeros after the decimal mark are dropped for the y-axis labels. When specifying a numeric value (e.g., 2), trailing zeros are retained.

transf

optional argument to specify a function that should be used to transform the observed outcomes, predicted values, and confidence/prediction interval bounds (e.g., transf=exp; see also transf). If unspecified, no transformation is used.

atransf

optional argument to specify a function that should be used to transform the y-axis labels (e.g., atransf=exp; see also transf). If unspecified, no transformation is used.

targs

optional arguments needed by the function specified via transf or atransf.

level

numeric value between 0 and 100 to specify the confidence/prediction interval level (the default is to take the value from the object).

pch

plotting symbol to use for the observed outcomes. By default, a filled circle is used. Can also be a vector of values. See points for other options.

psize

optional numeric value to specify the point sizes for the observed outcomes. If unspecified, the point sizes are a function of the model weights. Can also be a vector of values. Can also be a character string (either "seinv" or "vinv") to make the point sizes proportional to the inverse standard errors or inverse sampling variances.

plim

numeric vector of length 2 to scale the point sizes (ignored when a numeric value or vector is specified for psize). See ‘Details’.

col

character string to specify the name of a color to use for plotting the observed outcomes (the default is "black"). Can also be a vector of color names.

bg

character string to specify the name of a background color for open plot symbols (the default is "darkgray"). Can also be a vector of color names.

grid

logical to specify whether a grid should be added to the plot. Can also be a color name.

refline

optional numeric value to specify the location of a horizontal reference line that should be added to the plot.

label

argument to control the labeling of the points (the default is FALSE). See ‘Details’.

offset

argument to control the distance between the points and the corresponding labels. See ‘Details’.

labsize

numeric value to control the size of the labels.

lcol

optional vector of (up to) four elements to specify the color of the regression line, of the confidence interval bounds, of the prediction interval bounds, and of the horizontal reference line.

lty

optional vector of (up to) four elements to specify the line type of the regression line, of the confidence interval bounds, of the prediction interval bounds, and of the horizontal reference line.

lwd

optional vector of (up to) four elements to specify the line width of the regression line, of the confidence interval bounds, of the prediction interval bounds, and of the horizontal reference line.

legend

logical to indicate whether a legend should be added to the plot (the default is FALSE). Can also be a keyword to indicate the position of the legend (see legend).

xvals

optional numeric vector to specify the values of the moderator for which predicted values should be computed. Needs to be specified when passing an object from predict.rma to the pred argument. See ‘Details’.

...

other arguments.

## Details

The function draws a scatter plot of the values of a moderator variable in a meta-regression model (on the x-axis) against the observed effect sizes or outcomes (on the y-axis). The regression line from the model (with corresponding confidence interval bounds) is added to the plot by default. These types of plots are also often referred to as ‘bubble plots’ as the points are typically drawn in different sizes to reflect their precision or weight in the model.

By default (i.e., when psize is not specified), the size of the points is a function of the square root of the model weights. This way, their area is proportional to the weights. However, the point sizes are rescaled so that the smallest point size is plim[1] and the largest point size is plim[2]. As a result, their relative sizes (i.e., areas) no longer exactly correspond to their relative weights. If point sizes that exactly reflect the relative weights are desired, one can set plim[2] to NA, in which case the points are rescaled so that the smallest point size corresponds to plim[1] and all other points are scaled accordingly. As a result, the largest point may be very large. Alternatively, one can set plim[1] to NA, in which case the points are rescaled so that the largest point size corresponds to plim[2] and all other points are scaled accordingly. As a result, the smallest point may be very small. To avoid the latter, one can also set plim[3], which enforces a minimal point size.
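The rescaling rules above can be sketched in base R. The helper `rescale_psize()` is hypothetical (it is not a metafor function), and the min-max formula used when both limits are given is an assumption about how the rescaling might be done, not a copy of the package code.

```r
# Hypothetical helper (not part of metafor): rescale point sizes from weights
# according to the rules described above.
rescale_psize <- function(w, plim = c(0.5, 3)) {
  s <- sqrt(w)                             # sizes on the square-root scale, so area ~ weight
  rng <- max(s) - min(s)
  if (rng == 0) return(rep(1, length(s)))  # arbitrary choice when all weights are equal
  if (is.na(plim[1])) {
    out <- s / max(s) * plim[2]            # largest point pinned to plim[2]
    if (length(plim) >= 3) out <- pmax(out, plim[3])  # enforce a minimal point size
    return(out)
  }
  if (is.na(plim[2])) return(s / min(s) * plim[1])    # smallest point pinned to plim[1]
  # both limits given: assumed min-max rescaling onto [plim[1], plim[2]]
  plim[1] + (s - min(s)) / rng * (plim[2] - plim[1])
}

w <- c(1, 4, 16)
rescale_psize(w)                        # 0.5, about 1.33, 3
rescale_psize(w, plim = c(0.5, NA))     # 0.5, 1, 2
rescale_psize(w, plim = c(NA, 3))       # 0.75, 1.5, 3
rescale_psize(w, plim = c(NA, 3, 1))    # 1, 1.5, 3
```

With plim = c(0.5, NA), the squared sizes are 0.25, 1, and 4, which is the same 1:4:16 ratio as the weights. With the default limits, that ratio is lost.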

One can also set psize to a scalar (e.g., psize=1) so that all points are drawn in the same size. Alternatively, one can specify the point sizes manually by passing a vector of the appropriate length to psize. Finally, one can set psize to either "seinv" or "vinv" to make the point sizes proportional to the inverse standard errors or inverse sampling variances.
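A short sketch of the three psize variants, using the BCG vaccine data from the ‘Examples’ section. The manually chosen size vector is purely illustrative.

```r
library(metafor)

### log risk ratios for the BCG vaccine data and a model with one moderator
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, mods = ~ ablat, data=dat)

### all points drawn in the same size
sav1 <- regplot(res, mod="ablat", psize=1)

### point sizes proportional to the inverse standard errors
sav2 <- regplot(res, mod="ablat", psize="seinv")

### manually chosen point sizes (one value per study; arbitrary values for illustration)
sav3 <- regplot(res, mod="ablat", psize=seq(0.5, 2, length.out=nrow(dat)))
```

The psize column of the returned data frames can be inspected to see which sizes were used for each study.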

For a model with more than one predictor, the regression line reflects the ‘marginal’ relationship between the chosen moderator and the effect sizes or outcomes (i.e., all other moderators except the one being plotted are held constant at their means).
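To make the idea of a ‘marginal’ line concrete, the following sketch computes predicted values along ablat while fixing year at its sample mean, and then passes them to regplot() through the pred and xvals arguments. Based on the description above, this should correspond to the line drawn by default, but that equivalence is an assumption and has not been checked against the package internals. The grid of latitude values is an arbitrary choice.

```r
library(metafor)

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, mods = ~ ablat + year, data=dat)

### grid of latitude values (arbitrary range for illustration)
xs <- seq(10, 60, length.out=51)

### predicted values along ablat with year held constant at its mean
pr <- predict(res, newmods=cbind(ablat=xs, year=mean(dat$year)))

### draw the plot with the manually computed line
regplot(res, mod="ablat", pred=pr, xvals=xs, xlab="Absolute Latitude")
```

Replacing mean(dat$year) with another value shows how the line shifts when the other moderator is held at a different level.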

With the label argument, one can control whether points in the plot will be labeled. If label="all" (or label=TRUE), all points in the plot will be labeled. If label="ciout" or label="piout", points falling outside of the confidence/prediction interval will be labeled. Alternatively, one can set this argument to a logical or numeric vector to specify which points should be labeled. The labels are placed above the points when they fall above the regression line and otherwise below. With the offset argument, one can adjust the distance between the labels and the corresponding points. This can either be a single numeric value, which is used as a multiplicative factor for the point sizes (so that the distance between labels and points is larger for larger points) or a numeric vector with two values, where the first is used as an additive factor independent of the point sizes and the second again as a multiplicative factor for the point sizes. The values are given as percentages of the y-axis range. It may take some trial and error to find two values for the offset argument so that the labels are placed right next to the boundary of the points. With labsize, one can control the size of the labels.
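A brief sketch of the labeling options described above. The study labels passed via slab and the particular offset values are illustrative choices; as noted, finding good offset values may take some trial and error.

```r
library(metafor)

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

### attach study labels so that the point labels are informative
res <- rma(yi, vi, mods = ~ ablat, data=dat, slab=paste(author, year))

### label only the points falling outside of the confidence interval
regplot(res, mod="ablat", label="ciout", labsize=0.8)

### label three selected studies by position; offset has an additive part (2)
### and a multiplicative part (1) that scales with the point sizes
sav <- regplot(res, mod="ablat", label=c(4,7,13), offset=c(2,1), labsize=0.8)
```

The label column of the returned data frame indicates which points were actually labeled.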

One can also pass an object from predict.rma to the pred argument. This can be useful when the meta-regression model reflects a more complex relationship between the moderator variable and the effect sizes or outcomes (e.g., when using polynomials or splines) or when the model involves interactions. In this case, one also needs to specify the xvals argument. See ‘Examples’.

## Note

For certain types of models, it may not be possible to draw the prediction interval bounds (if this is the case, a warning will be issued).

When specifying vectors for pch, psize, col, bg, and/or label, the variables specified are assumed to be of the same length as the data passed to the model fitting function. Any subsetting and removal of studies with missing values is automatically applied to the variables specified via these arguments.

If the outcome measure used for creating the plot is bounded (e.g., correlations are bounded between -1 and +1, proportions are bounded between 0 and 1), one can use the olim argument to enforce those limits (the observed outcomes and confidence/prediction intervals cannot exceed those bounds then).
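The effect of enforcing such limits can be pictured as clamping values to an interval. The helper `clamp()` below is hypothetical (not a metafor function) and only illustrates the idea; how olim is applied internally is an assumption here.

```r
# Hypothetical helper (not part of metafor): restrict values to [olim[1], olim[2]]
clamp <- function(y, olim) pmin(pmax(y, olim[1]), olim[2])

# made-up values on a correlation scale, two of which fall outside [-1, 1]
r <- c(-1.08, -0.35, 0.42, 1.03)
clamp(r, olim = c(-1, 1))    # -1, -0.35, 0.42, 1
```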

## Value

A data frame with components:

slab

the study labels

ids

the study ids

xi

the x-axis coordinates of the points that were plotted.

yi

the y-axis coordinates of the points that were plotted.

pch

the plotting symbols of the points that were plotted.

psize

the point sizes of the points that were plotted.

col

the colors of the points that were plotted.

bg

the background colors of the points that were plotted.

label

logical vector indicating whether a point was labeled or not.

Note that the data frame is returned invisibly.
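Because the data frame is returned invisibly, it has to be assigned to an object to be inspected. A minimal sketch using the BCG vaccine data:

```r
library(metafor)

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, mods = ~ ablat, data=dat)

### the plot is drawn, but nothing is printed
sav <- regplot(res, mod="ablat")

### inspect the coordinates and point sizes that were used
head(sav)

### row(s) with the largest point size
sav[sav$psize == max(sav$psize), c("slab", "xi", "yi", "psize")]
```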

## Author

Wolfgang Viechtbauer wvb@metafor-project.org https://www.metafor-project.org

## References

Thompson, S. G., & Higgins, J. P. T. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21(11), 1559--1573. https://doi.org/10.1002/sim.1187

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1--48. https://doi.org/10.18637/jss.v036.i03

## See Also

rma.uni, rma.glmm, rma.mv

## Examples

### copy BCG vaccine data into 'dat'
dat <- dat.bcg

### calculate log risk ratios and corresponding sampling variances
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)

### fit mixed-effects model with absolute latitude as a moderator
res <- rma(yi, vi, mods = ~ ablat, data=dat)
res
#>
#> Mixed-Effects Model (k = 13; tau^2 estimator: REML)
#>
#> tau^2 (estimated amount of residual heterogeneity):     0.0764 (SE = 0.0591)
#> tau (square root of estimated tau^2 value):             0.2763
#> I^2 (residual heterogeneity / unaccounted variability): 68.39%
#> H^2 (unaccounted variability / sampling variability):   3.16
#> R^2 (amount of heterogeneity accounted for):            75.62%
#>
#> Test for Residual Heterogeneity:
#> QE(df = 11) = 30.7331, p-val = 0.0012
#>
#> Test of Moderators (coefficient 2):
#> QM(df = 1) = 16.3571, p-val < .0001
#>
#> Model Results:
#>
#>          estimate      se     zval    pval    ci.lb    ci.ub
#> intrcpt    0.2515  0.2491   1.0095  0.3127  -0.2368   0.7397
#> ablat     -0.0291  0.0072  -4.0444  <.0001  -0.0432  -0.0150  ***
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### draw plot
regplot(res, mod="ablat", xlab="Absolute Latitude")

### adjust x-axis limits and back-transform to risk ratios
regplot(res, mod="ablat", xlab="Absolute Latitude", xlim=c(0,60), transf=exp)

### also extend the prediction limits for the regression line
regplot(res, mod="ablat", xlab="Absolute Latitude", xlim=c(0,60), predlim=c(0,60), transf=exp)

### add the prediction interval to the plot, add a reference line at 1, and add a legend
regplot(res, mod="ablat", pi=TRUE, xlab="Absolute Latitude",
        xlim=c(0,60), predlim=c(0,60), transf=exp, refline=1, legend=TRUE)

### label points outside of the prediction interval
regplot(res, mod="ablat", pi=TRUE, xlab="Absolute Latitude",
        xlim=c(0,60), predlim=c(0,60), transf=exp, refline=1, legend=TRUE,
        label="piout", labsize=0.8)

### fit mixed-effects model with absolute latitude and publication year as moderators
res <- rma(yi, vi, mods = ~ ablat + year, data=dat)
res
#>
#> Mixed-Effects Model (k = 13; tau^2 estimator: REML)
#>
#> tau^2 (estimated amount of residual heterogeneity):     0.1108 (SE = 0.0845)
#> tau (square root of estimated tau^2 value):             0.3328
#> I^2 (residual heterogeneity / unaccounted variability): 71.98%
#> H^2 (unaccounted variability / sampling variability):   3.57
#> R^2 (amount of heterogeneity accounted for):            64.63%
#>
#> Test for Residual Heterogeneity:
#> QE(df = 10) = 28.3251, p-val = 0.0016
#>
#> Test of Moderators (coefficients 2:3):
#> QM(df = 2) = 12.2043, p-val = 0.0022
#>
#> Model Results:
#>
#>          estimate       se     zval    pval     ci.lb    ci.ub
#> intrcpt   -3.5455  29.0959  -0.1219  0.9030  -60.5724  53.4814
#> ablat     -0.0280   0.0102  -2.7371  0.0062   -0.0481  -0.0080  **
#> year       0.0019   0.0147   0.1299  0.8966   -0.0269   0.0307
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### plot the marginal relationships
regplot(res, mod="ablat", xlab="Absolute Latitude")
regplot(res, mod="year",  xlab="Publication Year")

### fit a quadratic polynomial meta-regression model
res <- rma(yi, vi, mods = ~ ablat + I(ablat^2), data=dat)
res
#>
#> Mixed-Effects Model (k = 13; tau^2 estimator: REML)
#>
#> tau^2 (estimated amount of residual heterogeneity):     0.0806 (SE = 0.0658)
#> tau (square root of estimated tau^2 value):             0.2840
#> I^2 (residual heterogeneity / unaccounted variability): 66.62%
#> H^2 (unaccounted variability / sampling variability):   3.00
#> R^2 (amount of heterogeneity accounted for):            74.26%
#>
#> Test for Residual Heterogeneity:
#> QE(df = 10) = 28.4961, p-val = 0.0015
#>
#> Test of Moderators (coefficients 2:3):
#> QM(df = 2) = 16.9151, p-val = 0.0002
#>
#> Model Results:
#>
#>             estimate      se     zval    pval    ci.lb   ci.ub
#> intrcpt      -0.3889  0.6285  -0.6188  0.5360  -1.6207  0.8429
#> ablat         0.0218  0.0464   0.4699  0.6385  -0.0692  0.1128
#> I(ablat^2)   -0.0008  0.0007  -1.1100  0.2670  -0.0022  0.0006
#>
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
### compute predicted values using predict()
xs <- seq(0,60,length=601)
tmp <- predict(res, newmods=cbind(xs, xs^2))

### can now pass these results to the 'pred' argument (and have to specify xvals accordingly)
regplot(res, mod="ablat", pred=tmp, xlab="Absolute Latitude", xlim=c(0,60), xvals=xs)

### back-transform to risk ratios and add reference line
regplot(res, mod="ablat", pred=tmp, xlab="Absolute Latitude", xlim=c(0,60), xvals=xs,
        transf=exp, refline=1)