dat.vanhowe1999.Rd
Results from 33 studies examining the association between male circumcision and HIV infection.
dat.vanhowe1999
The data frame contains the following columns:
study | character | study author |
category | character | study type (high-risk group, partner study, or population survey) |
non.pos | numeric | number of non-circumcised HIV positive cases |
non.neg | numeric | number of non-circumcised HIV negative cases |
cir.pos | numeric | number of circumcised HIV positive cases |
cir.neg | numeric | number of circumcised HIV negative cases |
The 33 studies provide data in terms of \(2 \times 2\) tables in the form:
HIV positive | HIV negative | |
non-circumcised | non.pos | non.neg |
circumcised | cir.pos | cir.neg |
The goal of the meta-analysis was to examine if the risk of an HIV infection differs between non-circumcised versus circumcised men.
The dataset is interesting because it can be used to illustrate the difference between naively pooling results by summing up the counts across studies and then computing the odds ratio based on the aggregated table (as was done by Van Howe, 1999) and conducting a proper meta-analysis (as illustrated by O'Farrell & Egger, 2000). In fact, a proper meta-analysis shows that the HIV infection risk is on average higher in non-circumcised men, which is the opposite of what the naive pooling approach yields (which makes this an illustration of Simpson's paradox).
Van Howe, R. S. (1999). Circumcision and HIV infection: Review of the literature and meta-analysis. International Journal of STD & AIDS, 10(1), 8–16. https://doi.org/10.1258/0956462991913015
O'Farrell, N., & Egger, M. (2000). Circumcision in men and the prevention of HIV infection: A 'meta-analysis' revisited. International Journal of STD & AIDS, 11(3), 137–142. https://doi.org/10.1258/0956462001915480
medicine, epidemiology, odds ratios
### copy data into 'dat' and examine data
dat <- dat.vanhowe1999
dat
#> study category non.pos non.neg cir.pos cir.neg
#> 1 Bwayo 1 high-risk group 92 86 160 612
#> 2 Bwayo 2 high-risk group 22 46 37 200
#> 3 Kreiss high-risk group 59 18 254 168
#> 4 Hira high-risk group 418 172 10 10
#> 5 Cameron high-risk group 18 61 6 208
#> 6 Pepin high-risk group 5 42 13 243
#> 7 Greenblatt high-risk group 11 28 8 68
#> 8 Diallo high-risk group 38 46 212 873
#> 9 Simonsen high-risk group 17 70 21 232
#> 10 Tyndall high-risk group 85 93 105 527
#> 11 Nasio high-risk group 86 78 137 580
#> 12 Mehendal high-risk group 837 3411 38 253
#> 13 Bollinger high-risk group 50 241 1 14
#> 14 Chiasson high-risk group 36 797 14 542
#> 15 Sassan high-risk group 75 18 415 221
#> 16 Hunter partner study 43 330 165 3765
#> 17 Carael partner study 90 105 34 45
#> 18 Chao partner study 442 4844 75 232
#> 19 Moss partner study 24 16 12 17
#> 20 Allen partner study 275 612 132 324
#> 21 Sedlin partner study 32 26 33 40
#> 22 Konde-Luc partner study 153 1516 6 127
#> 23 Barongo population survey 55 1356 42 642
#> 24 Grosskurth population survey 158 4604 61 1026
#> 25 Van de Perre population survey 46 224 6 26
#> 26 Seed population survey 171 422 51 192
#> 27 Malamba population survey 111 114 21 47
#> 28 Quigley population survey 101 272 48 121
#> 29 Urassa 1 population survey 56 1301 42 600
#> 30 Urassa 2 population survey 105 2040 32 426
#> 31 Urassa 3 population survey 38 309 19 158
#> 32 Urassa 4 population survey 112 716 54 692
#> 33 Urassa 5 population survey 101 365 48 136
### load metafor package
library(metafor)
### naive pooling by summing up the counts within categories and then
### computing the odds ratios and corresponding confidence intervals
cat1 <- with(dat[dat$category=="high-risk group",],
escalc(measure="OR", ai=sum(non.pos), bi=sum(non.neg), ci=sum(cir.pos), di=sum(cir.neg)))
cat2 <- with(dat[dat$category=="partner study",],
escalc(measure="OR", ai=sum(non.pos), bi=sum(non.neg), ci=sum(cir.pos), di=sum(cir.neg)))
cat3 <- with(dat[dat$category=="population survey",],
escalc(measure="OR", ai=sum(non.pos), bi=sum(non.neg), ci=sum(cir.pos), di=sum(cir.neg)))
summary(cat1, transf=exp, digits=2)
#>
#> yi ci.lb ci.ub
#> 1 1.18 1.09 1.28
#>
summary(cat2, transf=exp, digits=2)
#>
#> yi ci.lb ci.ub
#> 1 1.42 1.26 1.59
#>
summary(cat3, transf=exp, digits=2)
#>
#> yi ci.lb ci.ub
#> 1 0.86 0.77 0.97
#>
### naive pooling across all studies
all <- escalc(measure="OR", ai=sum(dat$non.pos), bi=sum(dat$non.neg),
ci=sum(dat$cir.pos), di=sum(dat$cir.neg))
summary(all, transf=exp, digits=2)
#>
#> yi ci.lb ci.ub
#> 1 0.94 0.89 0.99
#>
### calculate log odds ratios and corresponding sampling variances
dat <- escalc(measure="OR", ai=non.pos, bi=non.neg, ci=cir.pos, di=cir.neg, data=dat)
dat
#>
#> study category non.pos non.neg cir.pos cir.neg yi vi
#> 1 Bwayo 1 high-risk group 92 86 160 612 1.4090 0.0304
#> 2 Bwayo 2 high-risk group 22 46 37 200 0.9498 0.0992
#> 3 Kreiss high-risk group 59 18 254 168 0.7738 0.0824
#> 4 Hira high-risk group 418 172 10 10 0.8880 0.2082
#> 5 Cameron high-risk group 18 61 6 208 2.3253 0.2434
#> 6 Pepin high-risk group 5 42 13 243 0.7999 0.3048
#> 7 Greenblatt high-risk group 11 28 8 68 1.2058 0.2663
#> 8 Diallo high-risk group 38 46 212 873 1.2243 0.0539
#> 9 Simonsen high-risk group 17 70 21 232 0.9869 0.1250
#> 10 Tyndall high-risk group 85 93 105 527 1.5233 0.0339
#> 11 Nasio high-risk group 86 78 137 580 1.5407 0.0335
#> 12 Mehendal high-risk group 837 3411 38 253 0.4909 0.0318
#> 13 Bollinger high-risk group 50 241 1 14 1.0663 1.0956
#> 14 Chiasson high-risk group 36 797 14 542 0.5589 0.1023
#> 15 Sassan high-risk group 75 18 415 221 0.7970 0.0758
#> 16 Hunter partner study 43 330 165 3765 1.0897 0.0326
#> 17 Carael partner study 90 105 34 45 0.1262 0.0723
#> 18 Chao partner study 442 4844 75 232 -1.2649 0.0201
#> 19 Moss partner study 24 16 12 17 0.7538 0.2463
#> 20 Allen partner study 275 612 132 324 0.0980 0.0159
#> 21 Sedlin partner study 32 26 33 40 0.4000 0.1250
#> 22 Konde-Luc partner study 153 1516 6 127 0.7590 0.1817
#> 23 Barongo population survey 55 1356 42 642 -0.4780 0.0443
#> 24 Grosskurth population survey 158 4604 61 1026 -0.5495 0.0239
#> 25 Van de Perre population survey 46 224 6 26 -0.1167 0.2313
#> 26 Seed population survey 171 422 51 192 0.4223 0.0330
#> 27 Malamba population survey 111 114 21 47 0.7790 0.0867
#> 28 Quigley population survey 101 272 48 121 -0.0661 0.0427
#> 29 Urassa 1 population survey 56 1301 42 600 -0.4863 0.0441
#> 30 Urassa 2 population survey 105 2040 32 426 -0.3780 0.0436
#> 31 Urassa 3 population survey 38 309 19 158 0.0224 0.0885
#> 32 Urassa 4 population survey 112 716 54 692 0.6954 0.0303
#> 33 Urassa 5 population survey 101 365 48 136 -0.2433 0.0408
#>
### random-effects model
res <- rma(yi, vi, data=dat, method="DL")
res
#>
#> Random-Effects Model (k = 33; tau^2 estimator: DL)
#>
#> tau^2 (estimated amount of total heterogeneity): 0.6397 (SE = 0.2103)
#> tau (square root of estimated tau^2 value): 0.7998
#> I^2 (total heterogeneity / total variability): 92.36%
#> H^2 (total variability / sampling variability): 13.09
#>
#> Test for Heterogeneity:
#> Q(df = 32) = 419.0087, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> 0.5132 0.1499 3.4248 0.0006 0.2195 0.8069 ***
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
predict(res, transf=exp, digits=2)
#>
#> pred ci.lb ci.ub pi.lb pi.ub
#> 1.67 1.25 2.24 0.34 8.23
#>
### random-effects model within subgroups
res <- rma(yi, vi, data=dat, method="DL", subset=category=="high-risk group")
predict(res, transf=exp, digits=2)
#>
#> pred ci.lb ci.ub pi.lb pi.ub
#> 3.00 2.35 3.83 1.42 6.31
#>
res <- rma(yi, vi, data=dat, method="DL", subset=category=="partner study")
predict(res, transf=exp, digits=2)
#>
#> pred ci.lb ci.ub pi.lb pi.ub
#> 1.29 0.62 2.69 0.18 9.45
#>
res <- rma(yi, vi, data=dat, method="DL", subset=category=="population survey")
predict(res, transf=exp, digits=2)
#>
#> pred ci.lb ci.ub pi.lb pi.ub
#> 0.96 0.71 1.29 0.38 2.44
#>