Results from 126 articles that examined the so-called ‘generation effect’.

Usage

dat.mccurdy2020

Format

The data frame contains the following columns:

article           numeric    article identifier
experiment        character  experiment (within article) identifier
sample            numeric    sample (within experiment) identifier
id                numeric    row identifier
pairing           numeric    identifier to indicate paired conditions within experiments
yi                numeric    mean recall rate for the condition
vi                numeric    corresponding sampling variance
ni                numeric    number of participants for the condition
stimuli           numeric    number of stimuli for the condition
condition         factor     condition (‘read’ or ‘generate’)
gen_difficulty    factor     generation difficulty (‘low’ or ‘high’)
manip_type        factor     manipulation type of the generate versus read condition (using a ‘within’ or ‘between’ subjects design)
present_style     factor     presentation style (‘mixed’ or ‘pure’ list presentation)
word_status       factor     word status (‘words’, ‘non-words’, or ‘numbers’)
memory_test       factor     memory test (‘recognition’, ‘cued recall’, or ‘free recall’)
memory_type       factor     memory type (‘item’, ‘source’, ‘font color’, ‘font type’, ‘order’, ‘cue word’, ‘background color’, or ‘location’)
gen_constraint    factor     generation constraint (‘low’, ‘medium’, or ‘high’)
learning_type     factor     learning type (‘incidental’ or ‘intentional’)
stimuli_relation  factor     stimuli relation (‘semantic’, ‘category’, ‘antonym’, ‘synonym’, ‘rhyme’, ‘compound words’, ‘definitions’, or ‘unrelated’)
gen_mode          factor     generation mode (‘verbal/speaking’, ‘covert/thinking’, or ‘writing/typing’)
gen_task          factor     generation task (‘anagram’, ‘letter transposition’, ‘word fragment’, ‘sentence completion’, ‘word stem’, ‘calculation’, or ‘cue only’)
attention         factor     attention (‘divided’ or ‘full’)
pacing            factor     pacing (‘self-paced’ or ‘timed’)
filler_task       factor     filler task (‘yes’ or ‘no’)
age_grp           factor     age group (‘younger’ or ‘older’ adults)
retention_delay   factor     retention delay (‘immediate’, ‘short’, or ‘long’)

Details

The generation effect is the memory benefit for self-generated compared with read or experimenter-provided information (Jacoby, 1978; Slamecka & Graf, 1978). In a typical study, participants are presented with a list of stimuli (usually words or word pairs). For half of the stimuli, participants self-generate a target word (e.g., open–cl____), while for the other half, participants simply read an intact target word (e.g., above–below). On a later memory test for the target words, the common finding is that self-generated words are better remembered than read words (i.e., the generation effect).

Although several theories have been proposed to explain the generation effect, there is still some debate on the underlying memory mechanism(s) contributing to this phenomenon. The meta-analysis by McCurdy et al. (2020) translated various theories on the generation effect into hypotheses that could then be tested in moderator analyses based on a dataset containing 126 articles, 310 experiments, and 1653 mean recall estimates collected under various conditions.

Detailed explanations of the coded variables (and how they can be used to test hypotheses regarding the generation effect) can be found in McCurdy et al. (2020). The most important variable is condition, which denotes whether a row of the dataset corresponds to the results of a ‘read’ or a ‘generate’ condition.

The data structure is quite complex. Articles may have reported the findings from multiple experiments involving one or multiple samples that were examined under various conditions. The pairing variable indicates which rows of the dataset represent a pairing of a read condition with one or multiple corresponding generate conditions within an experiment. A pairing may involve the same sample of subjects (when using a within-subjects design for comparing the conditions) or different samples (when using a between-subjects design).
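
As a minimal sketch of how the pairing variable links rows, the following base R snippet rebuilds the first six rows of the dataset (values copied from the head(dat) output in the Examples section, keeping only a few columns) and computes the generate-minus-read difference in mean recall within each pairing. The object names toy, wide, and diffs are illustrative and not part of the package; averaging within a pairing is an assumption made here for simplicity and would collapse multiple generate conditions into one value.

```r
### first six rows of the dataset, reduced to the columns needed here
toy <- data.frame(
   id        = 1:6,
   sample    = c(1, 2, 1, 2, 1, 2),
   pairing   = c(1, 1, 2, 2, 3, 3),
   yi        = c(0.879, 0.713, 0.906, 0.684, 0.842, 0.753),
   condition = c("generate", "read", "generate", "read", "generate", "read"))

### mean recall per pairing (rows) and condition (columns)
wide <- tapply(toy$yi, list(pairing = toy$pairing, condition = toy$condition), mean)

### generate-minus-read difference within each pairing
diffs <- wide[, "generate"] - wide[, "read"]
round(diffs, 3)
```

In these rows the two conditions of each pairing come from different samples (sample 1 versus sample 2), which corresponds to a between-subjects comparison.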

Source

McCurdy, M. P., Viechtbauer, W., Sklenar, A. M., Frankenstein, A. N., & Leshikar, E. D. (2020). Theories of the generation effect and the impact of generation constraint: A meta-analytic review. Psychonomic Bulletin & Review, 27(6), 1139–1165. https://doi.org/10.3758/s13423-020-01762-3

References

Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4(6), 592–604. https://doi.org/10.1037/0278-7393.4.6.592

Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17(6), 649–668. https://doi.org/10.1016/S0022-5371(78)90393-6

Concepts

psychology, memory, proportions, raw means, multilevel models, cluster-robust inference

Examples

### copy data into 'dat' and examine data
dat <- dat.mccurdy2020
head(dat)
#> 
#>   article experiment sample id pairing     yi      vi ni stimuli condition gen_difficulty manip_type 
#> 1      12          1      1  1       1 0.8790 0.00068 12      20  generate           <NA>    between 
#> 2      12          1      2  2       1 0.7130 0.00140 12      20      read           <NA>    between 
#> 3      12          1      1  3       2 0.9060 0.00058 12      20  generate           <NA>    between 
#> 4      12          1      2  4       2 0.6840 0.00151 12      20      read           <NA>    between 
#> 5      12          1      1  5       3 0.8420 0.00084 12      20  generate           <NA>    between 
#> 6      12          1      2  6       3 0.7530 0.00124 12      20      read           <NA>    between 
#>   present_style word_status memory_test memory_type gen_constraint learning_type stimuli_relation 
#> 1          pure       words recognition        item         medium   intentional         semantic 
#> 2          pure       words recognition        item           <NA>   intentional         semantic 
#> 3          pure       words recognition        item           high   intentional         category 
#> 4          pure       words recognition        item           <NA>   intentional         category 
#> 5          pure       words recognition        item           high   intentional          antonym 
#> 6          pure       words recognition        item           <NA>   intentional          antonym 
#>          gen_mode  gen_task attention pacing filler_task age_grp retention_delay 
#> 1 verbal/speaking word stem      full   <NA>          no younger       immediate 
#> 2 verbal/speaking word stem      full   <NA>          no younger       immediate 
#> 3 verbal/speaking word stem      full   <NA>          no younger       immediate 
#> 4 verbal/speaking word stem      full   <NA>          no younger       immediate 
#> 5 verbal/speaking word stem      full   <NA>          no younger       immediate 
#> 6 verbal/speaking word stem      full   <NA>          no younger       immediate 
#> 

# \dontrun{

### load metafor package
library(metafor)

### fit multilevel mixed-effects meta-regression model
res <- rma.mv(yi, vi, mods = ~ condition,
              random = list(~ 1 | article/experiment/sample/id, ~ 1 | pairing),
              data=dat, sparse=TRUE, digits=3)
res
#> 
#> Multivariate Meta-Analysis Model (k = 1653; method: REML)
#> 
#> Variance Components:
#> 
#>            estim   sqrt  nlvls  fixed                        factor 
#> sigma^2.1  0.022  0.148    126     no                       article 
#> sigma^2.2  0.006  0.078    310     no            article/experiment 
#> sigma^2.3  0.000  0.000    582     no     article/experiment/sample 
#> sigma^2.4  0.006  0.080   1653     no  article/experiment/sample/id 
#> sigma^2.5  0.017  0.128    804     no                       pairing 
#> 
#> Test for Residual Heterogeneity:
#> QE(df = 1651) = 211160.207, p-val < .001
#> 
#> Test of Moderators (coefficient 2):
#> QM(df = 1) = 578.027, p-val < .001
#> 
#> Model Results:
#> 
#>                    estimate     se    zval   pval  ci.lb  ci.ub     ​ 
#> intrcpt               0.478  0.016  30.446  <.001  0.448  0.509  *** 
#> conditiongenerate     0.102  0.004  24.042  <.001  0.094  0.110  *** 
#> 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 

### proportion of total amount of heterogeneity due to each component
data.frame(source=res$s.names, sigma2=round(res$sigma2, 3),
   prop=round(res$sigma2 / sum(res$sigma2), 2))
#>                         source sigma2 prop
#> 1                      article  0.022 0.43
#> 2           article/experiment  0.006 0.12
#> 3    article/experiment/sample  0.000 0.00
#> 4 article/experiment/sample/id  0.006 0.13
#> 5                      pairing  0.017 0.33

### apply cluster-robust inference
sav <- robust(res, cluster=article)
sav
#> 
#> Multivariate Meta-Analysis Model (k = 1653; method: REML)
#> 
#> Variance Components:
#> 
#>            estim   sqrt  nlvls  fixed                        factor 
#> sigma^2.1  0.022  0.148    126     no                       article 
#> sigma^2.2  0.006  0.078    310     no            article/experiment 
#> sigma^2.3  0.000  0.000    582     no     article/experiment/sample 
#> sigma^2.4  0.006  0.080   1653     no  article/experiment/sample/id 
#> sigma^2.5  0.017  0.128    804     no                       pairing 
#> 
#> Test for Residual Heterogeneity:
#> QE(df = 1651) = 211160.207, p-val < .001
#> 
#> Number of estimates:   1653
#> Number of clusters:    126
#> Estimates per cluster: 2-48 (mean: 13.12, median: 9)
#> 
#> Test of Moderators (coefficient 2):¹
#> F(df1 = 1, df2 = 124) = 191.874, p-val < .001
#> 
#> Model Results:
#> 
#>                    estimate    se¹   tval¹  df¹  pval¹  ci.lb¹  ci.ub¹     ​ 
#> intrcpt               0.478  0.016  29.275  124  <.001   0.446   0.511  *** 
#> conditiongenerate     0.102  0.007  13.852  124  <.001   0.087   0.117  *** 
#> 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> 1) results based on cluster-robust inference (var-cov estimator: CR1,
#>    approx. t/F-tests and confidence intervals, dfs = residual method)
#> 

### estimated average recall rate in read and generate conditions
predict(sav, newmods = c(0,1), digits=3)
#> 
#>    pred    se ci.lb ci.ub pi.lb pi.ub 
#> 1 0.478 0.016 0.446 0.511 0.031 0.926 
#> 2 0.581 0.016 0.549 0.612 0.133 1.028 
#> 

### use methods from clubSandwich package
sav <- robust(res, cluster=article, clubSandwich=TRUE)
sav
#> 
#> Multivariate Meta-Analysis Model (k = 1653; method: REML)
#> 
#> Variance Components:
#> 
#>            estim   sqrt  nlvls  fixed                        factor 
#> sigma^2.1  0.022  0.148    126     no                       article 
#> sigma^2.2  0.006  0.078    310     no            article/experiment 
#> sigma^2.3  0.000  0.000    582     no     article/experiment/sample 
#> sigma^2.4  0.006  0.080   1653     no  article/experiment/sample/id 
#> sigma^2.5  0.017  0.128    804     no                       pairing 
#> 
#> Test for Residual Heterogeneity:
#> QE(df = 1651) = 211160.207, p-val < .001
#> 
#> Number of estimates:   1653
#> Number of clusters:    126
#> Estimates per cluster: 2-48 (mean: 13.12, median: 9)
#> 
#> Test of Moderators (coefficient 2):¹
#> F(df1 = 1, df2 = 74.7) = 192.517, p-val < .001
#> 
#> Model Results:
#> 
#>                    estimate    se¹   tval¹     df¹  pval¹  ci.lb¹  ci.ub¹     ​ 
#> intrcpt               0.478  0.016  29.382  120.17  <.001   0.446   0.511  *** 
#> conditiongenerate     0.102  0.007  13.875   74.70  <.001   0.087   0.117  *** 
#> 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> 1) results based on cluster-robust inference (var-cov estimator: CR2,
#>    approx. t/F-tests and confidence intervals, dfs = Satterthwaite method)
#> 

# }
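
The predicted values above follow from the dummy coding of condition: with ‘read’ as the reference level, the intercept is the estimated average recall rate in the read condition and the conditiongenerate coefficient is the estimated difference between the generate and read conditions. The following is a rough sketch of that arithmetic in base R, using the rounded estimates printed in the cluster-robust (CR1) output, so the results can differ from the printed values in the third decimal place. The object names are illustrative.

```r
### rounded estimates copied from the cluster-robust output (CR1, df = 124)
b  <- c(intrcpt = 0.478, conditiongenerate = 0.102)
se <- 0.007   # rounded standard error of the conditiongenerate coefficient
df <- 124

### design matrix rows for the read (0) and generate (1) conditions
X <- cbind(1, c(read = 0, generate = 1))
pred <- drop(X %*% b)
pred

### approximate 95% CI for the generate-minus-read difference (t-distribution)
ci <- b[["conditiongenerate"]] + c(-1, 1) * qt(0.975, df) * se
round(ci, 3)
```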