Aggregate Data to the Subject Level

Function that aggregates a dataset to the subject level.

aggreg(data, id, vars, grep=FALSE, na.rm=TRUE)

Arguments

data: data frame to aggregate.
id: the variable in data that identifies the subjects (in the examples below, it is given unquoted as subj).
vars: optional character vector giving the names of the variables to aggregate, or a numeric vector giving the positions of the corresponding columns in the data frame. If not specified, all variables are aggregated.
grep: logical indicating whether the entries in vars should be matched against the variable names using grep (default is FALSE).
na.rm: logical indicating whether missing values should be removed before aggregating the variables (default is TRUE).
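To see what na.rm changes, consider the stress values of subject 1 in the example data. The following is a minimal sketch using base R's mean(), assuming (as the Details section suggests) that numeric variables are averaged with mean(); the object name stress_subj1 is only for illustration.

```r
# Stress ratings of subject 1 in the example dataset
stress_subj1 <- c(2, 3, NA, 4, 2)

# na.rm = TRUE: the NA is dropped and the mean uses the 4 observed values
mean(stress_subj1, na.rm = TRUE)   # 2.75

# na.rm = FALSE: a single NA makes the subject-level mean NA
mean(stress_subj1, na.rm = FALSE)  # NA
```

The first value matches the stress mean for subject 1 in the aggreg(dat, subj) example.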

Details

The function aggregates a dataset in long format (i.e., with multiple rows per subject) to the subject level. For numeric, integer, and logical variables, the subject-level means are computed. For factor and character variables, the first non-missing value within each subject is returned.
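The aggregation rules above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names summarize_by_subject() and aggreg_sketch() are hypothetical, id is passed as a quoted column name, the vars and grep arguments are not supported, and factor values come back as character.

```r
# Hypothetical helper (not part of the package): summarize one variable
# within each subject following the rules described in Details
summarize_by_subject <- function(x, id, na.rm = TRUE) {
  groups <- split(x, id)  # one element per subject, in sorted id order
  if (is.numeric(x) || is.logical(x)) {
    # numeric, integer, and logical variables: subject-level means
    # (logical values are averaged as 0/1, giving the proportion of TRUE)
    unname(vapply(groups, mean, numeric(1), na.rm = na.rm))
  } else {
    # factors and character variables: first non-missing value per subject
    # (returned as character in this sketch)
    unname(vapply(groups, function(v) {
      v <- as.character(v[!is.na(v)])
      if (length(v) == 0) NA_character_ else v[1]
    }, character(1)))
  }
}

# Hypothetical sketch of subject-level aggregation (not the package's code);
# here id is a column name string, e.g. "subj"
aggreg_sketch <- function(data, id, na.rm = TRUE) {
  ids <- data[[id]]
  out <- lapply(data, summarize_by_subject, id = ids, na.rm = na.rm)
  as.data.frame(out, stringsAsFactors = FALSE)
}
```

Applied to the illustrative dataset in the Examples as aggreg_sketch(dat, "subj"), this sketch should give the same subject-level values as the aggreg(dat, subj) example (for instance, stress means of 2.75, 3.00, 2.80, and 1.60).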

Value

A data frame with one row per subject.

Author

Wolfgang Viechtbauer wvb@wvbauer.com

Examples

# illustrative dataset
dat <- data.frame(subj=rep(1:4, each=5),
                  sex = rep(c("male", "female"), each=2*5),
                  obs = 1:5,
                  age = rep(c(20,31,27,22), each=5),
                  stress = c(2,3,NA,4,2, 3,3,NA,3,NA, 1,1,2,6,4, 1,2,1,3,1))
dat
#>    subj    sex obs age stress
#> 1     1   male   1  20      2
#> 2     1   male   2  20      3
#> 3     1   male   3  20     NA
#> 4     1   male   4  20      4
#> 5     1   male   5  20      2
#> 6     2   male   1  31      3
#> 7     2   male   2  31      3
#> 8     2   male   3  31     NA
#> 9     2   male   4  31      3
#> 10    2   male   5  31     NA
#> 11    3 female   1  27      1
#> 12    3 female   2  27      1
#> 13    3 female   3  27      2
#> 14    3 female   4  27      6
#> 15    3 female   5  27      4
#> 16    4 female   1  22      1
#> 17    4 female   2  22      2
#> 18    4 female   3  22      1
#> 19    4 female   4  22      3
#> 20    4 female   5  22      1

# aggregate the dataset
aggreg(dat, subj)
#>   subj    sex obs age stress
#> 1    1   male   3  20   2.75
#> 2    2   male   3  31   3.00
#> 3    3 female   3  27   2.80
#> 4    4 female   3  22   1.60

# aggregate the dataset for selected variables (specified by name)
aggreg(dat, subj, vars=c("subj","stress"))
#>   subj stress
#> 1    1   2.75
#> 2    2   3.00
#> 3    3   2.80
#> 4    4   1.60

# aggregate the dataset for selected variables (specified by column position)
aggreg(dat, subj, vars=1:2)
#>   subj    sex
#> 1    1   male
#> 2    2   male
#> 3    3 female
#> 4    4 female
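The subject-level means in these examples can be cross-checked with base R alone. The sketch below rebuilds dat, computes the stress means with tapply(), and shows which columns the positions 1:2 refer to; the object name stress_means is only for illustration.

```r
# Rebuild the illustrative dataset from the examples above
dat <- data.frame(subj   = rep(1:4, each = 5),
                  sex    = rep(c("male", "female"), each = 2*5),
                  obs    = 1:5,
                  age    = rep(c(20, 31, 27, 22), each = 5),
                  stress = c(2,3,NA,4,2, 3,3,NA,3,NA, 1,1,2,6,4, 1,2,1,3,1))

# Subject-level mean stress with missing values removed; these values
# should match the stress column returned by aggreg(dat, subj)
stress_means <- tapply(dat$stress, dat$subj, mean, na.rm = TRUE)
stress_means

# Column positions 1:2 refer to these variables, which is why
# vars=1:2 returns subj and sex
names(dat)[1:2]  # returns c("subj", "sex")
```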