Treated and control sample descriptives

This function shows the means and standard deviations of several variables conditional on whether they are from the treated or the control group. The groups can be further split according to a pre/post variable. Results can be seamlessly exported to Latex.

Usage

did_means(
  fml,
  base,
  treat_var,
  post_var,
  tex = FALSE,
  treat_dict,
  dict = getFixest_dict(),
  file,
  replace = FALSE,
  title,
  label,
  raw = FALSE,
  indiv,
  treat_first,
  prepostnames = c("Before", "After"),
  diff.inv = FALSE
)

Arguments

fml: Either a formula of the type var1 + ... + varN ~ treat or var1 + ... + varN ~ treat | post, or a data.frame/matrix containing all the variables for which the means are to be computed (they must be numeric). Both the treatment and the post variables must contain exactly two values. You can use a dot to select all the variables of the data set: . ~ treat.
base: A data set containing all the variables in the formula fml.
treat_var: Only if argument fml is not a formula. The vector identifying the treated and the control observations (the vector can be of any type but must contain only two possible values). Must be of the same length as the data.
post_var: Only if argument fml is not a formula. The vector identifying the periods (pre/post) of the observations (the vector can be of any type but must contain only two possible values). The first value (in the sorted sense) of the vector is taken as the pre period. Must be of the same length as the data.
tex: Should the result be displayed in Latex? Default is FALSE. Automatically set to TRUE if the table is to be saved in a file using the argument file.
treat_dict: A character vector of length two. What are the names of the treated and the control? This should be a dictionary: e.g. c("1"="Treated", "0" = "Control").
dict: A named character vector. A dictionary mapping variable names to aliases. For instance dict=c("x"="Inflation Rate") would replace the variable name x by “Inflation Rate”.
file: A file path. If given, the table is written in Latex into this file.
replace: Logical, default is FALSE, which means that when the table is exported, the existing file is not erased.
title: Character string giving the Latex title of the table. (Only if exported.)
label: Character string giving the Latex label of the table. (Only if exported.)
raw: Logical, default is FALSE. If TRUE, it returns the information without formatting.
indiv: Either the variable name of individual identifiers, a one sided formula, or a vector. If the data is that of a panel, this can be used to track the number of individuals per group.
treat_first: Which value of the 'treatment' vector should appear on the left? By default the max value appears first (e.g. if the treatment variable is a 0/1 vector, 1 appears first).
prepostnames: Only if there is a 'post' variable. The names of the pre and post periods to be displayed in Latex. Default is c("Before", "After").
diff.inv: Logical, default is FALSE. Whether to invert the sign of the difference.

Value

It returns a data.frame or a Latex table with the conditional means and statistical differences between the groups.
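
To make the content of one table row concrete, here is a minimal base R sketch of the quantities involved: the group means and standard deviations, their difference, and a t-statistic. It is an illustration only and not the package's internal code. The data set toy and the helper group_row are hypothetical, and the unequal-variance (Welch) standard error is an assumption, since this page does not state which formula did_means uses.

```r
# Toy data (hypothetical): one outcome and a 0/1 treatment indicator
set.seed(1)
toy = data.frame(treat = rep(c(1, 0), each = 50))
toy$y = 2 * toy$treat + rnorm(100)

# Hypothetical helper returning the pieces of one table row
group_row = function(x, treat) {
  x1 = x[treat == 1]
  x0 = x[treat == 0]
  diff = mean(x1) - mean(x0)
  # Assumption: unequal-variance (Welch) standard error
  se = sqrt(var(x1) / length(x1) + var(x0) / length(x0))
  c(mean_1 = mean(x1), sd_1 = sd(x1),
    mean_0 = mean(x0), sd_0 = sd(x0),
    diff = diff, t_stat = diff / se)
}

row_y = group_row(toy$y, toy$treat)
print(round(row_y, 3))
```

With diff.inv = TRUE, the sign of the difference and of the t-statistic would simply be flipped, as the examples on this page show.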

Details

By default, when the user tries to apply this function to non-numeric variables, an error is raised. The exception is when all the variables are selected with the dot (as in . ~ treat). In this case, non-numeric variables are automatically omitted (with a message).
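
The dot-selection behavior can be pictured with a short base R sketch. It only mimics the idea (keep numeric columns, report the others) and is not the function's actual implementation. The data set toy and its column region are hypothetical.

```r
# Toy data (hypothetical) with one non-numeric column
toy = data.frame(
  treat  = c(1, 0, 1, 0),
  y      = c(1, 2, 3, 4),
  region = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# All columns except the treatment are candidates, as with `. ~ treat`
candidates = setdiff(names(toy), "treat")
is_num = vapply(toy[candidates], is.numeric, logical(1))

# Non-numeric candidates are dropped, with a message
if (any(!is_num)) {
  message("Omitting non-numeric variables: ",
          paste(candidates[!is_num], collapse = ", "))
}
kept = candidates[is_num]
print(kept)
```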

NAs are removed automatically: if the data contains NAs, an information message is displayed. First, all observations with NAs in the treatment or post variables are removed. Then, if there are still NAs in the remaining variables, they are excluded separately for each variable, and a new message detailing the NA breakdown is displayed.
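
The two-step NA removal described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the data set toy is hypothetical, and only a treatment variable (no post variable) is used to keep it short.

```r
# Toy data (hypothetical) with NAs in the treatment and in the variables
toy = data.frame(
  treat = c(1, 1, 0, 0, NA, 1),
  y     = c(1.0, 2.0, 3.0, NA, 5.0, 6.0),
  x1    = c(0.5, NA, 1.5, 2.5, 3.5, 4.5)
)

# Step 1: drop rows whose treatment value is missing
keep = !is.na(toy$treat)
toy_clean = toy[keep, ]

# Step 2: for each variable, drop only that variable's own NAs
means_by_group = sapply(c("y", "x1"), function(v) {
  ok = !is.na(toy_clean[[v]])
  tapply(toy_clean[[v]][ok], toy_clean$treat[ok], mean)
})
print(means_by_group)
```

Because step 2 works variable by variable, a row with a missing x1 still contributes to the mean of y, so the effective sample size can differ across variables.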

Examples


# Playing around with the DiD data
data(base_did)

# means of treat/control
did_means(y+x1+period~treat, base_did)
#>           vars    cond: 1      cond: 0 Difference t-stat
#> 1            y    3.3 (6)     0.68 (5)       2.64   7.83
#> 2           x1 0.13 (3.1) -0.066 (2.8)      0.199    1.1
#> 3       period  5.5 (2.9)    5.5 (2.9)          0      0
#> 4 Observations        550          530                  

# same but inverting the difference
did_means(y+x1+period~treat, base_did, diff.inv = TRUE)
#>           vars    cond: 1      cond: 0 Difference t-stat
#> 1            y    3.3 (6)     0.68 (5)      -2.64  -7.83
#> 2           x1 0.13 (3.1) -0.066 (2.8)     -0.199   -1.1
#> 3       period  5.5 (2.9)    5.5 (2.9)          0      0
#> 4 Observations        550          530                  

# now treat/control, before/after
did_means(y+x1+period~treat|post, base_did)
#>           vars    cond: 1     cond: 0 Difference t-stat     cond: 1     cond: 0
#> 1            y 0.47 (5.1)    0.32 (5)      0.142  0.326   6.2 (5.5)       1 (5)
#> 2           x1 0.17 (3.1) 0.046 (2.9)      0.125  0.487 0.095 (3.1) -0.18 (2.8)
#> 3       period    3 (1.4)     3 (1.4)          0      0     8 (1.4)     8 (1.4)
#> 4 Observations        275         265                           275         265
#>   Difference t-stat
#> 1       5.14   11.4
#> 2      0.272   1.07
#> 3          0      0
#> 4                  

# same but with a new line giving the number of unique "indiv" for each case
did_means(y+x1+period~treat|post, base_did, indiv = "id")
#>            vars    cond: 1     cond: 0 Difference t-stat     cond: 1
#> 1             y 0.47 (5.1)    0.32 (5)      0.142  0.326   6.2 (5.5)
#> 2            x1 0.17 (3.1) 0.046 (2.9)      0.125  0.487 0.095 (3.1)
#> 3        period    3 (1.4)     3 (1.4)          0      0     8 (1.4)
#> 4  Observations        275         265                           275
#> 5 # Individuals         55          53                            55
#>       cond: 0 Difference t-stat
#> 1       1 (5)       5.14   11.4
#> 2 -0.18 (2.8)      0.272   1.07
#> 3     8 (1.4)          0      0
#> 4         265                  
#> 5          53                  

# same but with the treat case "0" coming first
did_means(y+x1+period~treat|post, base_did, indiv = ~id, treat_first = 0)
#>            vars     cond: 0    cond: 1 Difference t-stat     cond: 0
#> 1             y    0.32 (5) 0.47 (5.1)     -0.142 -0.326       1 (5)
#> 2            x1 0.046 (2.9) 0.17 (3.1)     -0.125 -0.487 -0.18 (2.8)
#> 3        period     3 (1.4)    3 (1.4)          0      0     8 (1.4)
#> 4  Observations         265        275                           265
#> 5 # Individuals          53         55                            53
#>       cond: 1 Difference t-stat
#> 1   6.2 (5.5)      -5.14  -11.4
#> 2 0.095 (3.1)     -0.272  -1.07
#> 3     8 (1.4)          0      0
#> 4         275                  
#> 5          55                  

# Selecting all the variables with "."
did_means(.~treat|post, base_did, indiv = "id")
#>            vars    cond: 1     cond: 0 Difference t-stat     cond: 1
#> 1             y 0.47 (5.1)    0.32 (5)      0.142  0.326   6.2 (5.5)
#> 2            x1 0.17 (3.1) 0.046 (2.9)      0.125  0.487 0.095 (3.1)
#> 3        period    3 (1.4)     3 (1.4)          0      0     8 (1.4)
#> 4  Observations        275         265                           275
#> 5 # Individuals         55          53                            55
#>       cond: 0 Difference t-stat
#> 1       1 (5)       5.14   11.4
#> 2 -0.18 (2.8)      0.272   1.07
#> 3     8 (1.4)          0      0
#> 4         265                  
#> 5          53