Create, or interact variables with, factors

Treats a variable as a factor, or interacts a variable with a factor. The values to drop from, or keep in, the factor can be set easily. Note that this function should not be used to interact fixed-effects: use the syntax fe1^fe2 directly instead.

Usage

i(factor_var, var, ref, keep, bin, ref2, keep2, bin2, ...)

Arguments

factor_var: A vector (of any type) that will be treated as a factor. You can set references (i.e. exclude values for which to create dummies) with the ref argument.
var: A variable of the same length as factor_var. This variable will be interacted with the factor in factor_var. It can be numeric or factor-like. To force a numeric variable to be treated as a factor, you can add the i. prefix to a variable name. For instance take a numeric variable x_num: i(x_fact, x_num) will treat x_num as numeric while i(x_fact, i.x_num) will treat x_num as a factor (it's a shortcut to as.factor(x_num)).
ref: A vector of values to be taken as references from factor_var. Can also be a logical: if TRUE, then the first value of factor_var will be removed. If ref is a character vector, partial matching is applied to values; use "@" as the first character to enable regular expression matching. See examples.
keep: A vector of values to be kept from factor_var (all others are dropped). By default they should be values from factor_var and if keep is a character vector partial matching is applied. Use "@" as the first character to enable regular expression matching instead.
bin: A list of values to be grouped, a vector, a formula, or the special values "bin::digit" or "cut::values". To create a new value from old values, use bin = list("new_value"=old_values) with old_values a vector of existing values. You can use .() for list().
  It accepts regular expressions, but they must start with an "@", like in bin="@Aug|Dec". It accepts one-sided formulas which must contain the variable x, e.g. bin=list("<2" = ~x < 2).
  The names of the list are the new names. If the new name is missing, the first value matched becomes the new name. In the name, adding "@d", with d a digit, will relocate the value in position d: useful to change the position of factors. Use "@" as first item to make subsequent items be located first in the factor.
  Feeding in a vector is like using a list without name and only a single element. If the vector is numeric, you can use the special value "bin::digit" to group every digit element. For example if x represents years, using bin="bin::2" creates bins of two years. With any data, using "!bin::digit" groups every digit consecutive values starting from the first value. Using "!!bin::digit" is the same but starting from the last value.
  With numeric vectors you can: a) use "cut::n" to cut the vector into n equal parts, b) use "cut::a]b[" to create the following bins: [min, a], ]a, b[, [b, max]. The latter syntax is a sequence of number/quartile (q0 to q4)/percentile (p0 to p100) followed by an open or closed square bracket. You can add custom bin names by adding them in the character vector after 'cut::values'.
  See details and examples. Dot square bracket expansion (see dsb) is enabled.
ref2: A vector of values to be dropped from var. By default they should be values from var and if ref2 is a character vector partial matching is applied. Use "@" as the first character to enable regular expression matching instead.
keep2: A vector of values to be kept from var (all others are dropped). By default they should be values from var and if keep2 is a character vector partial matching is applied. Use "@" as the first character to enable regular expression matching instead.
bin2: A list or vector defining the binning of the second variable. See help for the argument bin for details (or look at the help of the function bin). You can use .() for list().
...: Not currently used.
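
The selection and binning arguments described above can be tried outside of any estimation. The following is a minimal sketch, assuming the fixest package is installed; it uses a toy vector x and the built-in airquality and iris datasets, and calls the standalone bin function mentioned in the description of bin2.

```r
library(fixest)

x = rep(letters[1:4], 3)[1:10]

# ref: no dummy is created for the value "b"
head(i(x, ref = "b"), 4)

# keep: dummies are created only for "a" and "b"
head(i(x, keep = c("a", "b")), 4)

# a leading "@" switches from partial matching to a regular expression
head(i(x, ref = "@a|b"), 4)

# bin: group the values of a numeric vector two by two
month_num = airquality$Month
table(bin(month_num, "bin::2"))

# bin: cut a numeric vector into 3 parts
table(bin(iris$Petal.Length, "cut::3"))

# bin: custom cut points, here [min, 2], ]2, 5[, [5, max]
table(bin(iris$Petal.Length, "cut::2]5["))
```

Using head() and table() keeps the printed output short; the full matrices have one row per element of the input vector.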

Value

It returns a matrix with number of rows the length of factor_var. If there is no interacted variable or it is interacted with a numeric variable, the number of columns is equal to the number of cases contained in factor_var minus the reference(s). If the interacted variable is a factor, the number of columns is the number of combined cases between factor_var and var.
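
The dimensions described above can be checked directly. This is a small sketch, assuming fixest is installed; the vectors x, y and z are toy data chosen for illustration.

```r
library(fixest)

x = rep(letters[1:4], 3)[1:10]           # factor-like, 4 distinct values
y = rep(1:4, c(1, 2, 3, 4))              # numeric
z = rep(c("e", "f", "g"), c(5, 3, 2))    # factor-like, 3 distinct values

# no interaction: one column per value of x
dim(i(x))

# one reference removed: one column fewer
dim(i(x, ref = TRUE))

# interaction with a numeric variable: still one column per value of x
dim(i(x, y))

# interaction with a factor: one column per combination of x and z found in the data
dim(i(x, z))
```

In all four calls the number of rows equals length(x).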

Details

This function should not be used to interact fixed-effects: instead, use the syntax fe1^fe2 directly in the fixed-effects part of the formula. Please see the details and examples in the help page of feols.
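
To illustrate the difference, here is a hedged sketch, assuming fixest is installed. It uses the built-in airquality data with a week variable built for illustration. The first model enters the Month-by-week cells as explicit dummies with i(); the second absorbs the same cells as a combined fixed-effect with ^. Since both specifications control for the same cells, the slope on Solar.R is expected to coincide.

```r
library(fixest)

aq = airquality
aq$week = aq$Day %/% 7 + 1

# Month-by-week cells entered as explicit dummies with i():
# one coefficient per cell is estimated and reported
est_i = feols(Ozone ~ Solar.R + i(Month, i.week), aq)

# the same cells absorbed as a combined fixed-effect with ^:
# no coefficient is reported for the cells
est_fe = feols(Ozone ~ Solar.R | Month^week, aq)

# compare the slope on Solar.R across the two specifications
c(with_i = coef(est_i)[["Solar.R"]], with_fe = coef(est_fe)[["Solar.R"]])
```

The fixed-effects form is generally the better choice when the cell coefficients themselves are not of interest, as it avoids creating one column per cell.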

Author

Laurent Berge

Examples


#
# Simple illustration
#

# the examples below assume the package is attached
library(fixest)

x = rep(letters[1:4], 3)[1:10]
y = rep(1:4, c(1, 2, 3, 4))

# interaction
data.frame(x, y, i(x, y, ref = TRUE))
#>    x y b c d
#> 1  a 1 0 0 0
#> 2  b 2 2 0 0
#> 3  c 2 0 2 0
#> 4  d 3 0 0 3
#> 5  a 3 0 0 0
#> 6  b 3 3 0 0
#> 7  c 4 0 4 0
#> 8  d 4 0 0 4
#> 9  a 4 0 0 0
#> 10 b 4 4 0 0

# without interaction
data.frame(x, i(x, ref = "b"))
#>    x a c d
#> 1  a 1 0 0
#> 2  b 0 0 0
#> 3  c 0 1 0
#> 4  d 0 0 1
#> 5  a 1 0 0
#> 6  b 0 0 0
#> 7  c 0 1 0
#> 8  d 0 0 1
#> 9  a 1 0 0
#> 10 b 0 0 0

# you can interact factors too
z = rep(c("e", "f", "g"), c(5, 3, 2))
data.frame(x, z, i(x, z))
#>    x z a.e a.g b.e b.f b.g c.e c.f d.e d.f
#> 1  a e   1   0   0   0   0   0   0   0   0
#> 2  b e   0   0   1   0   0   0   0   0   0
#> 3  c e   0   0   0   0   0   1   0   0   0
#> 4  d e   0   0   0   0   0   0   0   1   0
#> 5  a e   1   0   0   0   0   0   0   0   0
#> 6  b f   0   0   0   1   0   0   0   0   0
#> 7  c f   0   0   0   0   0   0   1   0   0
#> 8  d f   0   0   0   0   0   0   0   0   1
#> 9  a g   0   1   0   0   0   0   0   0   0
#> 10 b g   0   0   0   0   1   0   0   0   0

# to force a numeric variable to be treated as a factor: use i.
data.frame(x, y, i(x, i.y))
#>    x y a.1 a.3 a.4 b.2 b.3 b.4 c.2 c.4 d.3 d.4
#> 1  a 1   1   0   0   0   0   0   0   0   0   0
#> 2  b 2   0   0   0   1   0   0   0   0   0   0
#> 3  c 2   0   0   0   0   0   0   1   0   0   0
#> 4  d 3   0   0   0   0   0   0   0   0   1   0
#> 5  a 3   0   1   0   0   0   0   0   0   0   0
#> 6  b 3   0   0   0   0   1   0   0   0   0   0
#> 7  c 4   0   0   0   0   0   0   0   1   0   0
#> 8  d 4   0   0   0   0   0   0   0   0   0   1
#> 9  a 4   0   0   1   0   0   0   0   0   0   0
#> 10 b 4   0   0   0   0   0   1   0   0   0   0

# Binning
data.frame(x, i(x, bin = list(ab = c("a", "b"))))
#>    x ab c d
#> 1  a  1 0 0
#> 2  b  1 0 0
#> 3  c  0 1 0
#> 4  d  0 0 1
#> 5  a  1 0 0
#> 6  b  1 0 0
#> 7  c  0 1 0
#> 8  d  0 0 1
#> 9  a  1 0 0
#> 10 b  1 0 0

# Same as before but using .() for list() and a regular expression
# note that to trigger a regex, you need to use an @ first
data.frame(x, i(x, bin = .(ab = "@a|b")))
#>    x ab c d
#> 1  a  1 0 0
#> 2  b  1 0 0
#> 3  c  0 1 0
#> 4  d  0 0 1
#> 5  a  1 0 0
#> 6  b  1 0 0
#> 7  c  0 1 0
#> 8  d  0 0 1
#> 9  a  1 0 0
#> 10 b  1 0 0

#
# In fixest estimations
#

data(base_did)
# We interact the variable 'period' with the variable 'treat'
est_did = feols(y ~ x1 + i(period, treat, 5) | id + period, base_did)

# => plot only interactions with iplot
iplot(est_did)


# Using i() for factors
est_bis = feols(y ~ x1 + i(period, keep = 3:6) + i(period, treat, 5) | id, base_did)

# we plot the second set of variables created with i()
# => we need to use keep (otherwise only the first one is represented)
coefplot(est_bis, keep = "trea")


# => special treatment in etable
etable(est_bis, dict = c("6" = "six"))
#>                                 est_bis
#> Dependent Var.:                       y
#>                                        
#> x1                   0.9720*** (0.0448)
#> period = 3             -1.111. (0.6064)
#> period = 4              0.4034 (0.6066)
#> period = 5             -0.8980 (0.6066)
#> period = six            0.8031 (0.6064)
#> treat x period = 1     -2.252* (0.9875)
#> treat x period = 2      -1.523 (0.9875)
#> treat x period = 3      -0.2720 (1.113)
#> treat x period = 4       -1.794 (1.113)
#> treat x period = six     0.7850 (1.113)
#> treat x period = 7    3.650*** (0.9875)
#> treat x period = 8    4.310*** (0.9874)
#> treat x period = 9    5.636*** (0.9874)
#> treat x period = 10   6.276*** (0.9875)
#> Fixed-Effects:       ------------------
#> id                                  Yes
#> ____________________ __________________
#> S.E. type                           IID
#> Observations                      1,080
#> R2                              0.54466
#> Within R2                       0.45396
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#
# Interact two factors
#

# We use the i. prefix to consider week as a factor
data(airquality)
aq = airquality
aq$week = aq$Day %/% 7 + 1

# Interacting Month and week:
res_2F = feols(Ozone ~ Solar.R + i(Month, i.week), aq)
#> NOTE: 42 observations removed because of NA values (LHS: 37, RHS: 7).

# Same but dropping the 5th Month and 1st week
res_2F_bis = feols(Ozone ~ Solar.R + i(Month, i.week, ref = 5, ref2 = 1), aq)
#> NOTE: 42 observations removed because of NA values (LHS: 37, RHS: 7).

etable(res_2F, res_2F_bis)
#>                                 res_2F        res_2F_bis
#> Dependent Var.:                  Ozone             Ozone
#>                                                         
#> Constant                 8.207 (14.16)    18.51* (7.343)
#> Solar.R              0.0963** (0.0314) 0.1007** (0.0324)
#> Month = 5 x week = 2    -11.36 (17.18)                  
#> Month = 5 x week = 3    -9.660 (16.05)                  
#> Month = 5 x week = 4    -6.923 (18.28)                  
#> Month = 5 x week = 5     28.32 (18.10)                  
#> Month = 6 x week = 2     10.88 (18.13)   -0.3936 (14.93)
#> Month = 6 x week = 3    -2.422 (17.22)    -13.40 (13.47)
#> Month = 7 x week = 1    31.87. (17.27)                  
#> Month = 7 x week = 2    34.35* (16.59)    23.00. (12.58)
#> Month = 7 x week = 3     20.17 (16.54)     8.938 (12.47)
#> Month = 7 x week = 4    33.76. (17.26)    22.85. (13.51)
#> Month = 7 x week = 5    31.58. (18.19)     20.19 (15.04)
#> Month = 8 x week = 1     7.218 (19.98)                  
#> Month = 8 x week = 2   48.12** (17.22)   36.81** (13.56)
#> Month = 8 x week = 3     19.17 (16.62)     8.257 (12.48)
#> Month = 8 x week = 4    36.50* (17.18)    25.35. (13.46)
#> Month = 8 x week = 5  62.00*** (18.12)  50.76*** (14.91)
#> Month = 9 x week = 1   46.47** (16.57)                  
#> Month = 9 x week = 2    -5.661 (16.12)    -17.03 (11.82)
#> Month = 9 x week = 3    -2.978 (16.10)    -13.95 (11.65)
#> Month = 9 x week = 4     1.809 (16.73)    -8.973 (12.61)
#> Month = 9 x week = 5    -8.373 (19.56)    -19.47 (16.95)
#> ____________________ _________________ _________________
#> S.E. type                          IID               IID
#> Observations                       111               111
#> R2                             0.52636           0.37684
#> Adj. R2                        0.40795           0.27844
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#
# Binning
#

data(airquality)

feols(Ozone ~ i(Month, bin = "bin::2"), airquality)
#> NOTE: 37 observations removed because of NA values (LHS: 37).
#> OLS estimation, Dep. Var.: Ozone
#> Observations: 116
#> Standard-errors: IID 
#>             Estimate Std. Error t value   Pr(>|t|)    
#> (Intercept)  23.6154    6.19450 3.81231 0.00022469 ***
#> Month::6     27.8703    8.17782 3.40804 0.00090749 ***
#> Month::8     21.3119    7.51740 2.83501 0.00543040 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 31.2   Adj. R2: 0.083194

feols(Ozone ~ i(Month, bin = list(summer = 7:9)), airquality)
#> NOTE: 37 observations removed because of NA values (LHS: 37).
#> OLS estimation, Dep. Var.: Ozone
#> Observations: 116
#> Standard-errors: IID 
#>               Estimate Std. Error  t value   Pr(>|t|)    
#> (Intercept)   23.61538    6.13010 3.852364 0.00019455 ***
#> Month::6       5.82906   12.08872 0.482190 0.63060377    
#> Month::summer 25.86610    7.04559 3.671249 0.00037013 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 30.9   Adj. R2: 0.102158