Constructs a fixest panel data base

Constructs a fixest panel data base out of a data.frame. This makes it possible to use leads and lags in fixest estimations and, if the data.frame is also a data.table::data.table, to create new variables from leads and lags.

Usage

panel(data, panel.id, time.step = NULL, duplicate.method = "none")

Arguments

data: A data.frame.
panel.id: The panel identifiers. Can either be: i) a one sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id=c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id='id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).
time.step: The method to compute the lags, default is NULL (which means automatically set). Can be equal to: "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the greatest common divisor of the gaps between consecutive time periods is used (typically if the time variable represents years, it will be 1). This method can apply only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive", then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.
duplicate.method: If several observations have the same id and time values, then the notion of lag is not defined for them. If duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.
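
The effect of time.step can be sketched on a small panel with a gap. The data set toy below is hypothetical (it is not part of fixest) and is built as a data.table so that l() can be used directly, as in the Examples section. Under the argument description above, "unitary" should treat the missing year as a real gap, while "consecutive" should treat the previous observed year as lag 1.

```r
library(fixest)
library(data.table)

# Hypothetical toy data: one id observed in 2000, 2001 and 2003 (2002 is missing)
toy = data.table(id = 1, year = c(2000, 2001, 2003), x = c(10, 20, 30))

# "unitary": the time step is the greatest common divisor of the gaps (here 1 year),
# so the lag of 2003 refers to the unobserved year 2002
p_unit = panel(copy(toy), ~id+year, time.step = "unitary")
p_unit[, x_l1 := l(x)]

# "consecutive": two successive observed periods are one lag apart,
# so the lag of 2003 refers to 2001
p_cons = panel(copy(toy), ~id+year, time.step = "consecutive")
p_cons[, x_l1 := l(x)]

# Compare the two lagged columns side by side
print(data.frame(year = toy$year, unitary = p_unit$x_l1, consecutive = p_cons$x_l1))
```

copy() is used here only as a precaution, so that the two panels do not share columns added by reference with :=.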

Value

It returns a data base identical to the one given in input, but with an additional attribute: “panel_info”. This attribute contains vectors used to efficiently create lags/leads of the data. When the data is subselected, some bookkeeping is performed on the attribute “panel_info”.
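
As a minimal sketch, the attribute can be inspected with base R's attr(). The internal layout of “panel_info” is not described on this page, so the code only looks at its element names and makes no assumption about their content. The row subselection at the end is an illustrative guess at a typical use.

```r
library(fixest)
data(base_did)

pdat = panel(base_did, ~id+period)

# The panel bookkeeping is stored as an attribute of the returned data base
info = attr(pdat, "panel_info")
names(info)

# Subselecting rows: the attribute is expected to be updated accordingly
sub = pdat[pdat$period > 2, ]
class(sub)
is.null(attr(sub, "panel_info"))
```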

Details

This function allows you to use leads and lags in a fixest estimation without having to provide the argument panel.id. It also offers more options on how to set the panel (with the additional arguments 'time.step' and 'duplicate.method').
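
A short sketch of duplicate.method, using a hypothetical data set dup (not part of fixest) in which one id/time pair appears twice. Per the argument description, the default "none" is expected to raise an error, while "first" uses the first occurrence of the duplicated pair as the lag.

```r
library(fixest)
library(data.table)

# Hypothetical toy data: id 1 appears twice in year 2001
dup = data.table(id = 1, year = c(2000, 2001, 2001, 2002), x = c(1, 2, 3, 4))

# Default (duplicate.method = "none"): duplicates should lead to an error,
# caught here with try() so that the script keeps running
res = try(panel(copy(dup), ~id+year), silent = TRUE)
inherits(res, "try-error")

# duplicate.method = "first": the first 2001 row is used as the lag
p_first = panel(copy(dup), ~id+year, duplicate.method = "first")
p_first[, x_l1 := l(x)]
print(p_first)
```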

When the initial data set was also a data.table, not all operations are supported and some may dissolve the fixest_panel. This is the case when creating subselections of the initial data with additional attributes (e.g. pdt[x>0, .(x, y, z)] would dissolve the fixest_panel, meaning only a data.table would be the result of the call).

If the initial data set was also a data.table, then you can create new variables from lags and leads using the functions l and f. See the example.

Author

Laurent Berge

Examples


data(base_did)

# Setting a data set as a panel...
pdat = panel(base_did, ~id+period)

# ...then using the functions l and f
est1 = feols(y~l(x1, 0:1), pdat)
#> NOTE: 108 observations removed because of NA values (RHS: 108).
est2 = feols(f(y)~l(x1, -1:1), pdat)
#> NOTE: 216 observations removed because of NA values (LHS: 108, RHS: 216).
est3 = feols(l(y)~l(x1, 0:3), pdat)
#> NOTE: 324 observations removed because of NA values (LHS: 108, RHS: 324).
etable(est1, est2, est3, order = c("f", "^x"), drop="Int")
#>                               est1               est2               est3
#> Dependent Var.:                  y             f(y,1)             l(y,1)
#>                                                                         
#> f(x1,1)                            0.9940*** (0.0579)                   
#> x1              0.9948*** (0.0532)    0.0081 (0.0584)   -0.0534 (0.0599)
#> Constant         2.235*** (0.1577)  2.464*** (0.1697)  2.196*** (0.1750)
#> l(x1,1)            0.0410 (0.0540)    0.0157 (0.0585) 0.9871*** (0.0613)
#> l(x1,2)                                                  0.0220 (0.0607)
#> l(x1,3)                                                  0.0102 (0.0598)
#> _______________ __________________ __________________ __________________
#> S.E. type                      IID                IID                IID
#> Observations                   972                864                756
#> R2                         0.26558            0.25697            0.25875
#> Adj. R2                    0.26406            0.25438            0.25480
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# or using the argument panel.id
feols(f(y)~l(x1, -1:1), base_did, panel.id = ~id+period)
#> NOTE: 216 observations removed because of NA values (LHS: 108, RHS: 216).
#> OLS estimation, Dep. Var.: f(y, 1)
#> Observations: 864
#> Standard-errors: IID 
#>             Estimate Std. Error   t value  Pr(>|t|)    
#> (Intercept) 2.464313   0.169710 14.520756 < 2.2e-16 ***
#> f(x1, 1)    0.994018   0.057861 17.179278 < 2.2e-16 ***
#> x1          0.008072   0.058400  0.138217   0.89010    
#> l(x1, 1)    0.015693   0.058540  0.268068   0.78871    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.97418   Adj. R2: 0.254377

# You can use panel.id in various ways:
pdat = panel(base_did, ~id+period)
# is identical to:
pdat = panel(base_did, c("id", "period"))
# and also to:
pdat = panel(base_did, "id,period")

# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  #   you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}
