Constructs a fixest panel data base out of a data.frame. This allows you to use leads and lags in fixest estimations and, if the data.frame is also a data.table::data.table, to create new variables from leads and lags.
Usage
panel(data, panel.id, time.step = NULL, duplicate.method = c("none", "first"))
Arguments
- data
A data.frame.
- panel.id
The panel identifiers. Can be either: i) a one-sided formula (e.g. panel.id = ~id+time), ii) a character vector of length 2 (e.g. panel.id = c('id', 'time')), or iii) a character scalar of two variables separated by a comma (e.g. panel.id = 'id,time'). Note that you can combine variables with ^ only inside formulas (see the dedicated section in feols).
- time.step
The method to compute the lags. The default is NULL (which means it is set automatically). Can be equal to "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", then the largest common divisor between consecutive time periods is used (typically, if the time variable represents years, it will be 1). This method applies only to integer (or convertible to integer) variables. If "consecutive", then the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive", then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.
- duplicate.method
If several observations have the same id and time values, then the notion of lag is not defined for them. If duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use duplicate.method = "first" so that the first occurrence of identical id/time observations is used as the lag.
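As a hedged illustration of the time.step and duplicate.method arguments, the sketch below builds a small made-up data set observed every two years (toy, dup and their column names are hypothetical). It assumes the data.table package is installed so that l() can materialize the lags as columns. The comments restate the rules described above and are not a guarantee of output.

```r
library(fixest)
library(data.table)

# Hypothetical toy panel: two units observed every two years
toy = data.frame(id   = rep(1:2, each = 3),
                 year = rep(c(2000, 2002, 2004), 2),
                 x    = c(10, 20, 30, 40, 50, 60))

# "consecutive": two successive observed periods are one lag apart
p_consec = panel(as.data.table(toy), ~id + year, time.step = "consecutive")
p_consec[, x_l1 := l(x)]

# Numeric step of 1: a lag of 1 refers to the previous calendar year,
# which is never observed in this toy data
p_step1 = panel(as.data.table(toy), ~id + year, time.step = 1)
p_step1[, x_l1 := l(x)]

# Duplicated id/year pair: the default duplicate.method = "none" is
# documented to raise an error, while "first" accepts the data
dup = rbind(toy, toy[1, ])
res_none  = try(panel(dup, ~id + year), silent = TRUE)
res_first = panel(dup, ~id + year, duplicate.method = "first")
```

Comparing p_consec$x_l1 with p_step1$x_l1 shows how the choice of time.step changes which observation counts as the previous period.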
Value
It returns a data base identical to the one given in input, but with an additional attribute: “panel_info”. This attribute contains vectors used to efficiently create lags/leads of the data. When the data is subselected, some bookkeeping is performed on the attribute “panel_info”.
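A minimal sketch of how the returned object can be inspected, assuming the base_did data set shipped with fixest. The content of the attribute is internal to the package, so the example only checks that it is present and does not rely on its layout.

```r
library(fixest)
data(base_did)

pdat = panel(base_did, ~id + period)

# The object remains a data.frame and carries the panel class and attribute
class(pdat)
is.null(attr(pdat, "panel_info"))

# Row subsetting goes through the panel's own method, which is where the
# bookkeeping mentioned above takes place
sub = pdat[pdat$period > 3, ]
class(sub)
```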
Details
This function allows you to use leads and lags in a fixest estimation without having to provide the argument panel.id. It also offers more options on how to set the panel (with the additional arguments time.step and duplicate.method).
When the initial data set was also a data.table, not all operations are supported and some may dissolve the fixest_panel. This is the case when creating subselections of the initial data with additional attributes (e.g. pdt[x>0, .(x, y, z)] would dissolve the fixest_panel, meaning only a data.table would be the result of the call).
If the initial data set was also a data.table, then you can create new variables from lags and leads using the functions l and f. See the example.
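The data.table behavior described in this section can be sketched as follows, assuming the data.table package is installed and using the base_did data set (pdt, sub and x1_lag are names chosen for illustration). The second call mirrors the pdt[x>0, .(x, y, z)] pattern mentioned above. Whether the panel survives a given operation may depend on the package version, so the checks are shown without expected results.

```r
library(fixest)
library(data.table)
data(base_did)

pdt = panel(as.data.table(base_did), ~id + period)

# Adding a column by reference with l() is a supported operation
pdt[, x1_lag := l(x1)]
inherits(pdt, "fixest_panel")

# Selecting rows and columns at once is the kind of call the text
# describes as dissolving the panel into a plain data.table
sub = pdt[x1 > 0, .(y, x1)]
inherits(sub, "fixest_panel")
inherits(sub, "data.table")
```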
Examples
data(base_did)
# Setting a data set as a panel...
pdat = panel(base_did, ~id+period)
# ...then using the functions l and f
est1 = feols(y~l(x1, 0:1), pdat)
#> NOTE: 108 observations removed because of NA values (RHS: 108).
est2 = feols(f(y)~l(x1, -1:1), pdat)
#> NOTE: 216 observations removed because of NA values (LHS: 108, RHS: 216).
est3 = feols(l(y)~l(x1, 0:3), pdat)
#> NOTE: 324 observations removed because of NA values (LHS: 108, RHS: 324).
etable(est1, est2, est3, order = c("f", "^x"), drop="Int")
#>                               est1               est2               est3
#> Dependent Var.:                  y             f(y,1)             l(y,1)
#>
#> f(x1,1)                            0.9940*** (0.0542)
#> x1              0.9948*** (0.0487)    0.0081 (0.0592)   -0.0534 (0.0545)
#> Constant         2.235*** (0.2032)  2.464*** (0.2233)  2.196*** (0.2110)
#> l(x1,1)            0.0410 (0.0558)    0.0157 (0.0640) 0.9871*** (0.0551)
#> l(x1,2)                                                  0.0220 (0.0580)
#> l(x1,3)                                                  0.0102 (0.0639)
#> _______________ __________________ __________________ __________________
#> S.E.: Clustered             by: id             by: id             by: id
#> Observations                   972                864                756
#> R2                         0.26558            0.25697            0.25875
#> Adj. R2                    0.26406            0.25438            0.25480
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# or using the argument panel.id
feols(f(y)~l(x1, -1:1), base_did, panel.id = ~id+period)
#> NOTE: 216 observations removed because of NA values (LHS: 108, RHS: 216).
#> OLS estimation, Dep. Var.: f(y, 1)
#> Observations: 864
#> Standard-errors: Clustered (id)
#>             Estimate Std. Error   t value  Pr(>|t|)
#> (Intercept) 2.464313   0.223277 11.037009 < 2.2e-16 ***
#> f(x1, 1)    0.994018   0.054216 18.334504 < 2.2e-16 ***
#> x1          0.008072   0.059247  0.136241   0.89189
#> l(x1, 1)    0.015693   0.063958  0.245360   0.80665
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.97418 Adj. R2: 0.254377
# You can use panel.id in various ways:
pdat = panel(base_did, ~id+period)
# is identical to:
pdat = panel(base_did, c("id", "period"))
# and also to:
pdat = panel(base_did, "id,period")
# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  # you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}