Computes fit statistics of fixest objects

Computes various fit statistics for fixest estimations.

Usage

fitstat(
  x,
  type,
  vcov = NULL,
  cluster = NULL,
  ssc = NULL,
  simplify = FALSE,
  verbose = TRUE,
  show_types = FALSE,
  frame = parent.frame(),
  ...
)

Arguments

x: A fixest estimation.
type: Character vector or one-sided formula. The type of fit statistic to be computed. The classic ones are: n, rmse, r2, pr2, f, wald, ivf, ivwald. The full list is given in the section "Available types"; you can also display it with show_types = TRUE. Further, you can register your own types with fitstat_register.
vcov: Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.
cluster: Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".
ssc: An object of class ssc.type obtained with the function ssc. Represents how the degree of freedom correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details.
simplify: Logical, default is FALSE. By default a list is returned whose names are the selected types. If simplify = TRUE and only one type is selected, then the element is directly returned (i.e. it will not be nested in a list).
verbose: Logical, default is TRUE. If TRUE, an object of class fixest_fitstat is returned (so its associated print method will be triggered). If FALSE a simple list is returned instead.
show_types: Logical, default is FALSE. If TRUE, the function only displays all available types.
frame: An environment in which to evaluate variables, default is parent.frame(). Only used if the argument type is a formula and some values in the formula have to be extended with the dot square bracket operator. Mostly for internal use.
...: For internal use.
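
As a rough illustration of the vcov and cluster arguments described above, the sketch below computes the same Wald test under several VCOV specifications. It assumes the trade data and the gravity model used in the Examples section; the object names w_iid, w_hc1, w_2way and w_clu are illustrative only.

```r
library(fixest)

data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Same statistic, different VCOVs: only the VCOV specification changes
w_iid  = fitstat(gravity, "wald", vcov = "iid", simplify = TRUE)
w_hc1  = fitstat(gravity, "wald", vcov = "hetero", simplify = TRUE)

# "twoway" with no cluster given: relies on the first two fixed-effects
w_2way = fitstat(gravity, "wald", vcov = "twoway", simplify = TRUE)

# Explicit two-way clustering through the cluster argument
w_clu  = fitstat(gravity, "wald", cluster = ~Destination + Origin,
                 simplify = TRUE)

# Each result holds stat, p, df1, df2 and the label of the VCOV used
sapply(list(iid = w_iid, hc1 = w_hc1, twoway = w_2way, cluster = w_clu),
       function(w) w$stat)
```

Since the Wald statistic depends directly on the VCOV, comparing the stat element across specifications is a quick way to see how sensitive the joint test is to the choice of standard-errors.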

Value

By default an object of class fixest_fitstat is returned. Using verbose = FALSE returns a simple list. Finally, if only one type is selected, simplify = TRUE returns the selected statistic directly.
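
The three return shapes can be compared with a short sketch. It assumes the gravity model from the Examples section; the names res_default, res_list and rmse_val are illustrative only.

```r
library(fixest)

data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Default: an object of class fixest_fitstat with its own print method
res_default = fitstat(gravity, ~ rmse + r2)
class(res_default)

# verbose = FALSE: a plain list, one element per requested type
res_list = fitstat(gravity, ~ rmse + r2, verbose = FALSE)
names(res_list)

# simplify = TRUE with a single type: the value itself, not a list
rmse_val = fitstat(gravity, "rmse", simplify = TRUE)
rmse_val
```

The plain list is usually the more convenient form when the statistics are reused in further computations rather than printed.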

Registering your own types

You can register custom fit statistics with the function fitstat_register.

Available types

The types are case sensitive: use lower case only. The types available are:

n, ll, aic, bic, rmse:: The number of observations, the log-likelihood, the AIC, the BIC and the root mean squared error, respectively.
my:: Mean of the dependent variable.
g:: The degrees of freedom used to compute the t-test (it influences the p-values of the coefficients). When the VCOV is clustered, this value is equal to the minimum cluster size, otherwise, it is equal to the sample size minus the number of variables.
r2, ar2, wr2, awr2, pr2, apr2, wpr2, awpr2:: All r2 that can be obtained with the function r2. The a stands for 'adjusted', the w for 'within' and the p for 'pseudo'. Note that the order of the letters a, w and p does not matter. The pseudo R2s are McFadden's R2s (ratios of log-likelihoods).
theta:: The over-dispersion parameter in Negative Binomial models. Low values mean high overdispersion.
f, wf:: The F-tests of nullity of the coefficients. The w stands for 'within'. These types return the following values: stat, p, df1 and df2. If you want to display only one of these, use its name after a dot: e.g. f.stat will give the statistic of the F-test, and wf.p will give the p-value of the F-test on the projected model (i.e. projected onto the fixed-effects).
wald:: Wald test of joint nullity of the coefficients. This test always excludes the intercept and the fixed-effects. This type returns the following values: stat, p, df1, df2 and vcov. The element vcov reports the way the VCOV matrix was computed since it directly influences this statistic.
ivf, ivf1, ivf2, ivfall:: These statistics are specific to IV estimations. They report the IV F-test (namely the Cragg-Donald F statistic in the presence of only one endogenous regressor) of the first stage (ivf or ivf1), of the second stage (ivf2) or of both (ivfall). The F-test of the first stage is commonly called the weak instrument test. The value ivfall is only useful in etable when both the 1st and 2nd stages are displayed: the 1st stage F-test(s) are then displayed on the 1st stage estimation(s) and the 2nd stage one on the 2nd stage estimation; otherwise, ivf1 would also be displayed on the 2nd stage estimation. These types return the following values: stat, p, df1 and df2.
ivwald, ivwald1, ivwald2, ivwaldall:: These statistics are specific to IV estimations. They report the IV Wald-test of the first stage (ivwald or ivwald1), of the second stage (ivwald2) or of both (ivwaldall). The Wald-test of the first stage is commonly called the weak instrument test. Note that if the estimation was done with a robust VCOV and there is only one endogenous regressor, this is equivalent to the Kleibergen-Paap statistic. The value ivwaldall is only useful in etable when both the 1st and 2nd stages are displayed: the 1st stage Wald-test(s) are then displayed on the 1st stage estimation(s) and the 2nd stage one on the 2nd stage estimation; otherwise, ivwald1 would also be displayed on the 2nd stage estimation. These types return the following values: stat, p, df1, df2, and vcov.
cd:: The Cragg-Donald test for weak instruments.
kpr:: The Kleibergen-Paap test for weak instruments.
wh:: Wu-Hausman endogeneity test, specific to IV estimations. H0 is the absence of endogeneity of the instrumented variables. It returns the following values: stat, p, df1, df2.
sargan:: Sargan test of overidentifying restrictions. H0: the instruments are not correlated with the second stage residuals. It returns the following values: stat, p, df.
lr, wlr:: Likelihood ratio and within likelihood ratio tests. They return the following elements: stat, p, df. Concerning the within-LR test, note that, contrary to estimations with femlm or feNmlm, estimations with feglm/fepois need to estimate the model with fixed-effects only, which may prove time-consuming (depending on your model). Bottom line, if you really need the within-LR and estimate a Poisson model, use femlm instead of fepois (the former uses direct ML maximization, for which the fixed-effects-only model is a by-product).
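
To make the dot syntax and the IV-specific types more concrete, here is a minimal sketch. The first part reuses the gravity model from the Examples section. The second part builds a small IV estimation on the iris data with a simulated second instrument; the variable names (y, x1, x_endo_1, x_inst_1, x_inst_2, fe) and the object est_iv are hypothetical and serve only as an illustration.

```r
library(fixest)

data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Dot syntax: keep a single element of a multi-valued statistic
fitstat(gravity, ~ f.stat + wf.p)

# Illustrative IV setup: one endogenous regressor, two instruments
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)

est_iv = feols(y ~ x1 | x_endo_1 ~ x_inst_1 + x_inst_2, base)

# First-stage F-test, Wu-Hausman and Sargan tests
# (Sargan requires more instruments than endogenous regressors)
fitstat(est_iv, ~ ivf1 + wh + sargan)

# Same statistics as a plain list, for programmatic use
iv_stats = fitstat(est_iv, ~ ivf1 + wh + sargan, verbose = FALSE)
iv_stats$sargan$p
```

With two instruments for a single endogenous regressor the model is overidentified, which is what allows the Sargan test to be computed here.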

Examples


data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Extracting the 'working' number of observations used to compute the p-values
fitstat(gravity, "g", simplify = TRUE)
#> [1] 38295

# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)
#>                 RMSE: 2.26215
#>                   R2: 0.50428
#> Wald (joint nullity): stat = 5,832.8, p < 2.2e-16, on 1 and 38,295 DoF, VCOV: IID.
#>   F-test (projected): stat = 5,832.8, p < 2.2e-16, on 1 and 38,295 DoF.

# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)
#>                                 gravity
#> Dependent Var.:              log(Euros)
#>                                        
#> log(dist_km)         -2.072*** (0.0271)
#> Fixed-Effects:       ------------------
#> Destination                         Yes
#> Origin                              Yes
#> ____________________ __________________
#> S.E. type                           IID
#> RMSE                             2.2622
#> R2                              0.50428
#> Wald (joint nullity)            5,832.8
#> F-test (projected)              5,832.8
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# For wald and wf, you could show the p-value instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)
#>                                          gravity
#> Dependent Var.:                       log(Euros)
#>                                                 
#> log(dist_km)                  -2.072*** (0.0271)
#> Fixed-Effects:                ------------------
#> Destination                                  Yes
#> Origin                                       Yes
#> _____________________________ __________________
#> S.E. type                                    IID
#> RMSE                                      2.2622
#> R2                                       0.50428
#> Wald (joint nullity), p-value              0e-16
#> F-test (projected), p-value                0e-16
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them

# We need: a) type name, b) the function to be applied
#          c) (optional) an alias

fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")

# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)
#>                               gravity
#> Dependent Var.:            log(Euros)
#>                                      
#> log(dist_km)       -2.072*** (0.0271)
#> Fixed-Effects:     ------------------
#> Destination                       Yes
#> Origin                            Yes
#> __________________ __________________
#> S.E. type                         IID
#> Observations                   38,325
#> R2                            0.50428
#> Within R2                     0.13218
#> t-stat (regular)              -76.373
#> t-stat (HC1)                  -80.129
#> t-stat (clustered)            -16.520
#> t-stat (2-way)                -13.268
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Note that the custom stats we created can easily lead
# to errors, but that's another story!