Computes various fit statistics or hypothesis tests for fixest estimations. These statistics can also be used within etable or set to be displayed when printing the model with setFixest_print.

Usage

fitstat(
  x,
  type,
  vcov = NULL,
  cluster = NULL,
  ssc = NULL,
  simplify = FALSE,
  verbose = TRUE,
  show_types = FALSE,
  htest = FALSE,
  frame = parent.frame(),
  ...
)

Arguments

x

A fixest estimation, obtained for example from feols.

type

Character vector or one-sided formula. No default. The type of fit statistic or tests to be computed. The classic ones are: n, rmse, r2, pr2, f, wald, ivf, ivwald. The full list is given in the section Available types, or can be displayed with show_types = TRUE. Further, you can register your own types with fitstat_register.

vcov

Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.

cluster

Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".

ssc

An object of class ssc_type obtained with the function ssc. Represents how the small sample correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details. Not all VCOV types are affected by this argument.

simplify

Logical, default is FALSE. By default a list is returned whose names are the selected types. If simplify = TRUE and only one type is selected, then the element is directly returned (i.e. will not be nested in a list).

verbose

Logical, default is TRUE. If TRUE, an object of class fixest_fitstat is returned (so its associated print method will be triggered). If FALSE a simple list is returned instead (i.e. the same object without the class).

show_types

Logical, default is FALSE. If TRUE, the available types are displayed and no statistic is computed.

htest

Logical scalar, default is FALSE. If TRUE, the results of tests are formatted according to the htest class (defined in the stats package), and the returned object is a simple list. If FALSE, the result of each test is a simple list but the object returned is of class fixest_fitstat.

frame

An environment in which to evaluate variables, default is parent.frame(). Only used if the argument type is a formula and some values in the formula have to be extended with the dot square bracket operator. Mostly for internal use.

...

For internal use.
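As a minimal sketch of how the vcov and cluster arguments change a test, the code below computes the same Wald test under several VCOV specifications. It assumes the trade data set shipped with fixest (the same one used in the Examples section); the object names est and w_* are illustrative only.

```r
library(fixest)
data(trade)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Same Wald test, different VCOV specifications
w_iid = fitstat(est, "wald", vcov = "iid", simplify = TRUE)
w_hc  = fitstat(est, "wald", vcov = "hetero", simplify = TRUE)
w_cl  = fitstat(est, "wald", vcov = cluster ~ Origin, simplify = TRUE)

# Two-way clustering, written in two equivalent ways
w_2a = fitstat(est, "wald", cluster = c("Destination", "Origin"), simplify = TRUE)
w_2b = fitstat(est, "wald", cluster = ~ Destination + Origin, simplify = TRUE)

# Only the VCOV differs across these statistics
sapply(list(iid = w_iid, hetero = w_hc, cluster = w_cl, twoway = w_2a),
       function(w) as.numeric(w$stat))
```

The element vcov of each result records which VCOV was used, which helps when several variants are compared side by side.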

Value

By default an object of class fixest_fitstat is returned. This object is a simple list containing the statistics/tests requested by the user. For example fitstat(x, c("r2", "f")) returns a list with two elements named r2 and f.

Each statistic from the fixest_fitstat object can be of two types:

  1. a list of numeric scalars whose elements depend on the type of fit statistic/test. For example the F-test, accessed with f, contains the elements stat, p, df1, and df2. The wald test contains the elements stat, p, df1, df2 and vcov. The likelihood ratio, lr, contains the elements stat, p, df. Etc.

  2. a numeric scalar, for either: i) scalar fit statistics, like r2 or rmse, ii) when a specific element of a fit statistic is accessed, like e.g. f.stat which reports the stat element of the fit statistic f.

The types of fit statistics and their structure are detailed in the section Available types of this documentation.
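The two kinds of elements described above can be sketched as follows. This is an illustration under the assumption that the trade data set from fixest is available; the names est, res and f_stat are hypothetical.

```r
library(fixest)
data(trade)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# A list with one element per requested type
res = fitstat(est, c("r2", "f"))
names(res)

# 'f' is itself a list of numeric scalars
names(res$f)
res$f$stat

# A single element of a test can be requested with the dot syntax
f_stat = fitstat(est, "f.stat", simplify = TRUE)
f_stat
```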

The class fixest_fitstat has only a dedicated print method.

If the argument htest=TRUE, the object returned is a plain list containing the requested statistics and each test is formatted according to the htest class from the stats package. htest objects are lists containing the following elements:

statistic

A numeric scalar equal to the test statistic.

p.value

A numeric scalar equal to the p-value of the test.

parameter

A list containing various parameters used to calculate the p-value and the statistic. It can be df, df1, df2, or vcov.

alternative

The alternative hypothesis.

method

The name of the test.

data.name

The estimation call on which the test is applied.

Using verbose = FALSE removes the fixest_fitstat class from the returned object, turning it into a plain list (this is ignored when htest=TRUE).

If only one type is selected, simplify = TRUE causes the selected statistic to be returned directly (hence it is not nested inside a list). For example fitstat(x, "r2", simplify = TRUE) returns a simple scalar.

If show_types=TRUE, this function returns instead a character vector with all the available types.
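The effect of verbose, htest and simplify on the returned object can be checked with a short sketch such as the one below. It assumes the trade data set from fixest; the variable names are illustrative.

```r
library(fixest)
data(trade)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Default: an object of class fixest_fitstat
default_res = fitstat(est, ~ r2 + wald)
inherits(default_res, "fixest_fitstat")

# verbose = FALSE: the same content without the class
plain_res = fitstat(est, ~ r2 + wald, verbose = FALSE)
inherits(plain_res, "fixest_fitstat")

# htest = TRUE: each test is formatted as an htest object
ht_res = fitstat(est, ~ r2 + wald, htest = TRUE)
inherits(ht_res$wald, "htest")

# simplify = TRUE with a single type: the statistic itself
r2_val = fitstat(est, "r2", simplify = TRUE)
r2_val
```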

Details

Any statistic available in fitstat can also be used directly in etable via its argument fitstat. For example etable(est, fitstat = c("r2", "rmse")) will report the R2 and the RMSE in the fit statistics section of the table.

If one wants to change the default set of statistics reported when printing the model, this is possible with the function setFixest_print which accepts the argument fitstat. For example setFixest_print(fitstat = ~r2 + f) will report the R2 and the F-test for each estimation.
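A short sketch of both uses follows, assuming the trade data set from fixest. Note that setFixest_print changes a session-wide setting, so later prints of fixest models in the same session are affected as well.

```r
library(fixest)
data(trade)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Fit statistics chosen for a single table
tab = etable(est, fitstat = c("r2", "rmse"))
tab

# Fit statistics reported whenever a model is printed
setFixest_print(fitstat = ~ r2 + f)
print(est)
```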

Registering your own types

You can register custom fit statistics with the function fitstat_register. These statistics can be anything. Please see its documentation.

Available types

The types are case sensitive; please use lower case only. The types available are:

n, ll, aic, bic, rmse:

The number of observations, the log-likelihood, the AIC, the BIC and the root mean squared error, respectively.

my:

Mean of the dependent variable.

g:

The degrees of freedom used to compute the t-test (it influences the p-values of the coefficients). When the VCOV is clustered, this value is equal to the minimum cluster size, otherwise, it is equal to the sample size minus the number of variables.

r2, ar2, wr2, awr2, pr2, apr2, wpr2, awpr2:

All the R2s that can be obtained with the function r2. The a stands for 'adjusted', the w for 'within' and the p for 'pseudo'. Note that the order of the letters a, w and p does not matter. The pseudo R2s are McFadden's R2s (ratios of log-likelihoods).

theta:

The over-dispersion parameter in Negative Binomial models. Low values mean high overdispersion.

f, wf:

The F-tests of nullity of the coefficients. The w stands for 'within'. These types return the following values: stat, p, df1 and df2. If you want to display only one of these, use its name after a dot: e.g. f.stat will give the statistic of the F-test, and wf.p will give the p-value of the F-test on the projected model (i.e. projected onto the fixed-effects).

wald:

Wald test of joint nullity of the coefficients. This test always excludes the intercept and the fixed-effects. This type returns the following values: stat, p, df1, df2 and vcov. The element vcov reports the way the VCOV matrix was computed since it directly influences this statistic.

ivf, ivf1, ivf2, ivfall:

These statistics are specific to IV estimations. They report either the IV F-test (namely the Cragg-Donald F statistic in the presence of only one endogenous regressor) of the first stage (ivf or ivf1), of the second stage (ivf2) or of both (ivfall). The F-test of the first stage is commonly named weak instrument test. The value of ivfall is only useful in etable when both the 1st and 2nd stages are displayed: the 1st stage F-test(s) are then displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation (otherwise, ivf1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1 and df2.

ivwald, ivwald1, ivwald2, ivwaldall:

These statistics are specific to IV estimations. They report either the IV Wald-test of the first stage (ivwald or ivwald1), of the second stage (ivwald2) or of both (ivwaldall). The Wald-test of the first stage is commonly named weak instrument test. Note that if the estimation was done with a robust VCOV and there is only one endogenous regressor, this is equivalent to the Kleibergen-Paap statistic. The value of ivwaldall is only useful in etable when both the 1st and 2nd stages are displayed: the 1st stage Wald-test(s) are then displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation (otherwise, ivwald1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1, df2, and vcov.

cd:

The Cragg-Donald test for weak instruments.

kpr:

The Kleibergen-Paap test for weak instruments.

wh:

This statistic is specific to IV estimations. Wu-Hausman endogeneity test. H0 is the absence of endogeneity of the instrumented variables. It returns the following values: stat, p, df1, df2.

sargan:

Sargan test of overidentifying restrictions. H0: the instruments are not correlated with the second stage residuals. It returns the following values: stat, p, df.

lr, wlr:

Likelihood ratio and within likelihood ratio tests. They return the following elements: stat, p, df. Concerning the within-LR test, note that, contrary to estimations with femlm or feNmlm, estimations with feglm/fepois need to estimate the model with fixed-effects only, which may prove time-consuming (depending on your model). Bottom line: if you really need the within-LR and estimate a Poisson model, use femlm instead of fepois (the former uses direct ML maximization, for which the fixed-effects-only model is a by-product).
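The IV-specific types listed above can be tried on a small over-identified model. The sketch below is illustrative only: it renames the columns of the iris data set and simulates a second instrument (x_inst_2), so the variable names and the data-generating step are assumptions made for the example, not a meaningful empirical design.

```r
library(fixest)
set.seed(1)

# Toy data: iris with generic column names (illustrative only)
base = iris
names(base) = c("y", "x1", "x_endo", "x_inst_1", "fe")

# Simulated second instrument, so that the model is over-identified
base$x_inst_2 = 0.2 * base$x_endo + rnorm(nrow(base), sd = 0.5)

# One endogenous regressor, two instruments
est_iv = feols(y ~ x1 | x_endo ~ x_inst_1 + x_inst_2, base)

# First-stage F-test, Wu-Hausman and Sargan tests
iv_stats = fitstat(est_iv, ~ ivf1 + wh + sargan)
iv_stats

# Elements of a given test
names(iv_stats$sargan)
iv_stats$wh$p
```

The Sargan test requires more instruments than endogenous regressors, which is why a second instrument is simulated here.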

Examples


data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Extracting the 'working' number of observations used to compute the pvalues
fitstat(gravity, "g", simplify = TRUE)
#> [1] 38295

# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)
#>                 RMSE: 2.26215
#>                   R2: 0.50428
#> Wald (joint nullity): stat = 5,832.8, p < 2.2e-16, on 1 and 38,295 DoF, VCOV: IID.
#>   F-test (projected): stat = 5,832.8, p < 2.2e-16, on 1 and 38,295 DoF.

# This is a simple list:
names(fitstat(gravity, ~ rmse + r2 + wald + wf))
#> [1] "rmse" "r2"   "wald" "wf"  

# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)
#>                                 gravity
#> Dependent Var.:              log(Euros)
#>                                        
#> log(dist_km)         -2.072*** (0.0271)
#> Fixed-Effects:       ------------------
#> Destination                         Yes
#> Origin                              Yes
#> ____________________ __________________
#> S.E. type                           IID
#> RMSE                             2.2622
#> R2                              0.50428
#> Wald (joint nullity)            5,832.8
#> F-test (projected)              5,832.8
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# For wald and wf, you could show the pvalue instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)
#>                                          gravity
#> Dependent Var.:                       log(Euros)
#>                                                 
#> log(dist_km)                  -2.072*** (0.0271)
#> Fixed-Effects:                ------------------
#> Destination                                  Yes
#> Origin                                       Yes
#> _____________________________ __________________
#> S.E. type                                    IID
#> RMSE                                      2.2622
#> R2                                       0.50428
#> Wald (joint nullity), p-value              0e-16
#> F-test (projected), p-value                0e-16
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# You can display the tests with the htest format with htest=TRUE
fitstat(gravity, ~ rmse + r2 + wald + wf, htest = TRUE)
#> $rmse
#> [1] 2.26215
#> 
#> $r2
#>        r2 
#> 0.5042796 
#> 
#> $wald
#> 
#> 	Wald (joint nullity)
#> 
#> data:  feols(fml = log(Euros) ~ log(dist_km) | Destination + Origin, data = trade)
#> statistic = 5832.8, df1 = 1, df2 = 38295, vcov = IID, p-value < 2.2e-16
#> alternative hypothesis: At least one non fixed-effect coefficient is different from 0
#> 
#> 
#> $wf
#> 
#> 	F-test (projected)
#> 
#> data:  feols(fml = log(Euros) ~ log(dist_km) | Destination + Origin, data = trade)
#> statistic = 5832.8, df1 = 1, df2 = 38295, p-value < 2.2e-16
#> alternative hypothesis: At least one non fixed-effect coefficient is different from 0
#> 
#> 

# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them

# We need: a) type name, b) the function to be applied
#          c) (optional) an alias

fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")

# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)
#>                               gravity
#> Dependent Var.:            log(Euros)
#>                                      
#> log(dist_km)       -2.072*** (0.0271)
#> Fixed-Effects:     ------------------
#> Destination                       Yes
#> Origin                            Yes
#> __________________ __________________
#> S.E. type                         IID
#> Observations                   38,325
#> R2                            0.50428
#> Within R2                     0.13218
#> t-stat (regular)              -76.373
#> t-stat (HC1)                  -80.129
#> t-stat (clustered)            -16.520
#> t-stat (2-way)                -13.268
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1