Computes various fit statistics or hypothesis tests for fixest estimations.
These statistics can also be used within etable or set to be displayed when printing the
model with setFixest_print.
Usage
fitstat(
x,
type,
vcov = NULL,
cluster = NULL,
ssc = NULL,
simplify = FALSE,
verbose = TRUE,
show_types = FALSE,
htest = FALSE,
frame = parent.frame(),
...
)
Arguments
- x
A fixest estimation, obtained for example from feols.
- type
Character vector or one-sided formula. No default. The type of fit statistics or tests to be computed. The classic ones are: n, rmse, r2, pr2, f, wald, ivf, ivwald. The full list is given in the section Available types of this documentation, or can be displayed with show_types = TRUE. You can also register your own types with fitstat_register.
- vcov
Versatile argument to specify the VCOV. In general, it is either a character scalar equal to a VCOV type, or a formula of the form: vcov_type ~ variables. The VCOV types implemented are: "iid", "hetero" (or "HC1"), "cluster", "twoway", "NW" (or "newey_west"), "DK" (or "driscoll_kraay"), and "conley". It also accepts objects from vcov_cluster, vcov_NW, NW, vcov_DK, DK, vcov_conley and conley. It also accepts covariance matrices computed externally. Finally, it accepts functions to compute the covariances. See the vcov documentation in the vignette.
- cluster
Tells how to cluster the standard-errors (if clustering is requested). Can be either a list of vectors, a character vector of variable names, a formula or an integer vector. Assume we want to perform 2-way clustering over var1 and var2 contained in the data.frame base used for the estimation. All the following cluster arguments are valid and do the same thing: cluster = base[, c("var1", "var2")], cluster = c("var1", "var2"), cluster = ~var1+var2. If the two variables were used as fixed-effects in the estimation, you can leave it blank with vcov = "twoway" (assuming var1 [resp. var2] was the 1st [resp. 2nd] fixed-effect). You can interact two variables using ^ with the following syntax: cluster = ~var1^var2 or cluster = "var1^var2".
- ssc
An object of class ssc_type obtained with the function ssc. Represents how the small sample correction should be done. You must use the function ssc for this argument. The arguments and defaults of the function ssc are: K.adj = TRUE, K.fixef = "nonnested", G.adj = TRUE, G.df = "min", t.df = "min", K.exact = FALSE. See the help of the function ssc for details. Not all VCOV types are affected by this argument.
- simplify
Logical, default is FALSE. By default a list is returned whose names are the selected types. If simplify = TRUE and only one type is selected, then the element is directly returned (i.e. will not be nested in a list).
- verbose
Logical, default is TRUE. If TRUE, an object of class fixest_fitstat is returned (so its associated print method will be triggered). If FALSE, a simple list is returned instead (i.e. the same object without the class).
- show_types
Logical, default is FALSE. If TRUE, this displays all available types and nothing is returned.
- htest
Logical scalar, default is FALSE. If TRUE, the results of tests are formatted according to the htest class (defined in the stats package), and the returned object is a simple list. If FALSE, the result of each test is a simple list but the object returned is of class fixest_fitstat.
- frame
An environment in which to evaluate variables, default is parent.frame(). Only used if the argument type is a formula and some values in the formula have to be extended with the dot square bracket operator. Mostly for internal use.
- ...
For internal use.
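As a rough illustration of the type, vcov and cluster arguments, the sketch below requests the same statistics with a character vector and with a one-sided formula, then recomputes the Wald test under different VCOVs. It assumes the trade data set shipped with fixest; the object names (est, stats_chr, and so on) are arbitrary choices made for this example.

```r
library(fixest)
data(trade)

# Illustrative estimation (the name `est` is an arbitrary choice)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# `type` given as a character vector or as a one-sided formula
stats_chr = fitstat(est, c("r2", "rmse"))
stats_fml = fitstat(est, ~ r2 + rmse)

# The Wald test depends on the VCOV: heteroskedasticity-robust here
wald_hc = fitstat(est, "wald", vcov = "hetero")

# Clustered VCOV, requested through `vcov` or through `cluster`
wald_cl1 = fitstat(est, "wald", vcov = cluster ~ Origin)
wald_cl2 = fitstat(est, "wald", cluster = ~Origin)

print(wald_hc)
print(wald_cl1)
```

The vcov element reported with each Wald test should indicate which VCOV was used, which makes it easy to check that the request was understood as intended.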
Value
By default an object of class fixest_fitstat is returned.
This object is a simple list containing the statistics/tests requested
by the user.
For example fitstat(x, c("r2", "f")) returns a list with two elements named r2 and f.
Each statistic from the fixest_fitstat object can be of two types:
- a list of numeric scalars whose elements depend on the type of fit statistic/test. For example the F-test, accessed with f, contains the elements stat, p, df1, and df2. The wald test contains the elements stat, p, df1, df2 and vcov. The likelihood ratio, lr, contains the elements stat, p, df. Etc.
- a numeric scalar, for either: i) scalar fit statistics, like r2 or rmse, ii) when a specific element of a fit statistic is accessed, like e.g. f.stat which reports the stat element of the fit statistic f.
The types of fit statistics and their structure are detailed
in the section Available types of this documentation.
The class fixest_fitstat has only a dedicated print method.
If the argument htest=TRUE, the object returned is a plain list containing the requested
statistics and each test is formatted according to the htest class from the stats package.
htest objects are lists containing the following elements:
- statistic
A numeric scalar equal to the test statistic.
- p.value
A numeric scalar equal to the p-value of the test.
- parameter
A list containing various parameters used to calculate the p-value and the statistic. It can contain df, df1, df2, or vcov.
- alternative
The alternative hypothesis.
- method
The name of the test.
- data.name
The estimation call on which the test is applied.
Using verbose = FALSE removes the fixest_fitstat class from the returned object,
turning it into a plain list (this is ignored when htest=TRUE).
If only one type is selected, simplify = TRUE returns the selected statistic
directly (i.e. it is not nested inside a list).
For example fitstat(x, "r2", simplify = TRUE) returns a simple scalar.
If show_types=TRUE, this function returns instead a character vector with
all the available types.
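The different return shapes described above can be explored with a short sketch such as the following. It again assumes the trade data set from fixest; all object names are illustrative.

```r
library(fixest)
data(trade)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Default: a list of class fixest_fitstat, one element per requested type
res = fitstat(est, c("r2", "f"))
class(res)
names(res)

# simplify = TRUE with a single type: the statistic is returned directly
r2_val = fitstat(est, "r2", simplify = TRUE)

# A single element of a test, accessed with the dot syntax
f_stat = fitstat(est, "f.stat", simplify = TRUE)

# verbose = FALSE drops the fixest_fitstat class (plain list)
res_plain = fitstat(est, c("r2", "f"), verbose = FALSE)
class(res_plain)

# htest = TRUE formats each test as an htest object from the stats package
res_ht = fitstat(est, "wald", htest = TRUE)
class(res_ht$wald)
```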
Details
Any statistic available in fitstat can also be used directly in etable via its
argument fitstat. For example etable(est, fitstat = c("r2", "rmse")) will report
the R2 and the RMSE in the fit statistics section of the table.
If one wants to change the default set of statistics reported when printing the model,
this is possible with the function setFixest_print which accepts the argument fitstat.
For example setFixest_print(fitstat = ~r2 + f) will report the R2 and the F-test
for each estimation.
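A minimal sketch of both mechanisms, assuming the trade data set from fixest (the object name est is illustrative). Note that setFixest_print changes a global setting, so the choice made here stays in effect for the rest of the session.

```r
library(fixest)
data(trade)
est = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Report the R2 and the RMSE in the fit statistics section of the table
etable(est, fitstat = c("r2", "rmse"))

# Change the statistics shown when printing any fixest estimation.
# This is a global setting: it persists for the rest of the session.
setFixest_print(fitstat = ~ r2 + f)
print(est)
```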
Registering your own types
You can register custom fit statistics with the function fitstat_register.
These statistics can be anything. Please see its documentation.
Available types
The types are case sensitive, please use lower case only. The types available are:
- n, ll, aic, bic, rmse: The number of observations, the log-likelihood, the AIC, the BIC and the root mean squared error, respectively.
- my: Mean of the dependent variable.
- g: The degrees of freedom used to compute the t-test (it influences the p-values of the coefficients). When the VCOV is clustered, this value is equal to the minimum cluster size, otherwise, it is equal to the sample size minus the number of variables.
- r2, ar2, wr2, awr2, pr2, apr2, wpr2, awpr2: All r2 that can be obtained with the function r2. The a stands for 'adjusted', the w for 'within' and the p for 'pseudo'. Note that the order of the letters a, w and p does not matter. The pseudo R2s are McFadden's R2s (ratios of log-likelihoods).
- theta: The over-dispersion parameter in Negative Binomial models. Low values mean high overdispersion.
- f, wf: The F-tests of nullity of the coefficients. The w stands for 'within'. These types return the following values: stat, p, df1 and df2. If you want to display only one of these, use their name after a dot: e.g. f.stat will give the statistic of the F-test, or wf.p will give the p-value of the F-test on the projected model (i.e. projected onto the fixed-effects).
- wald: Wald test of joint nullity of the coefficients. This test always excludes the intercept and the fixed-effects. This type returns the following values: stat, p, df1, df2 and vcov. The element vcov reports the way the VCOV matrix was computed since it directly influences this statistic.
- ivf, ivf1, ivf2, ivfall: These statistics are specific to IV estimations. They report either the IV F-test (namely the Cragg-Donald F statistic in the presence of only one endogenous regressor) of the first stage (ivf or ivf1), of the second stage (ivf2) or of both (ivfall). The F-test of the first stage is commonly named weak instrument test. The value of ivfall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage F-test(s) being displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation; otherwise, ivf1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1 and df2.
- ivwald, ivwald1, ivwald2, ivwaldall: These statistics are specific to IV estimations. They report either the IV Wald-test of the first stage (ivwald or ivwald1), of the second stage (ivwald2) or of both (ivwaldall). The Wald-test of the first stage is commonly named weak instrument test. Note that if the estimation was done with a robust VCOV and there is only one endogenous regressor, this is equivalent to the Kleibergen-Paap statistic. The value of ivwaldall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage Wald-test(s) being displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation; otherwise, ivwald1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1, df2, and vcov.
- cd: The Cragg-Donald test for weak instruments.
- kpr: The Kleibergen-Paap test for weak instruments.
- wh: This statistic is specific to IV estimations. Wu-Hausman endogeneity test. H0 is the absence of endogeneity of the instrumented variables. It returns the following values: stat, p, df1, df2.
- sargan: Sargan test of overidentifying restrictions. H0: the instruments are not correlated with the second stage residuals. It returns the following values: stat, p, df.
- lr, wlr: Likelihood ratio and within likelihood ratio tests. They return the following elements: stat, p, df. Concerning the within-LR test, note that, contrary to estimations with femlm or feNmlm, estimations with feglm/fepois need to estimate the model with fixed-effects only, which may prove time-consuming (depending on your model). Bottom line, if you really need the within-LR and estimate a Poisson model, use femlm instead of fepois (the former uses direct ML maximization, for which the fixed-effects-only model is a by-product).
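The IV-specific types can be tried on a toy example like the one below. This is only a sketch: the data are built from iris with renamed columns and a simulated second instrument, so the variable names are illustrative and the instruments are not meant to be economically valid. One endogenous regressor with two instruments is used so that the Sargan test of overidentifying restrictions applies.

```r
library(fixest)

# Toy IV setup built from iris; variable names are illustrative only
base = iris
names(base) = c("y", "x1", "x_endo", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo + rnorm(150, sd = 0.5)

# One endogenous regressor, two instruments (over-identified model)
est_iv = feols(y ~ x1 | x_endo ~ x_inst_1 + x_inst_2, base)

# First-stage F and Wald tests, Wu-Hausman and Sargan tests
fitstat(est_iv, ~ ivf1 + ivwald1 + wh + sargan)

# Extracting only the p-value of the Sargan test
sargan_p = fitstat(est_iv, "sargan", simplify = TRUE)$p
sargan_p
```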
Examples
data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)
# Extracting the 'working' number of observations used to compute the p-values
fitstat(gravity, "g", simplify = TRUE)
#> [1] 38295
# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)
#> RMSE: 2.26215
#> R2: 0.50428
#> Wald (joint nullity): stat = 5,832.8, p < 2.2e-16, on 1 and 38,295 DoF, VCOV: IID.
#> F-test (projected): stat = 5,832.8, p < 2.2e-16, on 1 and 38,295 DoF.
# This is a simple list:
names(fitstat(gravity, ~ rmse + r2 + wald + wf))
#> [1] "rmse" "r2" "wald" "wf"
# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)
#>                                 gravity
#> Dependent Var.:              log(Euros)
#>
#> log(dist_km)         -2.072*** (0.0271)
#> Fixed-Effects:       ------------------
#> Destination                         Yes
#> Origin                              Yes
#> ____________________ __________________
#> S.E. type                           IID
#> RMSE                             2.2622
#> R2                              0.50428
#> Wald (joint nullity)            5,832.8
#> F-test (projected)              5,832.8
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# For wald and wf, you could show the p-value instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)
#>                                          gravity
#> Dependent Var.:                       log(Euros)
#>
#> log(dist_km)                  -2.072*** (0.0271)
#> Fixed-Effects:                ------------------
#> Destination                                  Yes
#> Origin                                       Yes
#> _____________________________ __________________
#> S.E. type                                    IID
#> RMSE                                      2.2622
#> R2                                       0.50428
#> Wald (joint nullity), p-value              0e-16
#> F-test (projected), p-value                0e-16
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# You can display the tests with the htest format with htest=TRUE
fitstat(gravity, ~ rmse + r2 + wald + wf, htest = TRUE)
#> $rmse
#> [1] 2.26215
#>
#> $r2
#> r2
#> 0.5042796
#>
#> $wald
#>
#> Wald (joint nullity)
#>
#> data: feols(fml = log(Euros) ~ log(dist_km) | Destination + Origin, data = trade)
#> statistic = 5832.8, df1 = 1, df2 = 38295, vcov = IID, p-value < 2.2e-16
#> alternative hypothesis: At least one non fixed-effect coefficient is different from 0
#>
#>
#> $wf
#>
#> F-test (projected)
#>
#> data: feols(fml = log(Euros) ~ log(dist_km) | Destination + Origin, data = trade)
#> statistic = 5832.8, df1 = 1, df2 = 38295, p-value < 2.2e-16
#> alternative hypothesis: At least one non fixed-effect coefficient is different from 0
#>
#>
# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them
# We need: a) type name, b) the function to be applied
# c) (optional) an alias
fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")
# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)
#>                               gravity
#> Dependent Var.:            log(Euros)
#>
#> log(dist_km)       -2.072*** (0.0271)
#> Fixed-Effects:     ------------------
#> Destination                       Yes
#> Origin                            Yes
#> __________________ __________________
#> S.E. type                         IID
#> Observations                   38,325
#> R2                            0.50428
#> Within R2                     0.13218
#> t-stat (regular)              -76.373
#> t-stat (HC1)                  -80.129
#> t-stat (clustered)            -16.520
#> t-stat (2-way)                -13.268
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1