This function creates the left-hand-side or the right-hand-side(s) of a femlm, feols or feglm estimation.

Usage

# S3 method for class 'fixest'
model.matrix(
  object,
  data = NULL,
  type = "rhs",
  sample = "estimation",
  na.rm = FALSE,
  subset = FALSE,
  as.matrix = FALSE,
  as.df = FALSE,
  collin.rm = TRUE,
  ...
)

Arguments

object

A fixest object. Obtained using the functions femlm, feols or feglm.

data

A data.frame or NULL (the default). If missing or NULL, then the original data is obtained by evaluating the call.

type

Character vector or one-sided formula, default is "rhs". Specifies the type of matrix/data.frame to be returned. Possible values are: "lhs", "rhs", "fixef", "iv.rhs1" (1st stage RHS), "iv.rhs2" (2nd stage RHS), "iv.endo" (endogenous variables), "iv.exo" (exogenous variables), "iv.inst" (instruments).
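The IV-specific types only make sense for an instrumental-variable estimation. The sketch below is a minimal illustration, assuming the fixest package is installed; the data set `base_iv` and the choice of `x2` as endogenous regressor instrumented by `x3` are hypothetical and only serve to show the calls.

```r
library(fixest)

# Hypothetical toy data built from iris (same renaming as in the Examples section)
base_iv = setNames(iris, c("y", "x1", "x2", "x3", "fe"))

# IV estimation: x2 is treated as endogenous and instrumented by x3 (illustrative choice)
est_iv = feols(y ~ x1 | fe | x2 ~ x3, base_iv)

# matrix of the endogenous variable(s)
endo = model.matrix(est_iv, type = "iv.endo")

# matrix of the instrument(s)
inst = model.matrix(est_iv, type = "iv.inst")

head(endo)
head(inst)
```

Requesting one of the "iv.*" types on a non-IV estimation is not meaningful, so it is safer to reserve them for objects estimated with the three-part IV formula.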

sample

Character scalar equal to "estimation" (default) or "original". Only used when data=NULL (i.e. the original data is requested). By default, only the observations effectively used in the estimation are returned: the observations dropped because of NA values (including NAs in the weights) or because they are fully explained by the fixed-effects (FE) are excluded.

If sample="original", all the observations are returned. In that case, if you use na.rm=TRUE (which is not the default), you can drop the observations with NA values (while keeping the ones fully explained by the FEs).

na.rm

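Logical scalar, default is FALSE. Should observations with NAs be removed from the resulting matrix or data.frame? Note that if data=NULL and sample="estimation" (the defaults), the observations with NAs were already dropped during the estimation, so this argument mainly matters when sample="original".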

subset

Logical scalar or character vector. Default is FALSE. If TRUE, then the matrix created will be restricted only to the variables contained in the argument data, which can then contain a subset of the variables used in the estimation. If a character vector, then only the variables matching the elements of the vector via regular expressions will be created.

as.matrix

Logical scalar, default is FALSE. Whether to coerce the result to a matrix.

as.df

Logical scalar, default is FALSE. Whether to coerce the result to a data.frame.
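As a minimal sketch of the two coercion arguments, assuming the fixest package is installed (the data set `base_df` and the model are hypothetical and only used for illustration):

```r
library(fixest)

# Hypothetical toy data built from iris (same renaming as in the Examples section)
base_df = setNames(iris, c("y", "x1", "x2", "x3", "fe"))
est_df = feols(y ~ x1 + x2 | fe, base_df)

# the RHS is a matrix by default; as.df = TRUE requests a data.frame instead
rhs_df = model.matrix(est_df, as.df = TRUE)
class(rhs_df)

# the LHS is a vector by default; as.matrix = TRUE requests a matrix instead
lhs_mat = model.matrix(est_df, type = "lhs", as.matrix = TRUE)
class(lhs_mat)
```

These arguments are convenient when downstream code expects a fixed container type regardless of the value of type.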

collin.rm

Logical scalar, default is TRUE. Only used when data=NULL (i.e. the data used in the estimation is requested). Whether to remove variables that were found to be collinear during the estimation. Beware: it does not perform a collinearity check.
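The effect of collin.rm can be seen with a deliberately collinear regressor. This is a sketch under stated assumptions: fixest is installed, and the variable `x1_twice` is a hypothetical exact multiple of `x1` added only to trigger the removal during the estimation.

```r
library(fixest)

# Hypothetical toy data built from iris (same renaming as in the Examples section)
base_col = setNames(iris, c("y", "x1", "x2", "x3", "fe"))

# x1_twice is an exact multiple of x1, hence perfectly collinear with it
base_col$x1_twice = 2 * base_col$x1

# one of the two collinear variables is dropped during the estimation
est_col = feols(y ~ x1 + x1_twice + x2 | fe, base_col, notes = FALSE)

# default: the variable found to be collinear is removed from the matrix
mm_kept = model.matrix(est_col)
colnames(mm_kept)

# collin.rm = FALSE: all the RHS variables are returned
mm_all = model.matrix(est_col, collin.rm = FALSE)
colnames(mm_all)
```

Since no new collinearity check is performed, the columns dropped are the ones flagged at estimation time.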

...

Not currently used.

Value

It returns either a vector, a matrix or a data.frame. It returns a vector for the dependent variable ("lhs"), a data.frame for the fixed-effects ("fixef") and a matrix for any other type.
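The three return types can be checked directly. The following is a minimal sketch, assuming the fixest package is installed; the data set `base_val` and the model are hypothetical.

```r
library(fixest)

# Hypothetical toy data built from iris (same renaming as in the Examples section)
base_val = setNames(iris, c("y", "x1", "x2", "x3", "fe"))
est_val = feols(y ~ x1 + x2 | fe, base_val)

# dependent variable: a vector
y_vec = model.matrix(est_val, type = "lhs")

# fixed-effects: a data.frame
fe_df = model.matrix(est_val, type = "fixef")

# any other type, here the RHS: a matrix
x_mat = model.matrix(est_val, type = "rhs")

class(y_vec)
class(fe_df)
class(x_mat)
```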

See also

See also the main estimation functions femlm, feols or feglm, as well as formula.fixest, update.fixest, summary.fixest and vcov.fixest.

Author

Laurent Berge

Examples


library(fixest)

# we use a data set with NAs and fixed-effect singletons
base = setNames(iris, c("y", "x1", "x2", "x3", "fe"))
# adding NAs
base$x1[1:4] = NA
# adding singletons
base$fe = as.character(base$fe)
base$fe[10 + 1:5] = letters[1:5]

# OLS estimation where we remove singletons
est = feols(y ~ x1 + poly(x2, 2) | fe, base, fixef.rm = "singleton")
#> NOTES: 4 observations removed because of NA values (RHS: 4).
#>        5 fixed-effect singletons were removed (5 observations).

# by default, we have the data set used in the estimation
head(model.matrix(est))
#>       x1 poly(x2, 2)1 poly(x2, 2)2
#> [1,] 3.6  -0.10942904   0.04837527
#> [2,] 3.9  -0.09550677   0.00559656
#> [3,] 3.4  -0.10942904   0.04837527
#> [4,] 3.4  -0.10478828   0.03339136
#> [5,] 2.9  -0.10942904   0.04837527
#> [6,] 3.1  -0.10478828   0.03339136
nrow(model.matrix(est))
#> [1] 141

# to have the original data set: we need to use sample="original"
head(model.matrix(est, sample = "original"))
#>       x1 poly(x2, 2)1 poly(x2, 2)2
#> [1,]  NA  -0.10942904   0.04837527
#> [2,]  NA  -0.10942904   0.04837527
#> [3,]  NA  -0.11406979   0.06408354
#> [4,]  NA  -0.10478828   0.03339136
#> [5,] 3.6  -0.10942904   0.04837527
#> [6,] 3.9  -0.09550677   0.00559656
nrow(model.matrix(est, sample = "original"))
#> [1] 150

# we can drop only the NA values (and not the singletons) with na.rm=TRUE
head(model.matrix(est, sample = "original", na.rm = TRUE))
#>       x1 poly(x2, 2)1 poly(x2, 2)2
#> [1,] 3.6  -0.10942904   0.04837527
#> [2,] 3.9  -0.09550677   0.00559656
#> [3,] 3.4  -0.10942904   0.04837527
#> [4,] 3.4  -0.10478828   0.03339136
#> [5,] 2.9  -0.10942904   0.04837527
#> [6,] 3.1  -0.10478828   0.03339136
nrow(model.matrix(est, sample = "original", na.rm = TRUE))
#> [1] 146

#
# Illustration of subset
#

# subset => character vector
head(model.matrix(est, subset = "x1"))
#>       x1
#> [1,] 3.6
#> [2,] 3.9
#> [3,] 3.4
#> [4,] 3.4
#> [5,] 2.9
#> [6,] 3.1

# subset => TRUE, only works with data argument!!
head(model.matrix(est, data = base[, "x1", drop = FALSE], subset = TRUE))
#>       x1
#> [1,]  NA
#> [2,]  NA
#> [3,]  NA
#> [4,]  NA
#> [5,] 3.6
#> [6,] 3.9