Takes a variable of any type, transforms it into a factor, and modifies the levels of that factor. Useful in estimations when you want to set some value of a vector as the reference.

ref(x, ref)

Arguments

x

A vector of any type (must be atomic though).

ref

A vector or a list, or special binning values (explained later). If a vector, it must correspond to (partially matched) values of the vector x. The vector x will be transformed into a factor and these values will be placed first in the levels. That's the main usage of this function.

You can also bin the values of x on the fly, using the same syntax as the function bin. Here's a description of what bin does:

  • To create a new value from old values, use bin = list("new_value"=old_values) with old_values a vector of existing values. You can use .() for list().

  • It accepts regular expressions, but they must start with an "@", like in bin="@Aug|Dec".

  • It accepts one-sided formulas which must contain the variable x, e.g. bin=list("<2" = ~x < 2).

  • The names of the list are the new names. If the new name is missing, the first value matched becomes the new name.

  • In the name, adding "@d", with d a digit, will relocate the value in position d: useful to change the position of factors.

  • If the vector x is numeric, you can use the special value "bin::digit" to group every digit element. For example if x represents years, using bin="bin::2" creates bins of two years.

  • With any data, using "!bin::digit" groups every digit consecutive values starting from the first value. Using "!!bin::digit" is the same but starting from the last value.

  • With numeric vectors you can: a) use "cut::n" to cut the vector into n equal parts, b) use "cut::a]b[" to create the following bins: [min, a], ]a, b[, [b, max]. The latter syntax is a sequence of number/quartile (q0 to q4)/percentile (p0 to p100) followed by an open or closed square bracket. You can add custom bin names by adding them in the character vector after 'cut::values'.

See details and examples. Dot square bracket expansion (see dsb) is enabled.
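Two of the binning syntaxes described above can be tried directly through ref. This is a minimal sketch, assuming the fixest package is installed and that ref handles these values the same way bin does, as the description states. The object names early and first_two and the label "may/june" are illustrative only.

```r
library(fixest)
data(airquality)
month_num = airquality$Month   # numeric months, 5 to 9

# One-sided formula (it must use the variable name x):
# months before July are grouped under the new name "early"
early = ref(month_num, list("early" = ~x < 7))
table(early)

# Named list of existing values, written with the .() shorthand for list():
# months 5 and 6 are merged into a single level named "may/june"
first_two = ref(month_num, .("may/june" = 5:6))
table(first_two)
```

In both calls the result should be a factor whose newly created level comes first, since ref places modified values at the front of the levels.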

Value

It returns a factor of the same length as x, where levels have been modified according to the argument ref.

bin vs ref

The functions bin and ref can do the same thing, so why use one rather than the other? Here are the differences:

  • ref always returns a factor. This is in contrast with bin which returns, when possible, a vector of the same type as the vector in input.

  • ref always places the modified values first in the factor levels. On the other hand, bin tries not to modify the ordering of the levels. It is possible to make bin mimic the behavior of ref by adding an "@" as the first element of the list in the argument bin.

  • when a vector (and not a list) is given as input, ref places each element of the vector first in the factor levels, keeping them as separate levels. The behavior of bin is entirely different: bin merges all the values in the vector into a single value in x (i.e. it's binning).
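The first and third differences can be checked side by side. This is a minimal sketch, assuming the fixest package is installed and that bin accepts a plain vector of values as described above. The object names (r_num, b_num, r_vec, b_vec) are illustrative only.

```r
library(fixest)
data(airquality)
month_num = airquality$Month
month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)

# Return type: ref always returns a factor, whereas bin tries to
# keep the type of its input
r_num = ref(month_num)
b_num = bin(month_num, 5:6)
class(r_num)
class(b_num)

# Vector input: ref moves each matched value to the front as its own level,
# whereas bin merges all matched values into a single value
r_vec = ref(month_fact, c("aug", "sept"))
b_vec = bin(month_fact, c("aug", "sept"))
levels(r_vec)
table(b_vec)
```

With ref, the five months remain distinct and only their order changes. With bin, august and september are counted together.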

See also

To bin the values of a vector: bin.

Author

Laurent Berge

Examples


# ref, feols and etable come from the fixest package
library(fixest)
data(airquality)

# A vector of months
month_num = airquality$Month
month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)
table(month_num)
#> month_num
#>  5  6  7  8  9 
#> 31 30 31 31 30 
table(month_fact)
#> month_fact
#>       may      june      july    august september 
#>        31        30        31        31        30 

#
# Main use
#

# Without argument: equivalent to as.factor
ref(month_num)
#>   [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6
#>  [38] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7
#>  [75] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
#> [112] 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
#> [149] 9 9 9 9 9
#> Levels: 5 6 7 8 9

# Main usage: to set a level first:
# (Note that partial matching is enabled.)
table(ref(month_fact, "aug"))
#> 
#>    august       may      june      july september 
#>        31        31        30        31        30 

# You can rename the level on-the-fly
# (Northern hemisphere specific!)
table(ref(month_fact, .("Hot month"="aug",
                        "Late summer" = "sept")))
#> 
#>   Hot month Late summer         may        june        july 
#>          31          30          31          30          31 


# Main use is in estimations:
a = feols(Petal.Width ~ Petal.Length + Species, iris)

# We change the reference
b = feols(Petal.Width ~ Petal.Length + ref(Species, "vers"), iris)

etable(a, b)
#>                                               a                   b
#> Dependent Var.:                     Petal.Width         Petal.Width
#>                                                                    
#> (Intercept)                    -0.0908 (0.0564)    0.3445* (0.1489)
#> Petal.Length                 0.2304*** (0.0344)  0.2304*** (0.0344)
#> Speciesversicolor            0.4354*** (0.1028)                    
#> Speciesvirginica             0.8377*** (0.1453)                    
#> ref(Species,"vers")setosa                       -0.4354*** (0.1028)
#> ref(Species,"vers")virginica                     0.4023*** (0.0572)
#> ____________________________ __________________ ___________________
#> S.E. type                                   IID                 IID
#> Observations                                150                 150
#> R2                                      0.94557             0.94557
#> Adj. R2                                 0.94446             0.94446
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


#
# Binning
#

# You can also bin factor values on the fly
# Using @ first means a regular expression will be used to match the values.
# Note that the value created is placed first.
# To avoid that behavior => use the function "bin"
table(ref(month_fact, .(summer = "@jul|aug|sep")))
#> 
#> summer    may   june 
#>     92     31     30 

# Please refer to the examples in the bin help page for more examples.
# The syntax is the same.


#
# Precise relocation
#

# You can place a level at the position you want
#  by adding "@digit" at the start of the name:
table(ref(month_num, .("@5"=5)))
#> 
#>  6  7  8  9  5 
#> 30 31 31 30 31 

# Same with renaming
table(ref(month_num, .("@5 five"=5)))
#> 
#>    6    7    8    9 five 
#>   30   31   31   30   31