Convenient way to get elements from a character vector.

string_get(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  seq = FALSE,
  seq.unik = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

stget(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  seq = FALSE,
  seq.unik = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

Arguments

x

A character vector.

...

Character scalars representing the patterns to be found. By default they are (perl) regular expressions. Use ' & ' or ' | ' to chain patterns and combine their results logically (ex: '[[:alpha:]] & \\d' gets strings containing both letters and numbers). You can negate a pattern by adding a ! first (ex: "!sepal$" selects strings that do not end with "sepal"). Add flags with the syntax 'flag1, flag2/pattern'. Available flags are: 'fixed', 'ignore', 'word' and 'magic'. Ex: "ignore/sepal" would get "Sepal.Length" (which would not be the case without 'ignore'). Shortcut: use the first letters of the flags. Ex: "if/dt[" would get "DT[i = 5]" (flags 'ignore' + 'fixed'). The 'word' flag adds word boundaries to the pattern. The 'magic' flag first interpolates values directly into the pattern with "{}".

fixed

Logical scalar, default is FALSE. Whether to trigger a fixed search instead of a regular expression search (default).

ignore.case

Logical scalar, default is FALSE. If TRUE, then case insensitive search is triggered.

word

Logical scalar, default is FALSE. If TRUE then a) word boundaries are added to the pattern, and b) patterns can be chained by separating them with a comma, they are combined with an OR logical operation. Example: if word = TRUE, then pattern = "The, mountain" will select strings containing either the word 'The' or the word 'mountain'.

or

Logical, default is FALSE. In the presence of two or more patterns, whether to combine them with a logical "or" (the default is to combine them with a logical "and").

seq

Logical, default is FALSE. The argument pattern accepts a vector of patterns, which are combined with a logical "and" by default. If seq = TRUE, the result is the same as calling string_get once per pattern and stacking the results in pattern order. See examples.

seq.unik

Logical, default is FALSE. The argument ... (or the argument pattern) accepts a vector of patterns, which are combined with a logical "and" by default. If seq.unik = TRUE, then string_get is called once per pattern, the results are stacked, and unique() is applied at the end. See examples.

pattern

(If provided, elements of ... are ignored.) A character vector representing the patterns to be found. By default a (perl) regular-expression search is triggered. Use ' & ' or ' | ' to chain patterns and combine their results logically (ex: '[[:alpha:]] & \\d' gets strings containing both letters and numbers). You can negate a pattern by adding a ! first (ex: "!sepal$" selects strings that do not end with "sepal"). Add flags with the syntax 'flag1, flag2/pattern'. Available flags are: 'fixed', 'ignore', 'word' and 'magic'. Ex: "ignore/sepal" would get "Sepal.Length" (which would not be the case without 'ignore'). Shortcut: use the first letters of the flags. Ex: "if/dt[" would get "DT[i = 5]" (flags 'ignore' + 'fixed'). The 'word' flag adds word boundaries to the pattern. The 'magic' flag first interpolates values directly into the pattern with "{}".

envir

Environment in which to evaluate the interpolations if the flag "magic" is provided. Default is parent.frame().
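
The combination rules described for ..., or, seq and seq.unik can be pictured with base R alone. The snippet below is only a rough sketch of the described semantics, using grepl() on a made-up vector; it is not the package's implementation, and the variable names are hypothetical.

```r
# Base R sketch of the combination rules (not stringmagic's own code).
x <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "petal2", "42")

# '[[:alpha:]] & \\d': keep strings containing both a letter and a digit
both <- x[grepl("[[:alpha:]]", x, perl = TRUE) & grepl("\\d", x, perl = TRUE)]

# '!Length$': keep strings that do NOT end with "Length"
not_length <- x[!grepl("Length$", x, perl = TRUE)]

# two patterns with or = TRUE: keep strings matching either pattern
either <- x[grepl("^Sepal", x, perl = TRUE) | grepl("\\d", x, perl = TRUE)]

# seq = TRUE: one selection per pattern, results stacked in pattern order
stacked <- c(
  x[grepl("\\d", x, perl = TRUE)],
  x[grepl("^[Pp]etal", x, perl = TRUE)]
)

# seq.unik = TRUE: same stacking, then unique() is applied
stacked_unique <- unique(stacked)
```

Note that "petal2" matches both patterns of the stacked selection, so it appears twice in stacked and once in stacked_unique.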

Value

Always returns a character vector.

Details

This function is a wrapper to string_is().
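
As a rough illustration of the wrapper relationship, the selection can be thought of as subsetting x with a logical vector of matches. The sketch below uses base R stand-ins: is_match() and get_match() are hypothetical names playing the roles of string_is() and string_get(), and they ignore flags, chaining and caching.

```r
# Hypothetical stand-ins, base R only: is_match() plays the role of
# string_is() and get_match() the role of string_get().
is_match <- function(x, pattern) grepl(pattern, x, perl = TRUE)

# The "get" function subsets x with the logical vector from the "is" function
get_match <- function(x, pattern) x[is_match(x, pattern)]

cars <- c("Mazda RX4", "Volvo 142E", "Honda Civic")
mazda <- get_match(cars, "^M")
```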

Functions

  • stget(): Alias to string_get

Caching

In an exploratory stage, it can be useful to quickly get values from a vector with as little hassle as possible. Hence string_get implements caching, so that users do not need to repeat the value of the argument x in successive function calls, and can concentrate only on the selection patterns.

Caching is only available when the user calls string_get from the global environment. If that feature were available in regular code, it would be too dangerous and would likely lead to bugs that are hard to track down. Hence caching is disabled when string_get is used within code (i.e. inside a function or inside an automated script), and function calls without the main argument will lead to errors in such scripts.

Generic regular expression flags

All stringmagic functions support generic flags in regular-expression patterns. The flags are useful to quickly give extra instructions, similarly to usual regular expression flags.

Here the syntax is "flag1, flag2/pattern". That is: flags are a comma-separated list of flag names, separated from the pattern with a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1. Here the flag "fixed" removes the regular expression meaning of ".", which would otherwise have meant "any character". The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.

Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".

The four flags always available are: "ignore", "fixed", "word" and "magic".

  • "ignore" makes the search case insensitive. Technically, it adds the perl-flag "(?i)" at the beginning of the pattern.

  • "fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "[" (among others) lose their special meaning and are treated for what they are: simple characters.

  • "word" adds word boundaries ("\\b" in regex language) to the pattern. Further, the comma (",") becomes a word separator. Technically, "word/one, two" is treated as "\b(one|two)\b". Example: string_clean("Am I ambushed?", "wi/am") leads to " I ambushed?" thanks to the flags "ignore" and "word".

  • "magic" interpolates variables into the pattern before the regex interpretation. For example, if letters = "aiou" then string_clean("My great goose!", "magic/[{letters}] => e") leads to "My greet geese!".
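
The effect of each flag can be approximated in base R. This is a sketch under the assumption that the flags behave as described above; it reuses the examples from this section and does not call any stringmagic function. The variable names (vowels, no_am, swapped, and so on) are chosen for illustration only.

```r
x <- c("hello...", "world")

# "fixed/.": the dot is a plain character, so only "hello..." matches
fixed_idx <- which(grepl(".", x, fixed = TRUE))

# no flag: "." means "any character", so both elements match
regex_idx <- which(grepl(".", x, perl = TRUE))

# "ignore/sepal": the pattern is prefixed with the perl flag (?i)
ignore_hit <- grepl("(?i)sepal", "Sepal.Length", perl = TRUE)

# "word/one, two" is described as "\b(one|two)\b"
word_hit <- grepl("\\b(one|two)\\b", c("one more", "Microphone"), perl = TRUE)

# "wi/am" used to remove matches: case-insensitive, whole word only
no_am <- gsub("(?i)\\bam\\b", "", "Am I ambushed?", perl = TRUE)

# "magic/[{letters}] => e": the variable is pasted into the pattern first
vowels <- "aiou"
swapped <- gsub(paste0("[", vowels, "]"), "e", "My great goose!", perl = TRUE)
```

Here "Microphone" is not selected by the word pattern because "one" is not a standalone word in it, and "ambushed" keeps its leading "am" for the same reason.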

See also

String operations: string_is(), string_get(), string_clean(), string_split2df(). Chain basic operations with string_ops(). Clean character vectors efficiently with string_clean().

Use string_vec() to create simple string vectors.

String interpolation combined with operation chaining: string_magic(). You can change string_magic default values with string_magic_alias() and add custom operations with string_magic_register_fun().

Display messages while benefiting from string_magic interpolation with cat_magic() and message_magic().

Other tools with aliases: cat_magic_alias(), string_magic(), string_magic_alias(), string_ops_alias(), string_vec_alias()

Author

Laurent R. Berge

Examples


x = rownames(mtcars)

# find all Mazda cars
string_get(x, "Mazda")
#> [1] "Mazda RX4"     "Mazda RX4 Wag"
# same with ignore case flag
string_get(x, "i/mazda")
#> [1] "Mazda RX4"     "Mazda RX4 Wag"

# all cars containing a single digit (we use the 'word' flag)
string_get(x, "w/\\d")
#> [1] "Hornet 4 Drive" "Fiat X1-9"      "Porsche 914-2" 

# finds car names without numbers AND containing `u`
string_get(x, "!\\d", "u")
#> [1] "Hornet Sportabout" "Lotus Europa"     
# equivalently
string_get(x, "!\\d & u")
#> [1] "Hornet Sportabout" "Lotus Europa"     

# Stacks all Mazda and Volvo cars. Mazda first
string_get(x, "Mazda", "Volvo", seq = TRUE)
#> [1] "Mazda RX4"     "Mazda RX4 Wag" "Volvo 142E"   

# Stacks all Mazda and Volvo cars. Volvo first
string_get(x, "Volvo", "Mazda", seq = TRUE)
#> [1] "Volvo 142E"    "Mazda RX4"     "Mazda RX4 Wag"

# let's get the first word of each car name
car_first = string_ops(x, "extract.first")
# we select car brands ending with 'a', then ending with 'i'
string_get(car_first, "a$", "i$", seq = TRUE)
#> [1] "Mazda"    "Mazda"    "Honda"    "Toyota"   "Toyota"   "Ferrari"  "Maserati"
# seq.unik is similar to seq but applies unique()
string_get(car_first, "a$", "i$", seq.unik = TRUE)
#> [1] "Mazda"    "Honda"    "Toyota"   "Ferrari"  "Maserati"

#
# flags
#

# you can combine the flags
x = string_magic("/One, two, one... Two!, Microphone, check")
# regular
string_get(x, "one")
#> [1] "/One, two, one... Two!, Microphone, check"
# ignore case
string_get(x, "i/one")
#> [1] "/One, two, one... Two!, Microphone, check"
# + word boundaries
string_get(x, "iw/one")
#> [1] "/One, two, one... Two!, Microphone, check"

# you can escape the meaning of ! with backslashes
string_get(x, "\\!")
#> [1] "/One, two, one... Two!, Microphone, check"

#
# Caching
#

# Caching is enabled when the function is used interactively
# so you don't need to repeat the argument 'x'
# Mostly useful at an exploratory stage

if(interactive() && is.null(sys.calls())){
   
   # first run, the data is cached
   string_get(row.names(mtcars), "i/vol")

   # now you don't need to specify the data
   string_get("i/^m & 4")
}