Detects whether one or more patterns are present in each element of a character vector. Patterns can be chained and negated. By default a regular expression search is performed, but special flags can be triggered with a specific syntax.

string_is(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame(),
  last = NULL
)

string_any(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

string_all(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

string_which(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

st_is(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame(),
  last = NULL
)

st_any(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

st_all(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

stwhich(
  x,
  ...,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  or = FALSE,
  pattern = NULL,
  envir = parent.frame()
)

Arguments

x

A character vector.

...

Character scalars representing the patterns to be found. By default they are (perl) regular expressions. Use ' & ' or ' | ' to chain patterns and combine their results logically (e.g. '[[:alpha:]] & \\d' gets strings containing both letters and numbers). You can negate a pattern by adding a ! first (e.g. "!sepal$" returns TRUE for strings that do not end with "sepal"). Add flags with the syntax 'flag1, flag2/pattern'. The available flags are 'fixed', 'ignore', 'word' and 'magic'. For example, "ignore/sepal" would match "Sepal.Length" (which would not be the case without 'ignore'). As a shortcut, you can use the first letters of the flags: "if/dt[" would match "DT[i = 5]" (flags 'ignore' + 'fixed'). The 'word' flag adds word boundaries to the pattern. The 'magic' flag first interpolates values directly into the pattern with "{}".

fixed

Logical scalar, default is FALSE. Whether to trigger a fixed search instead of a regular expression search (default).

ignore.case

Logical scalar, default is FALSE. If TRUE, then case insensitive search is triggered.

word

Logical scalar, default is FALSE. If TRUE, then a) word boundaries are added to the pattern, and b) patterns can be chained by separating them with a comma, in which case they are combined with a logical OR. Example: if word = TRUE, then pattern = "The, mountain" will select strings containing either the word 'The' or the word 'mountain'.

or

Logical, default is FALSE. In the presence of two or more patterns, whether to combine them with a logical "or" (the default is to combine them with a logical "and").

pattern

(If provided, elements of ... are ignored.) A character vector representing the patterns to be found. By default a (perl) regular expression search is triggered. Use ' & ' or ' | ' to chain patterns and combine their results logically (e.g. '[[:alpha:]] & \\d' gets strings containing both letters and numbers). You can negate a pattern by adding a ! first (e.g. "!sepal$" returns TRUE for strings that do not end with "sepal"). Add flags with the syntax 'flag1, flag2/pattern'. The available flags are 'fixed', 'ignore', 'word' and 'magic'. For example, "ignore/sepal" would match "Sepal.Length" (which would not be the case without 'ignore'). As a shortcut, you can use the first letters of the flags: "if/dt[" would match "DT[i = 5]" (flags 'ignore' + 'fixed'). The 'word' flag adds word boundaries to the pattern. The 'magic' flag first interpolates values directly into the pattern with "{}".

envir

Environment in which to evaluate the interpolations if the flag "magic" is provided. Default is parent.frame().

last

A function or NULL (default). If a function, it will be applied to the vector just before returning it.
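
To illustrate how several patterns are combined, and what the or argument changes, here is a minimal sketch in base R. It only mimics the documented behavior with base::grepl(); it is not stringmagic's actual code, and the variable names (patterns, hits, and_hits, or_hits) are chosen for illustration.

```r
x <- c("One", "two", "one... two", "microphone", "check")

# Two patterns, as if passed via `...`
patterns <- c("e", "o")

# One logical vector per pattern
hits <- lapply(patterns, grepl, x = x, perl = TRUE)

# Default (or = FALSE): results are combined with a logical "and"
and_hits <- Reduce(`&`, hits)

# or = TRUE: results are combined with a logical "or"
or_hits <- Reduce(`|`, hits)

print(and_hits)
print(or_hits)
```

With the default "and" combination only the third and fourth elements contain both a lowercase "e" and a lowercase "o"; with "or", every element qualifies.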

Value

string_is() returns a logical vector of the same length as x.

string_which() returns a numeric vector containing the indexes of the elements of x in which the pattern is detected.

Details

The internal function used to find the patterns is base::grepl() with perl = TRUE.
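
As a rough illustration of what this implies, the chaining and negation syntax can be emulated directly with base::grepl(). The following is a sketch in base R under that assumption, not the package's implementation; the helper name detect is hypothetical.

```r
x <- c("One", "two", "one... two", "microphone", "check")

# Hypothetical helper: a single perl-style regex search
detect <- function(pattern, x) grepl(pattern, x, perl = TRUE)

# Counterpart of "one & c": both patterns must be found
both <- detect("one", x) & detect("c", x)

# Counterpart of "one | c": at least one pattern must be found
either <- detect("one", x) | detect("c", x)

# Counterpart of "!one": negation flips the result of the search
negated <- !detect("one", x)

print(both)
print(either)
print(negated)
```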

Functions

  • string_any(): Detects if at least one element of a vector matches a regex pattern

  • string_all(): Detects if all elements of a vector match a regex pattern

  • string_which(): Returns the indexes of the values in which a pattern is detected

  • st_is(): Alias to string_is

  • st_any(): Alias to string_any

  • st_all(): Alias to string_all

  • stwhich(): Alias to string_which
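
The relationship between these functions can be pictured with base R reductions over a logical vector. This is only an analogy built on base::grepl(), assuming the variants differ solely in how the element-wise result is summarized; the variable names are illustrative.

```r
x <- c("One", "two", "one... two", "microphone", "check")

# Element-wise detection, analogous to what string_is() returns
hits <- grepl("one", x, perl = TRUE)

# Analogous to string_any(): does at least one element match?
any_hit <- any(hits)

# Analogous to string_all(): do all elements match?
all_hit <- all(hits)

# Analogous to string_which(): indexes of the matching elements
which_hit <- which(hits)

print(any_hit)
print(all_hit)
print(which_hit)
```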

Generic regular expression flags

All stringmagic functions support generic flags in regular-expression patterns. The flags are useful to quickly give extra instructions, similarly to usual regular expression flags.

Here the syntax is "flag1, flag2/pattern". That is, the flags are a comma-separated list of flag names, separated from the pattern by a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1. Here the flag "fixed" removes the regular expression meaning of ".", which would otherwise have meant "any character". The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.

Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".

The four flags always available are: "ignore", "fixed", "word" and "magic".

  • "ignore" makes the search case-insensitive. Technically, it adds the perl-flag "(?i)" at the beginning of the pattern.

  • "fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "[" (among others) lose their special meaning and are treated for what they are: simple characters.

  • "word" adds word boundaries ("\\b" in regex language) to the pattern. Further, the comma (",") becomes a word separator. Technically, "word/one, two" is treated as "\b(one|two)\b". Example: string_clean("Am I ambushed?", "wi/am") leads to " I ambushed?" thanks to the flags "ignore" and "word".

  • "magic" interpolates variables inside the pattern before the regex interpretation. For example, if letters = "aiou" then string_clean("My great goose!", "magic/[{letters}] => e") leads to "My greet geese!"
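
The effect of each flag described above can be approximated in base R. The sketch below is based on the descriptions in this section ("(?i)" for "ignore", word boundaries for "word") and on base::grepl(); the way "magic" is imitated here, with a plain sub() on a hypothetical template, is an assumption for illustration and not how the package performs interpolation.

```r
x <- c("One", "two", "one... two", "microphone", "check")

# "ignore": prepend the perl flag (?i) for a case-insensitive search
ignore_hits <- grepl("(?i)one", x, perl = TRUE)

# "fixed": search for the literal characters, without regex interpretation
fixed_hits <- grepl("...", x, fixed = TRUE)

# "word": "word/one, two" is treated as a word-bounded alternation
word_hits <- grepl("\\b(one|two)\\b", x, perl = TRUE)

# "magic": the variable is inserted into the pattern before the search
# (imitated here by a literal replacement of "{p}" with the value of p)
p <- "one"
template <- "{p}"
pattern <- sub("{p}", p, template, fixed = TRUE)
magic_hits <- grepl(pattern, x, perl = TRUE)

print(ignore_hits)
print(fixed_hits)
print(word_hits)
print(magic_hits)
```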

See also

String operations: string_is(), string_get(), string_clean(), string_split2df(). Chain basic operations with string_ops(). Clean character vectors efficiently with string_clean().

Use string_vec() to create simple string vectors.

String interpolation combined with operation chaining: string_magic(). You can change string_magic default values with string_magic_alias() and add custom operations with string_magic_register_fun().

Display messages while benefiting from string_magic interpolation with cat_magic() and message_magic().

Other tools with aliases: cat_magic_alias(), string_magic(), string_magic_alias(), string_ops_alias(), string_vec_alias()

Author

Laurent R. Berge

Examples


# NOTE: using `string_get` instead of `string_is` may make the examples
#       easier to understand

x = string_vec("One, two, one... two, microphone, check")

# default is regular expression search
# => 3 character items
string_is(x, "^...$")
#> [1]  TRUE  TRUE FALSE FALSE FALSE

# to trigger fixed search use the flag 'fixed'
string_is(x, "fixed/...")
#> [1] FALSE FALSE  TRUE FALSE FALSE
# you can just use the first letter
string_is(x, "f/...")
#> [1] FALSE FALSE  TRUE FALSE FALSE

# to negate, use '!' as the first element of the pattern
string_is(x, "f/!...")
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE

# you can combine several patterns with " & " or " | "
string_is(x, "one & c")
#> [1] FALSE FALSE FALSE  TRUE FALSE
string_is(x, "one | c")
#> [1] FALSE FALSE  TRUE  TRUE  TRUE

#
# word: adds word boundaries
#

# compare
string_is(x, "one")
#> [1] FALSE FALSE  TRUE  TRUE FALSE
# with
string_is(x, "w/one")
#> [1] FALSE FALSE  TRUE FALSE FALSE

# words can be chained with commas (it is like an OR logical operation)
string_is(x, "w/one, two")
#> [1] FALSE  TRUE  TRUE FALSE FALSE
# compare with
string_is(x, "w/one & two")
#> [1] FALSE FALSE  TRUE FALSE FALSE
# remember that you can still negate
string_is(x, "w/one & !two")
#> [1] FALSE FALSE FALSE FALSE FALSE

# you can combine the flags
# compare
string_is(x, "w/one")
#> [1] FALSE FALSE  TRUE FALSE FALSE
# with
string_is(x, "wi/one")
#> [1]  TRUE FALSE  TRUE FALSE FALSE

#
# the `magic` flag
#

p = "one"
string_is(x, "m/{p}")
#> [1] FALSE FALSE  TRUE  TRUE FALSE
# Explanation:
# - "{p}" is replaced by the value of p, i.e. "one"
# - we get the equivalent: string_is(x, "one")


#
# string_which
#

# it works exactly the same way as string_is
# Which are the items containing an 'e' and an 'o'?
string_which(x, "e", "o")
#> [1] 3 4
# equivalently
string_which(x, "e & o")
#> [1] 3 4