Splits a character string with respect to a pattern.
string_split(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)
stsplit(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)
x: A character vector.
split: A character scalar used to split the character vector. By default
this is a regular expression. You can use flags in the pattern in the form "flag1, flag2/pattern".
Available flags are ignore (case), fixed (no regex), word (add word boundaries),
and magic (add interpolation with "{}").
Example: with "ignore/hello", a text containing "Hello" is split at "Hello".
Shortcut: use the first letters of the flags. Ex: "iw/one" will split at the word
"one" (flags 'ignore' + 'word').
simplify: Logical scalar, default is TRUE. If TRUE, then when the input vector x
is of length 1, a character vector is returned instead of a list.
fixed: Logical, default is FALSE. Whether to consider the argument split
as fixed (and not as a regular expression).
ignore.case: Logical scalar, default is FALSE. If TRUE, then case-insensitive search is triggered.
word: Logical scalar, default is FALSE. If TRUE, then a) word boundaries are added to the pattern,
and b) patterns can be chained by separating them with a comma; they are combined with a logical OR.
Example: if word = TRUE, then pattern = "The, mountain" will select strings containing either the word
'The' or the word 'mountain'.
envir: Environment in which to evaluate the interpolations if the flag "magic" is provided.
Default is parent.frame().
If simplify = TRUE (default), the object returned is either:
a character vector if x, the input vector, is of length 1: the character vector contains
the result of the split; or
a list of the same length as x otherwise. The ith element of the list is a character vector
containing the result of the split of the ith element of x.
If simplify = FALSE, the object returned is always a list.
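The return-value rules above can be checked with a short sketch. It assumes the stringmagic package is installed and attached; the variable names (one, many, forced) are only illustrative.

```r
library(stringmagic)

# length-1 input with the default simplify = TRUE: a character vector
one = string_split("a b", " ")

# input of length 2: a list with one element per input string
many = string_split(c("a b", "c d e"), " ")

# simplify = FALSE: a list, even for a length-1 input
forced = string_split("a b", " ", simplify = FALSE)

is.character(one)
is.list(many)
is.list(forced)
```

Code that must handle inputs of any length may find simplify = FALSE easier to work with, since the result is then a list in every case.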
stsplit(): Alias to string_split().
All stringmagic functions support generic flags in regular-expression patterns.
The flags are useful to quickly give extra instructions, similarly to usual
regular expression flags.
Here the syntax is "flag1, flag2/pattern". That is: flags are a comma-separated list of flag names
separated from the pattern with a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1.
Here the flag "fixed" removes the regular expression meaning of "." which would have otherwise meant "any character".
The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.
Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".
The four flags always available are: "ignore", "fixed", "word" and "magic".
"ignore" instructs to ignore the case. Technically, it adds the perl-flag "(?i)" at the beginning of the pattern.
"fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "[" (among others) lose their special meaning and are treated for what they are: simple characters.
"word" adds word boundaries ("\\b" in regex language) to the pattern. Further, the comma (",")
becomes a word separator. Technically, "word/one, two" is treated as "\b(one|two)\b". Example:
string_clean("Am I ambushed?", "wi/am") leads to " I ambushed?" thanks to the flags "ignore" and "word".
"magic" allows interpolating variables inside the pattern before regex interpretation.
For example, if letters = "aiou", then string_clean("My great goose!", "magic/[{letters}] => e")
leads to "My greet geese!".
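To make the "technically" descriptions above concrete, here is a rough base-R sketch of what the "fixed", "ignore" and "word" flags are described as doing. This is only an approximation written with grepl() and gsub(); it is not stringmagic's actual implementation, and the variable names are illustrative.

```r
x = c("hello...", "world")

# "fixed/." : the dot is a literal character, so only "hello..." matches
fixed_hits = which(grepl(".", x, fixed = TRUE))

# no flag: "." is a regex meaning "any character", so both elements match
regex_hits = which(grepl(".", x))

# "wi/am": the pattern "am" with "(?i)" prepended and word boundaries added
cleaned = gsub("(?i)\\b(am)\\b", "", "Am I ambushed?", perl = TRUE)

# "word/one, two": described as the regex "\\b(one|two)\\b"
word_hits = grepl("\\b(one|two)\\b", c("one more", "someone", "two"), perl = TRUE)

fixed_hits   # 1
regex_hits   # 1 2
cleaned      # " I ambushed?"
word_hits    # TRUE FALSE TRUE
```

The "someone" case shows why the word boundaries matter: "one" appears inside it, but not as a separate word.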
time = "This is the year 2024."
# we break the sentence
string_split(time, " ")
#> [1] "This" "is" "the" "year" "2024."
# simplify = FALSE leads to a list
string_split(time, " ", simplify = FALSE)
#> [[1]]
#> [1] "This" "is" "the" "year" "2024."
#>
# let's break at "is"
string_split(time, "is")
#> [1] "Th" " " " the year 2024."
# now breaking at the word "is"
# NOTE: we use the flag `word` (`w/`)
string_split(time, "w/is")
#> [1] "This " " the year 2024."
# same but using a pattern from a variable
# NOTE: we use the `magic` flag
pat = "is"
string_split(time, "mw/{pat}")
#> [1] "This " " the year 2024."