In stringmagic, any time you use a regular expression (regex) to detect a pattern in a character string, you can use regex logic. The syntax to logically combine regular expressions is intuitive: simply use the regular logical operators and it will work! The functions for which regex logic is available are: a) pattern detection functions (`string_is`, `string_get`, etc), and b) string replacement functions (`string_clean`, `string_replace`) with the `total` flag (see the vignette on regex flags).
Assume `"pat1"` and `"pat2"` are two regular expression patterns and we want to test whether the string `x` contains a combination of these patterns. Then:

- `"pat1 & pat2"` = `x` contains `pat1` AND `x` contains `pat2`
- `"pat1 | pat2"` = `x` contains `pat1` OR `x` contains `pat2`
- `"!pat1"` = `x` does not contain `pat1`
- `"!pat1 & pat2"` = `x` does not contain `pat1` AND `x` contains `pat2`
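In base R terms, each of the four patterns above reads as a combination of `grepl()` calls. The sketch below only illustrates the semantics on a made-up vector `x`, with `"pat1"` and `"pat2"` used as literal placeholder patterns; it says nothing about how stringmagic evaluates the patterns internally.

```r
# Made-up input: which elements contain "pat1" and/or "pat2"?
x = c("pat1 and pat2", "only pat1", "only pat2", "neither")

has1 = grepl("pat1", x)
has2 = grepl("pat2", x)

and_res     = has1 & has2    # "pat1 & pat2"   -> TRUE FALSE FALSE FALSE
or_res      = has1 | has2    # "pat1 | pat2"   -> TRUE TRUE TRUE FALSE
not_res     = !has1          # "!pat1"         -> FALSE FALSE TRUE TRUE
not_and_res = !has1 & has2   # "!pat1 & pat2"  -> FALSE FALSE TRUE FALSE
```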
Hence the three logical operators are:

- `" & "`: logical AND. It must be a space + an ampersand + a space (a bare `&` does not work).
- `" | "`: logical OR. It must be a space + a pipe + a space (a bare `|` does not work).
- `"!"`: logical NOT. It works only when it is the first character of the pattern. Note that anything after it (including spaces and other `!`) is part of the regular expression.
The parsing of the logical elements is done before any regex interpretation. The logical evaluations are done from left to right and are sequentially combined.
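To make the left-to-right rule concrete, here is a minimal base-R sketch of such an evaluator. The function `eval_regex_logic()` is hypothetical and is not stringmagic's implementation: it ignores flags and escaping, treats a leading `!` on each sub-pattern as NOT, and folds the sub-results strictly from left to right, which is our reading of the rule above.

```r
# Hypothetical helper, NOT part of stringmagic: evaluates " & ", " | " and a
# leading "!" with base R, combining sub-results strictly from left to right.
eval_regex_logic = function(x, pattern) {
  op_regex = " & | \\| "
  # The operators, in the order they appear (character(0) if there are none)
  ops = regmatches(pattern, gregexpr(op_regex, pattern))[[1]]
  # The sub-patterns sitting between the operators
  pieces = strsplit(pattern, op_regex)[[1]]

  match_one = function(p) {
    negate = startsWith(p, "!")
    # Only the first "!" is an operator: the rest, spaces included, stays regex
    if (negate) p = substring(p, 2)
    res = grepl(p, x, perl = TRUE)
    if (negate) !res else res
  }

  out = match_one(pieces[1])
  for (i in seq_along(ops)) {
    nxt = match_one(pieces[i + 1])
    out = if (ops[i] == " & ") out & nxt else out | nxt
  }
  out
}

x = c("apple pie", "apple", "pie", "banana")
eval_regex_logic(x, "apple & pie")           # TRUE FALSE FALSE FALSE
eval_regex_logic(x, "!apple & pie")          # FALSE FALSE TRUE FALSE
eval_regex_logic(x, "banana | apple & pie")  # TRUE FALSE FALSE FALSE
```

Under this reading, `"banana | apple & pie"` is evaluated as `("banana" OR "apple") AND "pie"`, so `"banana"` is not selected; the usual precedence of AND over OR would have selected it.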
Ex: selecting cars.
cars = row.names(mtcars)
print(cars)
#> [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
#> [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
#> [7] "Duster 360" "Merc 240D" "Merc 230"
#> [10] "Merc 280" "Merc 280C" "Merc 450SE"
#> [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
#> [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
#> [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
#> [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
#> [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
#> [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
#> [31] "Maserati Bora" "Volvo 142E"
# which one...
# ... contains all letters 'a', 'e', 'i' AND 'o'?
string_get(cars, "a & e & i & o")
#> [1] "Cadillac Fleetwood" "Lincoln Continental" "Pontiac Firebird"
#> [4] "Ferrari Dino" "Maserati Bora"
# ... does NOT contain any digit?
string_get(cars, "!\\d")
#> [1] "Hornet Sportabout" "Valiant" "Cadillac Fleetwood"
#> [4] "Lincoln Continental" "Chrysler Imperial" "Honda Civic"
#> [7] "Toyota Corolla" "Toyota Corona" "Dodge Challenger"
#> [10] "AMC Javelin" "Pontiac Firebird" "Lotus Europa"
#> [13] "Ford Pantera L" "Ferrari Dino" "Maserati Bora"
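As a cross-check, the two selections above can be reproduced with base R alone, using one `grepl()` call per sub-pattern. This only restates what the logical patterns mean; it does not describe how `string_get()` works internally.

```r
cars = row.names(mtcars)

# "a & e & i & o": every one of the four letters must be present
all_vowels = cars[grepl("a", cars) & grepl("e", cars) &
                  grepl("i", cars) & grepl("o", cars)]

# "!\\d": no digit at all
no_digit = cars[!grepl("\\d", cars, perl = TRUE)]
```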
You cannot combine logical statements with parentheses. For example, `"hello | (world & my lady)"` leads to: `x` contains `"hello"` or contains `"(world"`, and contains `"my lady)"`. The latter two are invalid regexes, but they can make sense if the "fixed" flag is turned on. To escape the meaning of the logical operators, see the dedicated section.
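The base-R sketch below shows why: splitting the pattern on the two operators leaves each parenthesis inside its own sub-pattern, and `"(world"` only works as a fixed string. The split regex `" & | \\| "` is our own illustration of the operator rule, not stringmagic's code.

```r
pattern = "hello | (world & my lady)"

# Splitting on " & " and " | ": the parentheses stay inside the sub-patterns
pieces = strsplit(pattern, " & | \\| ")[[1]]
pieces  # "hello" "(world" "my lady)"

# Compiling "(world" as a regex fails (unbalanced parenthesis) ...
is_valid_regex = tryCatch({ grepl(pieces[2], "x"); TRUE },
                          error = function(e) FALSE,
                          warning = function(w) FALSE)

# ... but it is a perfectly fine fixed string
fixed_ok = grepl(pieces[2], "hello (world", fixed = TRUE)  # TRUE
```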
The logical NOT always applies to a single sub-pattern, not to the full pattern: `"!pat1 & pat2"` means (NOT `pat1`) AND `pat2`, not NOT (`pat1` AND `pat2`).
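The difference between the two readings is easy to see in base R on a made-up vector, with `"red"` and `"car"` standing in for `pat1` and `pat2`:

```r
x = c("red car", "red", "car", "blue")
has_red = grepl("red", x)
has_car = grepl("car", x)

# What "!red & car" means: NOT applies to "red" only
not_first = !has_red & has_car     # FALSE FALSE TRUE FALSE
# What it does NOT mean: NOT applied to the whole combination
not_whole = !(has_red & has_car)   # FALSE TRUE TRUE TRUE
```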
There are two ways to escape the logical operators:

- with a double backslash: `"a \\& b"` means `x` contains `"a & b"`;
- with a regex bracket expression: `"a [&] b"` means the same thing in regex parlance and won't be parsed as a logical AND.

Both solutions work for the three operators: `" & "`, `" | "` and `"!"`.
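A quick base-R check of why both escapes defuse the operator: neither string contains the exact sequence space + `&` + space, and the bracket form still matches a literal ampersand as a regex. What stringmagic then does with the backslash is not shown here; the sketch only checks the strings themselves.

```r
bracket   = "a [&] b"
backslash = "a \\& b"   # the R string contains: a \& b

# Neither string contains the operator " & " (space, ampersand, space),
# so neither can be split into a logical AND
grepl(" & ", bracket, fixed = TRUE)    # FALSE
grepl(" & ", backslash, fixed = TRUE)  # FALSE

# As a regex, the bracket expression [&] matches a literal ampersand
grepl(bracket, c("a & b", "a b"))      # TRUE FALSE

# Likewise "[!]..." does not start with "!", so it cannot be read as a NOT
startsWith("[!]important", "!")        # FALSE
```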
All stringmagic regexes accept optional flags. Please see the associated vignette.

When you add flags to a pattern, they apply to all regex sub-patterns. This means that `"f/( | )"` treats the two parentheses as "fixed". You cannot add flags specific to a single sub-pattern.
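In base R terms, the flag behaves as if it were copied onto every sub-pattern. The sketch below mimics `"f/( | )"` by stripping the `f/` prefix, splitting on `" | "` and passing `fixed = TRUE` to each `grepl()` call; the stripping and splitting steps are our own illustration, not stringmagic's parser.

```r
# Illustration only: "f/( | )" = flag "f" (fixed) + logical pattern "( | )"
pattern = "f/( | )"
body    = sub("^f/", "", pattern)          # "( | )"
pieces  = strsplit(body, " \\| ")[[1]]     # "(" ")"

x = c("(a", "b)", "c")
# The fixed flag applies to each sub-pattern, so "(" and ")" are literals
res = grepl(pieces[1], x, fixed = TRUE) | grepl(pieces[2], x, fixed = TRUE)
res  # TRUE TRUE FALSE
```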