Simple and powerful string manipulation with the dot square bracket operator

Compactly performs many low-level string operations, with advanced support for pluralization.

Usage

dsb(
  ...,
  frame = parent.frame(),
  sep = "",
  vectorize = FALSE,
  nest = TRUE,
  collapse = NULL
)

Arguments

...

Character scalars that will be collapsed with the argument sep. You can use ".[x]" within each character string to insert the value of x in the string. You can add string operations in each ".[]" instance with the syntax "'arg'op ? x" (resp. "'arg'op ! x") to apply the operation 'op' with the argument 'arg' to x (resp. the verbatim of x). Nesting is enabled. Since there are over 30 operators, they cannot all be described in this small space: type dsb("--help") to display an (almost) complete help.

frame

An environment used to evaluate the variables in ".[]".

sep

Character scalar, default is "". It is used to collapse all the elements in ....

vectorize

Logical, default is FALSE. If TRUE, the elements in ... are not collapsed together but are instead vectorized.

nest

Logical, default is TRUE. Whether the original character strings should be nested into a ".[]". If TRUE, then things like dsb("S!one, two") are equivalent to dsb(".[S!one, two]") and hence create the vector c("one", "two").

collapse

Character scalar or NULL (default). If provided, the resulting character vector will be collapsed into a character scalar using this value as a separator.

Details

There are over 30 basic string operations, and pluralization is supported. It is fast (e.g. faster than glue in the benchmarks). String operations can be nested (this may be the most powerful feature), and operators have sensible defaults.

See detailed help on the console with dsb("--help"). The real help is in fact in the "Examples" section.

Value

It returns a character vector whose length depends on the elements and operations in ".[]".

Examples


#
# BASIC USAGE ####
#

x = c("Romeo", "Juliet")

# .[x] inserts x
dsb("Hello .[x]!")
#> Hello Romeo!
#> Hello Juliet!

# elements in ... are collapsed with "" (default)
dsb("Hello .[x[1]], ",
    "how is .[x[2]] doing?")
#> Hello Romeo, how is Juliet doing?
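
# For comparison, a minimal base R sketch of the two calls above.
# This only reproduces the documented results; it is not how dsb works
# internally, and `who` is an illustrative variable name.
who = c("Romeo", "Juliet")
greeting = paste0("Hello ", who, "!")
# -> c("Hello Romeo!", "Hello Juliet!")
question = paste0("Hello ", who[1], ", ", "how is ", who[2], " doing?")
# -> "Hello Romeo, how is Juliet doing?"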

# Splitting a comma separated string
# The mechanism is explained later
dsb("/J. Mills, David, Agnes, Dr Strong")
#> J. Mills
#> David
#> Agnes
#> Dr Strong

# Note: this is equivalent to (explained later)
dsb("', *'S !J. Mills, David, Agnes, Dr Strong")
#> J. Mills
#> David
#> Agnes
#> Dr Strong

#
# Applying low level operations to strings
#

# Two main syntaxes:

# A) expression evaluation
# .[operation ? x]
#             | |
#             |  \-> the expression to be evaluated
#              \-> ? means that the expression will be evaluated

# B) verbatim
# .[operation ! x]
#             | |
#             |  \-> the expression taken as verbatim (here ' x')
#              \-> ! means that the expression is taken as verbatim

# operation: usually 'arg'op with op an operation code.

# Example: splitting
x = "hello dear"
dsb(".[' 's ? x]")
#> hello
#> dear
# x is split by ' '

dsb(".[' 's !hello dear]")
#> hello
#> dear
# 'hello dear' is split by ' '
# had we used ?, there would have been an error

# By default, the string is nested in .[], so in that case no need to use .[]:
dsb("' 's ? x")
#> hello
#> dear
dsb("' 's !hello dear")
#> hello
#> dear

# There are 35 string operators
# Operators usually have a default value
# Operations can be chained by separating them with a comma

# Example: default of 's' is ' ' + chaining with collapse
dsb("s, ' my 'c!hello dear")
#> hello my dear
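
# A rough base R counterpart of the chained split + collapse above
# (a sketch for comparison only, not the dsb implementation):
parts = strsplit("hello dear", " ", fixed = TRUE)[[1]]
# -> c("hello", "dear")
chained = paste(parts, collapse = " my ")
# -> "hello my dear"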

#
# Nesting
#

# .[operations ! s1.[expr]s2]
#              |    |
#              |     \-> expr will be evaluated then added to the string
#               \-> nesting requires verbatim evaluation: '!'

dsb("The variables are: .[C!x.[1:4]].")
#> The variables are: x1, x2, x3 and x4.

# This one is a bit ugly but it shows triple nesting
dsb("The variables are: .[w, C!.[2* ! x.[1:4]].[S, 4** ! , _sq]].")
#> The variables are: x1, x2, x3, x4, x1_sq, x2_sq, x3_sq and x4_sq.

#
# Splitting
#

# s: split with fixed pattern, default is ' '
dsb("s !a b c")
#> a
#> b
#> c
dsb("' b 's !a b c")
#> a
#> c

# S: split with regex pattern, default is ', *'
dsb("S !a, b, c")
#> a
#> b
#> c
dsb("'[[:punct:] ]'S !a! b; c")
#> a
#> b
#> c

#
# Collapsing
#

# c and C do the same thing; only their defaults differ
# syntax: 's1||s2' with
# - s1 the string used for collapsing
# - s2 (optional) the string used for the last collapse

# c: default is ' '
dsb("c?1:3")
#> 1 2 3

# C: default is ', || and '
dsb("C?1:3")
#> 1, 2 and 3

dsb("', || or 'c?1:4")
#> 1, 2, 3 or 4
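
# The 's1||s2' rule can be sketched in base R as follows.
# collapse_last() is a hypothetical helper written for this illustration;
# it is not part of dsb.
collapse_last = function(x, sep = ", ", last = " and ") {
  x = as.character(x)
  n = length(x)
  # with zero or one element there is nothing to join
  if (n <= 1) return(paste(x, collapse = ""))
  # join all but the last element with `sep`, then attach the last with `last`
  paste0(paste(x[-n], collapse = sep), last, x[n])
}
and_3 = collapse_last(1:3)
# -> "1, 2 and 3"
or_4 = collapse_last(1:4, last = " or ")
# -> "1, 2, 3 or 4"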

#
# Extraction
#

# x: extracts the first pattern
# X: extracts all patterns
# syntax: 'pattern'x
# Default is '[[:alnum:]]+'

x = "This years is... 2020"
dsb("x ? x")
#> This
dsb("X ? x")
#> This
#> years
#> is
#> 2020

dsb("'\\d+'x ? x")
#> 2020
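
# For readers used to base R, the three extractions above correspond
# roughly to regexpr()/gregexpr() with regmatches(). This is a sketch for
# comparison only; `txt` is an illustrative variable name.
txt = "This years is... 2020"
first_word = regmatches(txt, regexpr("[[:alnum:]]+", txt))
# -> "This"
all_words = regmatches(txt, gregexpr("[[:alnum:]]+", txt))[[1]]
# -> c("This", "years", "is", "2020")
first_num = regmatches(txt, regexpr("\\d+", txt, perl = TRUE))
# -> "2020"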

#
# STRING FORMATTING ####
#

#
# u, U: uppercase first/all letters

# first letter
dsb("u!julia mills")
#> Julia mills

# title case: split -> upper first letter -> collapse
dsb("s, u, c!julia mills")
#> Julia Mills

# upper all letters
dsb("U!julia mills")
#> JULIA MILLS

#
# L: lowercase

dsb("L!JULIA MILLS")
#> julia mills

#
# q, Q: single or double quote

dsb("S, q, C!Julia, David, Wilkins")
#> 'Julia', 'David' and 'Wilkins'
dsb("S, Q, C!Julia, David, Wilkins")
#> "Julia", "David" and "Wilkins"

#
# f, F: formats the string to fit the same length


score = c(-10, 2050)
nm = c("Wilkins", "David")
dsb("Monopoly scores:\n.['\n'c ! - .[f ? nm]: .[F ? score] US$]")
#> Monopoly scores:
#>  - Wilkins:  -10 US$
#>  - David  : 2050 US$

# OK that example may have been a bit too complex,
# let's make it simple:

dsb("Scores: .[f ? score]")
#> Scores: -10 
#> Scores: 2050
dsb("Names: .[F ? nm]")
#> Names: Wilkins
#> Names:   David
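
# The padding shown above matches what format() does in base R:
# left-aligned for f, right-aligned for F. A sketch for comparison only;
# `pts` and `players` are illustrative names.
pts = c(-10, 2050)
players = c("Wilkins", "David")
left_aligned = format(as.character(pts))
# -> c("-10 ", "2050")
right_aligned = format(players, justify = "right")
# -> c("Wilkins", "  David")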

#
# w, W: reformat the white spaces
# w: trims leading and trailing white spaces + normalizes successive white spaces
# W: same but also removes punctuation

dsb("w ! The   white  spaces are now clean.  ")
#> The white spaces are now clean.

dsb("W ! I, really -- truly; love punctuation!!!")
#> I really truly love punctuation 

#
# %: applies sprintf formatting

dsb("pi = .['.2f'% ? pi]")
#> pi = 3.14

#
# a: appends text on each item
# syntax: 's1|s2'a, adds s1 at the beginning and s2 at the end of the string
# It accepts the special values :1:, :i:, :I:, :a:, :A:
# These values create enumerations (only one such value is accepted)

# appending square brackets
dsb("'[|]'a, ' + 'c!x.[1:4]")
#> [x1] + [x2] + [x3] + [x4]

# Enumerations
acad = dsb("/you like admin, you enjoy working on weekends, you really love emails")
dsb("Main reasons to pursue an academic career:\n .[':i:) 'a, C ? acad].")
#> Main reasons to pursue an academic career:
#>  i) you like admin, ii) you enjoy working on weekends and iii) you really love emails.

#
# A: same as 'a' but adds at the beginning/end of the full string (not on the elements)
# special values: :n:, :N:, give the number of elements

characters = dsb("/David, Wilkins, Dora, Agnes")
dsb("There are .[':N: characters: 'A, C ? characters].")
#> There are four characters: David, Wilkins, Dora and Agnes.


#
# stop: removes basic English stopwords
# the list is from the Snowball project: http://snowball.tartarus.org/algorithms/english/stop.txt

dsb("stop, w!It is a tale told by an idiot, full of sound and fury, signifying nothing.")
#> tale told idiot, full sound fury, signifying nothing.

#
# k: keeps the first n characters
# syntax: nk: keeps the first n characters
#         'n|s'k: same + adds 's' at the end of shortened strings
#         'n||s'k: same but 's' counts in the n characters kept

words = dsb("/short, constitutional")
dsb("5k ? words")
#> short
#> const

dsb("'5|..'k ? words")
#> short
#> const..

dsb("'5||..'k ? words")
#> short
#> con..
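
# The three truncation variants above can be sketched in base R with
# substr() and nchar(). For comparison only; `w`, `n` and `suffix` are
# illustrative names.
w = c("short", "constitutional")
n = 5
suffix = ".."
# nk: keep the first n characters
cut_plain = substr(w, 1, n)
# -> c("short", "const")
# 'n|s'k: append the suffix to strings that were shortened
cut_suffix = ifelse(nchar(w) > n, paste0(substr(w, 1, n), suffix), w)
# -> c("short", "const..")
# 'n||s'k: the suffix counts towards the n characters kept
cut_within = ifelse(nchar(w) > n,
                    paste0(substr(w, 1, n - nchar(suffix)), suffix), w)
# -> c("short", "con..")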

#
# K: keeps the first n elements
# syntax: nK: keeps the first n elements
#         'n|s'K: same + adds the element 's' at the end
#         'n||s'K: same but 's' counts in the n elements kept
#
# Special values :rest: and :REST:, give the number of items dropped

bx = dsb("/Pessac Leognan, Saint Emilion, Marguaux, Saint Julien, Pauillac")
dsb("Bordeaux wines I like: .[3K, ', 'C ? bx].")
#> Bordeaux wines I like: Pessac Leognan, Saint Emilion, Marguaux.

dsb("Bordeaux wines I like: .['3|etc..'K, ', 'C ? bx].")
#> Bordeaux wines I like: Pessac Leognan, Saint Emilion, Marguaux, etc...

dsb("Bordeaux wines I like: .['3||etc..'K, ', 'C ? bx].")
#> Bordeaux wines I like: Pessac Leognan, Saint Emilion, etc...

dsb("Bordeaux wines I like: .['3|and at least :REST: others'K, ', 'C ? bx].")
#> Bordeaux wines I like: Pessac Leognan, Saint Emilion, Marguaux, and at least two others.

#
# Ko, KO: special operator which keeps the first n elements and adds "others"
# syntax: nKo
# KO gives the rest in letters

dsb("Bordeaux wines I like: .[4KO, C ? bx].")
#> Bordeaux wines I like: Pessac Leognan, Saint Emilion, Marguaux and two others.

#
# r, R: string replacement
# syntax: 's'R: deletes the content in 's' (replaces with the empty string)
#         's1 => s2'R replaces s1 with s2
# r: fixed / R: perl = TRUE

dsb("'e'r !The letter e is deleted")
#> Th lttr  is dltd

# adding a perl look-behind
dsb("'(?<! )e'R !The letter e is deleted")
#> Th lttr e is dltd

dsb("'e => a'r !The letter e becomes a")
#> Tha lattar a bacomas a

dsb("'([[:alpha:]]{3})[[:alpha:]]+ => \\1.'R !Trimming the words")
#> Tri. the wor.
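
# The same four replacements written with gsub() in base R: fixed = TRUE
# mirrors r, perl = TRUE mirrors R. A sketch for comparison only.
del_fixed = gsub("e", "", "The letter e is deleted", fixed = TRUE)
# -> "Th lttr  is dltd"
del_perl = gsub("(?<! )e", "", "The letter e is deleted", perl = TRUE)
# -> "Th lttr e is dltd"
swap_fixed = gsub("e", "a", "The letter e becomes a", fixed = TRUE)
# -> "Tha lattar a bacomas a"
trim_perl = gsub("([[:alpha:]]{3})[[:alpha:]]+", "\\1.",
                 "Trimming the words", perl = TRUE)
# -> "Tri. the wor."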

#
# *, *c, **, **c: replication, replication + collapse
# syntax: n* or n*c
# ** is the same as * but uses "each" in the replication

dsb("N.[10*c!o]!")
#> Noooooooooo!

dsb("3*c ? 1:3")
#> 123123123
dsb("3**c ? 1:3")
#> 111222333
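
# In base R terms, * behaves like rep(times = n) and ** like rep(each = n),
# with the trailing c collapsing the result. A sketch for comparison only.
shout = paste0("N", strrep("o", 10), "!")
# -> "N", followed by ten "o", followed by "!"
rep_times = paste(rep(1:3, times = 3), collapse = "")
# -> "123123123"
rep_each = paste(rep(1:3, each = 3), collapse = "")
# -> "111222333"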

#
# d: replaces the items with the empty string
# -> useful in conditions

dsb("d!I am going to be annihilated")
#> 

#
# ELEMENT MANIPULATION ####
#

#
# D: deletes all elements
# -> useful in conditions

x = dsb("/I'll, be, deleted")
dsb("D ? x")
#> 

#
# i, I: inserts an item
# syntax: 's1|s2'i: inserts s1 first and s2 last
# I: is the same as i but is 'invisibly' included

characters = dsb("/David, Wilkins, Dora, Agnes, Trotwood")
dsb("'Heep|Spenlow'i, C ? characters")
#> Heep, David, Wilkins, Dora, Agnes, Trotwood and Spenlow

dsb("'Heep|Spenlow'I, C ? characters")
#> Heep
#> David, Wilkins, Dora, Agnes and Trotwood
#> Spenlow


#
# PLURALIZATION ####
#

# There is support for pluralization

#
# *s, *s_: adds 's' or 's ' depending on the number of elements

nb = 1:5
dsb("Number.[*s, D ? nb]: .[C ? nb]")
#> Numbers: 1, 2, 3, 4 and 5
dsb("Number.[*s, D ? 2 ]: .[C ? 2 ]")
#> Number: 2

# or
dsb("Number.[*s, ': 'A, C ? nb]")
#> Numbers: 1, 2, 3, 4 and 5
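
# The pluralization rule above boils down to a test on the vector length.
# plural_label() is a hypothetical base R helper written for this
# illustration; it is not part of dsb.
plural_label = function(v) {
  n = length(v)
  # "1, 2, 3, 4 and 5" style enumeration when there are several elements
  body = if (n > 1) {
    paste0(paste(v[-n], collapse = ", "), " and ", v[n])
  } else {
    as.character(v)
  }
  # add the 's' only when there is more than one element
  paste0("Number", if (n > 1) "s" else "", ": ", body)
}
many = plural_label(1:5)
# -> "Numbers: 1, 2, 3, 4 and 5"
single = plural_label(2)
# -> "Number: 2"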


#
# v, V: adds a verb at the beginning/end of the string
# syntax: 'verb'v

# Unpopular opinion?
brand = c("Apple", "Samsung")
dsb(".[V, C ? brand] overrated.")
#> Apple and Samsung are overrated.
dsb(".[V, C ? brand[1]] overrated.")
#> Apple is overrated.

win = dsb("/Peggoty, Agnes, Emily")
dsb("The winner.[*s_, v, C ? win].")
#> The winners are Peggoty, Agnes and Emily.
dsb("The winner.[*s_, v, C ? win[1]].")
#> The winner is Peggoty.

# Other verbs
dsb(".[' have'V, C ? win] won a prize.")
#> Peggoty, Agnes and Emily have won a prize.
dsb(".[' have'V, C ? win[1]] won a prize.")
#> Peggoty has won a prize.

dsb(".[' was'V, C ? win] unable to come.")
#> Peggoty, Agnes and Emily were unable to come.
dsb(".[' was'V, C ? win[1]] unable to come.")
#> Peggoty was unable to come.

#
# *A: appends text depending on the length of the vector
# syntax: 's1|s2 / s3|s4'
#         if length == 1: applies 's1|s2'A
#         if length >  1: applies 's3|s4'A

win = dsb("/Barkis, Micawber, Murdstone")
dsb("The winner.[' is /s are '*A, C ? win].")
#> The winners are Barkis, Micawber and Murdstone.
dsb("The winner.[' is /s are '*A, C ? win[1]].")
#> The winner is Barkis.

#
# CONDITIONS ####
#

# Conditions can be applied with 'if' statements.
# The syntax is 'type comp value'if(true : false), with
# - type: either 'len', 'char', 'fixed' or 'regex'
#   + len: number of elements in the vector
#   + char: number of characters
#   + fixed: fixed pattern
#   + regex: regular expression pattern
# - comp: a comparator:
#   + valid for len/char: >, <, >=, <=, !=, ==
#   + valid for fixed/regex: !=, ==
# - value: a value for which the comparison is applied.
# - true: operations to be applied if true (can be void)
# - false: operations to be applied if false (can be void)

dsb("'char <= 2'if('(|)'a : '[|]'a), ' + 'c ? c(1, 12, 123)")
#> (1) + (12) + [123]

sentence = "This is a sentence with some longish words."
dsb("s, 'char<=4'if(D), c ? sentence")
#> sentence longish words.

dsb("s, 'fixed == e'if(:D), c ! Only words with an e are selected.")
#> e are selected.
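
# The first two conditions above, sketched with ifelse() and logical
# subsetting in base R. For comparison only; `v`, `txt_s` and `wds` are
# illustrative names.
v = c(1, 12, 123)
# 'char <= 2': parentheses for short items, square brackets otherwise
wrapped = ifelse(nchar(v) <= 2, paste0("(", v, ")"), paste0("[", v, "]"))
summed = paste(wrapped, collapse = " + ")
# -> "(1) + (12) + [123]"

txt_s = "This is a sentence with some longish words."
wds = strsplit(txt_s, " ", fixed = TRUE)[[1]]
# 'char<=4'if(D): drop the words with at most 4 characters
long_words = paste(wds[nchar(wds) > 4], collapse = " ")
# -> "sentence longish words."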

#
# ARGUMENTS FROM THE FRAME ####
#

# Arguments can be evaluated from the calling frame.
# Simply use backticks instead of quotes.

dollar = 6
reason = "glory"
dsb("Why do you develop packages? For .[`dollar`*c!$]?",
    "For money? No... for .[U,''s, c?reason]!", sep = "\n")
#> Why do you develop packages? For $$$$$$?
#> For money? No... for G L O R Y!
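
# A base R sketch of the two operations used above, for comparison only
# (not how dsb evaluates them; `n_dollar` and `motive` are illustrative names):
n_dollar = 6
motive = "glory"
money = strrep("$", n_dollar)
# -> "$$$$$$"
spelled = paste(strsplit(toupper(motive), "")[[1]], collapse = " ")
# -> "G L O R Y"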