This function saves in-memory or HDD data sets into HDD repositories. It is useful for appending several data sets into the same repository.
write_hdd(
  x,
  dir,
  chunkMB = Inf,
  rowsPerChunk,
  compress = 50,
  add = FALSE,
  replace = FALSE,
  showWarning,
  ...
)

x: A data set.
dir: The HDD repository, i.e. the directory where the HDD data is saved.
chunkMB: If the data has to be split into several files, the size of each file in MB. Default is Inf, meaning the data is not split by size.

rowsPerChunk: Integer, missing by default. Alternative to the argument chunkMB: if provided, the data will be split into several files of rowsPerChunk rows each.
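A minimal sketch of the rowsPerChunk alternative, assuming the hdd package is installed; the temporary folder and the chunk size of 50 rows are arbitrary choices made for illustration.

```r
library(hdd)

# Temporary folder that will hold the HDD repository (arbitrary location)
hdd_path = tempfile()

# Split the 150 rows of iris into files of (at most) 50 rows each
write_hdd(iris, hdd_path, rowsPerChunk = 50)

# Read the repository back and inspect it
base_hdd = hdd(hdd_path)
summary(base_hdd)
dim(base_hdd)
```

Splitting by rows gives a predictable number of observations per file, whereas chunkMB controls the size of each file on disk.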
compress: Compression rate applied by write_fst (from the fst package), from 0 (no compression) to 100 (maximum compression). Default is 50.
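As a rough sketch, assuming the hdd package is installed, the same data can be written with two different compression levels and the sizes of the resulting .fst files compared. The helper fst_size is hypothetical, defined here only for this comparison.

```r
library(hdd)

# Two repositories holding the same data with different compression levels
path_fast = tempfile()
path_small = tempfile()
write_hdd(iris, path_fast, compress = 0)    # no compression
write_hdd(iris, path_small, compress = 100) # maximum compression

# Hypothetical helper: total size in bytes of the .fst files in a folder
fst_size = function(dir) {
  sum(file.size(list.files(dir, pattern = "\\.fst$", full.names = TRUE)))
}
fst_size(path_fast)
fst_size(path_small)
```

On a data set as small as iris the difference may be negligible; the trade-off between write speed and disk usage matters mostly for large data.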
add: Should the data be added to the existing repository in dir? Default is FALSE.

replace: If add = FALSE, should any existing data in dir be replaced? Default is FALSE.
showWarning: If the data x has no observations, a warning is raised when showWarning = TRUE. By default, the warning is raised only if write_hdd is NOT called from within a function.

...: Not currently used.
This function does not return anything in R. Instead, it creates a folder on disk containing .fst files, which hold the data converted to the HDD format.
You can then read the created data with the function hdd().
Creating a HDD data set with this function always creates an additional file named “_hdd.txt” in the HDD folder. This file contains summary information on the data: the number of rows, the number of variables, the first five lines, and a log of how the HDD data set was created. To access the log directly from R, use the function origin.
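A short sketch of what the repository looks like on disk, assuming the hdd package is installed: the folder holds the .fst data files alongside “_hdd.txt”, which can be read as plain text or, for the creation log, through origin.

```r
library(hdd)

hdd_path = tempfile()
write_hdd(iris, hdd_path)

# The folder contains the .fst data files plus the summary file _hdd.txt
list.files(hdd_path)

# Summary information and creation log, read as plain text
cat(readLines(file.path(hdd_path, "_hdd.txt")), sep = "\n")

# Read the data back with hdd() and display its creation log with origin()
base_hdd = hdd(hdd_path)
origin(base_hdd)
```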
See hdd, [.hdd and $.hdd for the extraction and manipulation of out-of-memory data. To import HDD data sets from text files, see txt2hdd.
See hdd_slice to apply functions to chunks of data (and create HDD objects) and hdd_merge to merge large files.
To create or reshape HDD objects from memory or from other HDD objects, see write_hdd.
To display general information about HDD objects, see origin, summary.hdd, print.hdd, dim.hdd and names.hdd.
# Toy example with iris data
# Let's create a HDD data set from iris data
hdd_path = tempfile() # => folder where the data will be saved
write_hdd(iris, hdd_path)
# Let's add data to it
for(i in 1:10) write_hdd(iris, hdd_path, add = TRUE)
base_hdd = hdd(hdd_path)
summary(base_hdd) # => 11 files, 1650 lines, 48.7KB on disk
#> Hard drive data of 48.7 KB. Made of 11 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file56886a1e147b/
#> 1650 lines, 5 variables.
# Let's save the iris data by chunks of 1KB
# we use replace = TRUE to delete the previous data
write_hdd(iris, hdd_path, chunkMB = 0.001, replace = TRUE)
base_hdd = hdd(hdd_path)
summary(base_hdd) # => 8 files, 150 lines, 10.2KB on disk
#> Hard drive data of 10.2 KB. Made of 8 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file56886a1e147b/
#> 150 lines, 5 variables.