write_hdd.Rd
This function saves in-memory or HDD data sets into HDD repositories. It is useful for appending several data sets to the same repository.
write_hdd(
x,
dir,
chunkMB = Inf,
rowsPerChunk,
compress = 50,
add = FALSE,
replace = FALSE,
showWarning,
...
)
x: A data set.
dir: The HDD repository, i.e. the directory where the HDD data is.
chunkMB: If provided, the data is split into several files of size chunkMB (in MB). Default is Inf.
rowsPerChunk: Integer, default is missing. Alternative to the argument chunkMB. If provided, the data is split into several files of rowsPerChunk rows.
compress: Compression rate to be applied by write_fst. Default is 50.
add: Should the data be added to the existing repository? Default is FALSE.
replace: If add = FALSE, should any existing data in the repository be replaced? Default is FALSE.
showWarning: If the data x has no observations, a warning is raised if showWarning = TRUE. By default, the warning is raised only if write_hdd is NOT called within a function.
...: Not currently used.
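As a rough illustration of the rowsPerChunk argument described above, the sketch below splits the iris data into files of 50 rows and then counts the .fst files in the repository. It assumes the hdd package is installed; the names hdd_dir and chunk_files are illustrative only, and the number of files is printed rather than stated here because it depends on how the package cuts the chunks.

```r
library(hdd)

# Illustrative repository location inside the session's temporary directory
hdd_dir = tempfile()

# Ask for files of 50 rows each instead of a size limit in MB
write_hdd(iris, hdd_dir, rowsPerChunk = 50)

# Each chunk is stored as a separate .fst file in the repository folder
chunk_files = list.files(hdd_dir, pattern = "\\.fst$")
length(chunk_files)

# The repository as a whole should still hold every row of the original data
dim(hdd(hdd_dir))
```

Since rowsPerChunk is documented as an alternative to chunkMB, only one of the two is supplied in this sketch.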
This function does not return anything in R. Instead, it creates a folder on disk containing .fst files. These files represent the data that has been converted to the hdd format.
You can then read the created data with the function hdd().
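The write-then-read round trip described above might look like the following minimal sketch. It assumes the hdd package is installed; hdd_dir and base_hdd are illustrative names, and the exact file names inside the folder are not specified on this page, so they are only listed.

```r
library(hdd)

# Illustrative repository location inside the session's temporary directory
hdd_dir = tempfile()

# Writing creates the folder on disk; the call is used for its side effect
write_hdd(iris, hdd_dir)

# Inspect what was created: the folder should contain at least one .fst file
dir.exists(hdd_dir)
list.files(hdd_dir)

# Read the repository back as an HDD object and look at its variables
base_hdd = hdd(hdd_dir)
names(base_hdd)
```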
Creating a HDD data set with this function always creates an additional file named “_hdd.txt” in the HDD folder. This file contains summary information on the data: the number of rows, the number of variables, the first five lines and a log of how the HDD data set has been created. To access the log directly from R, use the function origin.
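To make the role of the summary file more concrete, here is a small sketch that looks for “_hdd.txt” in a freshly written repository, prints it, and then retrieves the log with origin. It assumes the hdd package is installed; hdd_dir and info_file are illustrative names, and the layout of the file's contents is not shown because it is not detailed on this page.

```r
library(hdd)

# Illustrative repository location inside the session's temporary directory
hdd_dir = tempfile()
write_hdd(iris, hdd_dir)

# The summary file sits next to the .fst files in the repository folder
info_file = file.path(hdd_dir, "_hdd.txt")
file.exists(info_file)

# Print the plain-text summary (rows, variables, first lines, creation log)
writeLines(readLines(info_file))

# The creation log can also be obtained from the HDD object itself
origin(hdd(hdd_dir))
```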
See hdd, sub-.hdd and cash-.hdd for the extraction and manipulation of out of memory data. For importation of HDD data sets from text files: see txt2hdd.
See hdd_slice to apply functions to chunks of data (and create HDD objects) and hdd_merge to merge large files.
To create/reshape HDD objects from memory or from other HDD objects, see write_hdd.
To display general information from HDD objects: origin, summary.hdd, print.hdd, dim.hdd and names.hdd.
# Toy example with iris data
# Let's create a HDD data set from iris data
hdd_path = tempfile() # => folder where the data will be saved
write_hdd(iris, hdd_path)
# Let's add data to it
for(i in 1:10) write_hdd(iris, hdd_path, add = TRUE)
base_hdd = hdd(hdd_path)
summary(base_hdd) # => 11 files, 1650 lines, 48.7KB on disk
#> Hard drive data of 48.7 KB. Made of 11 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file56886a1e147b/
#> 1650 lines, 5 variables.
# Let's save the iris data by chunks of 1KB
# we use replace = TRUE to delete the previous data
write_hdd(iris, hdd_path, chunkMB = 0.001, replace = TRUE)
base_hdd = hdd(hdd_path)
summary(base_hdd) # => 8 files, 150 lines, 10.2KB on disk
#> Hard drive data of 10.2 KB. Made of 8 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file56886a1e147b/
#> 150 lines, 5 variables.