This function saves in-memory or HDD data sets into an HDD repository. It can also be used to append several data sets to the same repository.

write_hdd(
  x,
  dir,
  chunkMB = Inf,
  rowsPerChunk,
  compress = 50,
  add = FALSE,
  replace = FALSE,
  showWarning,
  ...
)

Arguments

x

A data set.

dir

The HDD repository, i.e. the directory where the HDD data is.

chunkMB

If provided, the data is split into several files of size chunkMB (in megabytes). Default is Inf.

rowsPerChunk

Integer, default is missing. Alternative to the argument chunkMB. If provided, the data is split into several files of rowsPerChunk rows each.

compress

Compression rate to be applied by write_fst. Default is 50.

add

Should the data be added to the existing repository? Default is FALSE.

replace

If add = FALSE, should any existing data in the repository be replaced? Default is FALSE.

showWarning

If the data x has no observations, a warning is raised when showWarning = TRUE. By default, the warning is raised only when write_hdd is not called from within a function.

...

Not currently used.
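
The chunking and overwrite arguments described above can be combined as in the following minimal sketch. It assumes the hdd package is installed; the variable name repo and the chunk size of 40 rows are illustrative choices only.

```r
library(hdd)

repo = tempfile()  # illustrative location for the HDD repository

# Split iris into several files using rowsPerChunk instead of chunkMB
write_hdd(iris, repo, rowsPerChunk = 40)

# Append the same data to the existing repository
write_hdd(iris, repo, add = TRUE)

# Start over: replace whatever the repository currently contains
write_hdd(iris, repo, rowsPerChunk = 40, replace = TRUE)

# After the replacement, the repository holds one copy of iris again
dim(hdd(repo))
```

Using rowsPerChunk gives direct control over the number of rows per file, whereas chunkMB targets a file size.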

Value

This function does not return anything in R. Instead it creates a folder on disk containing .fst files. These files represent the data that has been converted to the hdd format.

You can then read the created data with the function hdd().

Details

Creating a HDD data set with this function always creates an additional file named “_hdd.txt” in the HDD folder. This file contains summary information on the data: the number of rows, the number of variables, the first five lines and a log of how the HDD data set was created. To access the log directly from R, use the function origin.
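
As a small illustration of the paragraph above, the sketch below lists the content of a freshly created repository and then retrieves its creation log. It assumes the hdd package is installed; the exact names of the data files and the layout of the log are not shown here because they depend on the package.

```r
library(hdd)

hdd_path = tempfile()  # folder where the data will be saved
write_hdd(iris, hdd_path)

# The folder holds the data files plus the "_hdd.txt" summary file
list.files(hdd_path)

# The summary file is plain text and can be inspected directly
readLines(file.path(hdd_path, "_hdd.txt"))

# The creation log can also be retrieved from the HDD object
origin(hdd(hdd_path))
```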

See also

See hdd, sub-.hdd and cash-.hdd for the extraction and manipulation of out-of-memory data. To import HDD data sets from text files, see txt2hdd.

See hdd_slice to apply functions to chunks of data (and create HDD objects) and hdd_merge to merge large files.

To create/reshape HDD objects from memory or from other HDD objects, see write_hdd.

To display general information from HDD objects: origin, summary.hdd, print.hdd, dim.hdd and names.hdd.

Author

Laurent Berge

Examples


library(hdd)

# Toy example with iris data

# Let's create a HDD data set from iris data
hdd_path = tempfile() # => folder where the data will be saved
write_hdd(iris, hdd_path)
# Let's add data to it
for(i in 1:10) write_hdd(iris, hdd_path, add = TRUE)

base_hdd = hdd(hdd_path)
summary(base_hdd) # => 11 files, 1650 lines, 48.7KB on disk
#> Hard drive data of 48.7 KB. Made of 11 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file56886a1e147b/
#> 1650 lines, 5 variables.

# Let's save the iris data by chunks of 1KB
# we use replace = TRUE to delete the previous data
write_hdd(iris, hdd_path, chunkMB = 0.001, replace = TRUE)

base_hdd = hdd(hdd_path)
summary(base_hdd) # => 8 files, 150 lines, 10.2KB on disk
#> Hard drive data of 10.2 KB. Made of 8 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file56886a1e147b/
#> 150 lines, 5 variables.