hdd.Rd
This function connects to a hard drive data set (HDD). You can access the hard drive data in a similar way to a data.table.
hdd(dir)
dir: The directory where the hard drive data set is located.
This function returns an object of class hdd, which is linked to a folder on disk containing the data. The data is not loaded in R.
This object is not intended to be interacted with directly as a regular list. Please use the methods sub-.hdd and cash-.hdd to extract the data.
HDD has been created to deal with out-of-memory data sets. The data set lives on the hard drive, split into multiple files, each file being small enough to work with in memory.
You can perform extraction and manipulation operations as with a regular data set with sub-.hdd. Each operation is performed chunk-by-chunk behind the scenes.
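To make the chunk-by-chunk idea concrete, here is a minimal sketch in base R only. It is a conceptual illustration and not how the hdd package is implemented: the `.rds` file layout and the helper `apply_by_chunk` are hypothetical names introduced for this example.

```r
# Conceptual sketch (base R only); NOT the hdd package's internals.
# Step 1: split a data set into chunk files on disk, 50 rows per chunk.
chunk_dir = tempfile()
dir.create(chunk_dir)
chunk_id = ceiling(seq_len(nrow(iris)) / 50)
chunks = split(iris, chunk_id)
for (i in seq_along(chunks)) {
  saveRDS(chunks[[i]], file.path(chunk_dir, sprintf("chunk_%02d.rds", i)))
}

# Step 2: hypothetical helper that reads one chunk at a time, applies `fun`
# to it, and row-binds the (usually much smaller) per-chunk results.
apply_by_chunk = function(dir, fun) {
  files = sort(list.files(dir, full.names = TRUE))
  res = lapply(files, function(f) fun(readRDS(f)))
  do.call(rbind, res)
}

# Step 3: a filter applied chunk by chunk gives the same rows as the
# in-memory filter on the full data set.
long_sepals = apply_by_chunk(chunk_dir, function(d) d[d$Sepal.Length > 7, ])
nrow(long_sepals) == sum(iris$Sepal.Length > 7)
```

Only one chunk's raw data is read at a time; what accumulates in memory is the filtered result, which is why the approach suits data sets that do not fit in memory as a whole.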
In terms of performance, working with complete data sets in memory will always be faster. This is because read/write operations on disk are orders of magnitude slower than read/write operations in memory. However, working from disk might be the only way to deal with out-of-memory data.
See hdd, sub-.hdd and cash-.hdd for the extraction and manipulation of out-of-memory data. For importation of HDD data sets from text files: see txt2hdd.
See hdd_slice to apply functions to chunks of data (and create HDD objects) and hdd_merge to merge large files.
To create/reshape HDD objects from memory or from other HDD objects, see write_hdd.
To display general information from HDD objects: origin, summary.hdd, print.hdd, dim.hdd and names.hdd.
# Toy example with iris data
library(hdd)
library(data.table) # provides fwrite()
iris_path = tempfile()
fwrite(iris, iris_path)
# destination path
hdd_path = tempfile()
# reading the text file in chunks of 50 rows:
txt2hdd(iris_path, dirDest = hdd_path, rowsPerChunk = 50)
# creating a HDD object
base_hdd = hdd(hdd_path)
# Summary information on the whole data set
summary(base_hdd)
#> Hard drive data of 7.28 KB. Made of 3 files.
#> Location: C:/Users/lrberge/AppData/Local/Temp/Rtmpa0wfuK/file568828a73083/
#> 150 lines, 5 variables.
# Looking at it like a regular data.frame
print(base_hdd)
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1            5.1         3.5          1.4         0.2    setosa
#> 2            4.9           3          1.4         0.2    setosa
#> 3            4.7         3.2          1.3         0.2    setosa
#> ----
#> 148          6.5           3          5.2           2 virginica
#> 149          6.2         3.4          5.4         2.3 virginica
#> 150          5.9           3          5.1         1.8 virginica
dim(base_hdd)
#> [1] 150 5
names(base_hdd)
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"