guess_col_types.Rd
This function guesses the column types of a text data set (or of a data frame). It returns the column types in the format used by readr.
guess_col_types(dt_or_path, col_names, n = 10000)
dt_or_path: Either a data frame or the path to a text file.
col_names: Optional: the vector of column names, if they are not contained in the file. Its length must match the number of columns in the file.
n: Number of observations (rows) used to make the guess. By default, n = 10000.
It returns a cols object à la readr.
The column types are guessed from the first 10,000 rows (set with argument n).
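A short sketch of how the returned specification can be reused to read a whole file, assuming the hdd and readr packages are installed; the file path and the choice n = 100 are illustrative only. The cols object is passed as the col_types argument of readr::read_csv, so readr applies the guessed types instead of running its own guess.

```r
# Sketch: reuse the guessed types when reading the full file with readr.
# Assumes the hdd and readr packages are installed.
library(hdd)
library(readr)

# Illustrative file: the iris data written as CSV
path = tempfile(fileext = ".csv")
write_csv(iris, path)

# Guess the column types from the first 100 rows only (argument n)
spec = guess_col_types(path, n = 100)

# Read the whole file with the guessed column specification
iris_read = read_csv(path, col_types = spec)
print(iris_read)
```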
Note that, by default, columns found to be integers are imported as double (since readr has no integer64 type). For large data sets, integer-like identifiers can exceed 2^53 = 9,007,199,254,740,992 (16 digits), the limit up to which a double represents every integer exactly. In this case you must import them as character to avoid losing information.
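The precision issue can be reproduced in base R, and the workaround sketched with readr. This assumes readr is installed; the columns id and value, the temporary file, and the hand-built spec (a hypothetical stand-in for a guessed specification in which id was typed as double) are illustrative. A readr col_spec stores one collector per column in its cols element, so a single column can be overridden before reading.

```r
library(readr)

# Doubles store every integer exactly only up to 2^53 = 9007199254740992
big = 2^53
collapses = (big + 1 == big)  # TRUE: 2^53 + 1 rounds back to 2^53

# Two distinct identifiers, as they could appear in a text file
ids = c("9007199254740992", "9007199254740993")
n_as_double = length(unique(as.numeric(ids)))  # 1: as doubles, they collide
n_as_char = length(unique(ids))                # 2: as character, they stay distinct

# Hypothetical file with an identifier column 'id' and a numeric column 'value'
path = tempfile(fileext = ".csv")
writeLines(c("id,value", "9007199254740992,1", "9007199254740993,2"), path)

# Hypothetical stand-in for a guessed spec where 'id' was typed as double
spec = cols(id = col_double(), value = col_double())
# Override the identifier column so that it is imported as character
spec$cols$id = col_character()

dat = read_csv(path, col_types = spec)
print(dat$id)
```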
See peek to have a convenient look at the first lines of a text file. See guess_delim to guess the delimiter of a text data set. See guess_col_types to guess the column types of a text data set.
See hdd, sub-.hdd and cash-.hdd for the extraction and manipulation of out-of-memory data. To import HDD data sets from text files, see txt2hdd.
# Example with the iris data set
library(hdd)
iris_path = tempfile()
# fwrite comes from the data.table package
data.table::fwrite(iris, iris_path)
# returns a readr column specification:
guess_col_types(iris_path)
#> cols(
#> Sepal.Length = col_double(),
#> Sepal.Width = col_double(),
#> Petal.Length = col_double(),
#> Petal.Width = col_double(),
#> Species = col_character()
#> )