This function guesses the column types of a text file. It returns them as a readr column specification (a cols object).

guess_col_types(dt_or_path, col_names, n = 10000)

Arguments

dt_or_path

Either a data frame or the path to a text file.

col_names

Optional: a vector of column names, to be supplied when the file does not contain them. Its length must match the number of columns in the file.

n

Number of observations used to make the guess. By default, n = 10000.

Value

It returns a readr cols object.

Details

The column types are guessed from the first n rows of the data (10,000 by default).

Note that by default, columns that are found to be integers are imported as double (for lack of an integer64 type in readr). Also note that in large data sets, integer-like identifiers can sometimes be longer than 16 digits: in these cases you must import them as character so as not to lose information.
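The precision issue described above can be illustrated with a minimal sketch. It assumes readr is installed; the file, the column names id and value, and the identifiers are hypothetical and only serve the illustration. It does not show how guess_col_types itself works, only why a long identifier column should be declared as character.

```r
library(readr)

# Doubles represent integers exactly only up to 2^53 (about 9.0e15),
# so two distinct larger integers can collapse to the same double
2^53 + 1 == 2^53

# Hypothetical file with a 19-digit identifier column named "id"
id_path = tempfile(fileext = ".csv")
writeLines(c("id,value",
             "1234567890123456789,1.5",
             "1234567890123456790,2.5"), id_path)

# Declare the identifier column as character; "value" stays a double
id_types = cols(id = col_character(), value = col_double())
dat = read_csv(id_path, col_types = id_types)

# The identifiers are kept as distinct strings
dat$id
```

When only one or two columns need overriding, writing the cols specification by hand as above is one option; the specification could equally be a guessed one in which the identifier column has been switched to character.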

See also

See peek to have a convenient look at the first lines of a text file. See guess_delim to guess the delimiter of a text data set. See guess_col_types to guess the column types of a text data set.

See hdd, sub-.hdd and cash-.hdd for the extraction and manipulation of out of memory data. For importation of HDD data sets from text files: see txt2hdd.

Author

Laurent Berge

Examples


# Example with the iris data set
# fwrite comes from data.table (called with its namespace in case it is not attached)
iris_path = tempfile()
data.table::fwrite(iris, iris_path)

# returns a readr columns set:
guess_col_types(iris_path)
#> cols(
#>   Sepal.Length = col_double(),
#>   Sepal.Width = col_double(),
#>   Petal.Length = col_double(),
#>   Petal.Width = col_double(),
#>   Species = col_character()
#> )
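
A further sketch for the col_names and n arguments documented above. It assumes the hdd and data.table packages are installed; the header-less file is created only for the illustration, and no output is shown since it was not run here.

```r
library(hdd)

# Write iris without a header line (col.names is a data.table::fwrite argument)
noheader_path = tempfile()
data.table::fwrite(iris, noheader_path, col.names = FALSE)

# Supply the names explicitly (one per column in the file)
# and base the guess on the first 100 rows only
guess_col_types(noheader_path, col_names = names(iris), n = 100)
```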