R/export_nest.R
export_nest.Rd
Exports list-columns containing data.frame or data.table objects from a
data.frame/data.table to txt or csv files, automatically
constructing a hierarchical directory structure from non-nested columns.
Exportable nested columns (those holding data.frame/data.table elements)
are distinguished from non-exportable custom-object columns (e.g. rsplit from
the rsample package); only the former are written to disk by default.
export_nest(
  nest_dt,
  group_cols = NULL,
  nest_cols = NULL,
  export_path = tempdir(),
  file_type = "txt"
)
nest_dt: A data.frame or data.table containing at least one
nested list-column. Must have one or more rows.
group_cols: Optional character vector of column names used to build the
hierarchical output directory structure. When NULL (default), all non-nested
columns are used automatically.
nest_cols: Optional character vector of nested column names to export. When
NULL (default), all columns whose elements are data.frame/data.table
objects are exported automatically; custom-object list-columns are reported and skipped.
Specifying a non-data.frame column triggers a warning and that column is skipped.
export_path: Single character string specifying the root export directory.
Defaults to tempdir(). Created recursively if it does not exist.
file_type: Either "txt" (tab-separated, default) or "csv"
(comma-separated). Case-insensitive.
An invisible integer giving the total number of files successfully written.
Returns 0L when no exportable columns are found or all nested data are empty/NULL.
Nested column classification (mutually exclusive):
Exportable — every element inherits from data.frame or data.table.
Non-exportable — empty lists or elements of any other class
(e.g. rsplit, vfold_split). Reported to the console; never written.
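The classification rule above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper name classify_nested_cols and the demo data are hypothetical, and it assumes a column counts as exportable only when it is a non-empty list whose elements all inherit from data.frame (data.table objects do, since data.table extends data.frame).

```r
# Hypothetical helper (an assumption for illustration, not part of the package).
classify_nested_cols <- function(x) {
  # Candidate columns: those that are themselves lists
  list_cols <- names(x)[vapply(x, is.list, logical(1L))]

  # Exportable: non-empty and every element inherits from data.frame
  exportable <- vapply(
    list_cols,
    function(col) {
      elems <- .subset2(x, col)
      length(elems) > 0L &&
        all(vapply(elems, inherits, logical(1L), what = "data.frame"))
    },
    logical(1L)
  )

  list(
    exportable     = list_cols[exportable],
    non_exportable = list_cols[!exportable]
  )
}

# Demo input: one data.frame list-column and one list-column of other objects
demo <- data.frame(id = 1:2)
demo$data <- list(head(iris, 2), head(mtcars, 2))
demo$meta <- list("a", 1L)

res <- classify_nested_cols(demo)
res$exportable
res$non_exportable
```

Using vapply with an explicit logical(1L) template means a malformed element raises an error instead of silently changing the result type.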
Directory layout:
export_path / <group1_value> / <group2_value> / <nest_col_name>.<file_type>
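As a rough illustration of this layout, the path for a single row can be assembled with file.path(). The helper build_export_path and the sample values below are hypothetical, chosen to mirror the iris example; the actual function may construct its paths differently.

```r
# Hypothetical helper (assumption): compose one output file path from the
# export root, the row's group values, the nested column name, and the type.
build_export_path <- function(export_path, group_values, nest_col, file_type) {
  file_name <- paste0(nest_col, ".", file_type)
  do.call(
    file.path,
    c(list(export_path), unname(as.list(group_values)), list(file_name))
  )
}

out_file <- build_export_path(
  export_path  = tempdir(),
  group_values = c("Sepal.Length", "setosa"),  # one value per grouping column
  nest_col     = "data",
  file_type    = "txt"
)
out_file
```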
Performance notes:
Row data is accessed via .subset2() (zero-copy column access) rather than
nest_dt[i], eliminating per-row data.table allocation in the hot loop.
All n output directory paths are pre-computed in a single vectorised
do.call(file.path, ...) call before the loop; only the k unique paths
are then passed to dir.create(), replacing n syscalls with k
(k <= n; often k << n when many rows share the same group).
The field separator and output filenames are computed once before the loop.
seq_len(n) is used instead of 1:n because 1:0 evaluates to c(1, 0) and would run the loop body twice on empty input.
All list-column introspection uses vapply with explicit FUN.VALUE to
guarantee return types and prevent silent coercion.
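Two of the points above, pre-computing directory paths in one vectorised call and iterating with seq_len(), can be sketched in base R as follows. The data, the export_root directory, and all variable names are assumptions made for illustration; this is not the function's actual source.

```r
# Illustrative grouping columns: n = 3 rows, two of which share a group
groups <- data.frame(
  name    = c("Sepal.Length", "Sepal.Length", "Sepal.Width"),
  Species = c("setosa", "setosa", "virginica"),
  stringsAsFactors = FALSE
)
export_root <- file.path(tempdir(), "export_nest_perf_demo")

# One vectorised file.path() call builds all n directory paths at once
dir_paths <- do.call(file.path, c(list(export_root), unname(as.list(groups))))

# Only the k unique paths are passed to dir.create() (k = 2 here, n = 3)
unique_dirs <- unique(dir_paths)
for (d in unique_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
created <- all(dir.exists(unique_dirs))

# seq_len(0) is integer(0), so this loop body never runs;
# 1:0 would be c(1L, 0L) and iterate twice
n_empty <- 0L
visited <- 0L
for (i in seq_len(n_empty)) {
  visited <- visited + 1L
}

# Remove the demo directories
unlink(export_root, recursive = TRUE)
```

The saving grows with the number of rows per group: with many rows sharing a few groups, the directory creation cost depends on the number of groups rather than the number of rows.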
Requires the data.table package for data manipulation and file I/O (fwrite).
# Example 1: Basic nested data export workflow
# Step 1: Create nested data structure
dt_nest <- w2l_nest(
  data = iris,     # Input iris dataset
  cols2l = 1:2,    # Columns to be nested
  by = "Species"   # Grouping variable
)
# Step 2: Export nested data to files
export_nest(
  nest_dt = dt_nest,                 # Input nested data.table
  nest_cols = "data",                # Column containing nested data
  group_cols = c("name", "Species")  # Columns to create directory structure
)
#> [ export_nest ] Using grouping columns: name, Species
#> [ export_nest ] Export complete. 6 file(s) written to: /tmp/RtmpMDr3j5
# Invisibly returns the number of files written (6 here)
# Creates the layout: tempdir()/<name>/<Species>/data.txt
# Check exported files
list.files(
  path = tempdir(),       # Default export directory
  pattern = "\\.txt$",    # Match only files ending in .txt
  recursive = TRUE        # Search in subdirectories
)
#> [1] "Sepal.Length/setosa/data.txt" "Sepal.Length/versicolor/data.txt"
#> [3] "Sepal.Length/virginica/data.txt" "Sepal.Width/setosa/data.txt"
#> [5] "Sepal.Width/versicolor/data.txt" "Sepal.Width/virginica/data.txt"
# Returns a character vector of the exported files' paths, relative to tempdir()
# Clean up exported files
files <- list.files(
  path = tempdir(),       # Default export directory
  pattern = "\\.txt$",    # Match only files ending in .txt
  recursive = TRUE,       # Search in subdirectories
  full.names = TRUE       # Return full file paths
)
file.remove(files)        # Remove the exported files (directories are left in place)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE