Exports nested data from data.frame or data.table objects, with hierarchical grouping and flexible handling of multiple nested column types. The function distinguishes exportable data.frame/data.table columns from non-exportable custom object list columns (such as rsample cross-validation splits) and processes only the exportable columns by default.
export_nest(
  nest_dt,
  group_cols = NULL,
  nest_cols = NULL,
  export_path = tempdir(),
  file_type = "txt"
)
nest_dt: A data.frame or data.table containing one or more nested columns. Exportable nested columns contain data.frame or data.table objects. Non-exportable columns contain custom objects such as rsplit from the rsample package or other list-based structures. The input cannot be empty.
group_cols: Optional character vector of column names used for hierarchical grouping. These columns determine the directory structure for exported files. If NULL (default), all non-nested columns are used as grouping variables. Missing or invalid columns trigger informative warnings and do not halt execution.
nest_cols: Optional character vector specifying which nested columns to export. If NULL (default), only columns containing data.frame or data.table objects are processed. Custom object list columns (e.g., rsplit, vfold_split from rsample) are identified and reported but not exported. Non-data.frame columns named in nest_cols trigger a warning and are skipped.
export_path: Character string specifying the base directory for file export. Defaults to tempdir(). The function creates this directory recursively if it does not exist.
file_type: Character string indicating the export format: "txt" for tab-separated values or "csv" for comma-separated values. Defaults to "txt". Case-insensitive.
An invisible integer giving the total number of files successfully exported. Returns 0 if no exportable data.frame/data.table columns are found or if all nested data are empty or NULL.
Nested Column Type Detection: The function automatically detects and categorizes nested columns into two types:
Exportable columns (data.frame/data.table): Columns containing data.frame or data.table objects. These are the only columns exported to files by default.
Non-exportable columns (custom objects): Columns containing other list-type objects such as rsplit (rsample cross-validation splits), vfold_split, empty lists, or other custom S3/S4 objects. These columns are identified and reported but cannot be exported as txt/csv files.
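The detection described above can be sketched in base R. This is a minimal illustration, not the function's actual implementation: the helper name is_exportable_col and the toy class "my_split" are hypothetical, and the rule used here (every non-NULL element inherits from data.frame) is an assumption about how such a check might work.

```r
# Hypothetical helper: TRUE when a list column holds only data.frame-like
# elements (data.table objects also inherit from data.frame).
is_exportable_col <- function(col) {
  if (!is.list(col) || is.data.frame(col)) return(FALSE)
  non_null <- Filter(Negate(is.null), col)
  length(non_null) > 0 &&
    all(vapply(non_null, is.data.frame, logical(1)))
}

# Toy nested table: one data.frame list column and one custom-object list column.
nested <- data.frame(id = 1:2)
nested$data <- list(head(iris, 3), head(mtcars, 3))
nested$splits <- list(
  structure(list(a = 1), class = "my_split"),
  structure(list(a = 2), class = "my_split")
)

# Classify every list column as exportable or non-exportable.
list_cols <- names(nested)[vapply(nested, is.list, logical(1))]
exportable <- list_cols[vapply(nested[list_cols], is_exportable_col, logical(1))]
non_exportable <- setdiff(list_cols, exportable)
```

In this sketch, exportable holds the name of the data.frame column and non_exportable holds the custom-object column, mirroring the two categories listed above.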
Grouping Strategy:
When group_cols = NULL, all non-nested columns automatically become grouping variables.
Grouping columns create a hierarchical directory structure where each unique combination of group values generates a separate subdirectory.
Files are organized as: export_path/group1_value/group2_value/nest_col.ext
If no valid group columns exist, files export to the root export_path.
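As an illustration of the layout above, the following sketch builds a target path from group values with file.path(). The helper build_export_path is hypothetical, and lowercasing the extension is only an assumption about how case-insensitive file_type values might be normalized.

```r
# Hypothetical helper: build the target file path for one row.
build_export_path <- function(export_path, group_values, nest_col,
                              file_type = "txt") {
  # One directory level per group value, in the order given.
  dir_parts <- c(export_path, as.character(group_values))
  target_dir <- do.call(file.path, as.list(dir_parts))
  file.path(target_dir, paste0(nest_col, ".", tolower(file_type)))
}

# Two group values give two directory levels below the export root.
p <- build_export_path("out", c("Sepal.Length", "setosa"), "data", "TXT")

# With no group values, the file lands directly in the export root.
p_root <- build_export_path("out", character(0), "data", "csv")
```

Here p is "out/Sepal.Length/setosa/data.txt" and p_root is "out/data.csv".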
File Organization:
One file is generated per exportable nested column per row (e.g., row 1 with 2 data.frame columns generates 2 files).
Only data.frame/data.table nested columns are written; custom object columns are skipped.
Filenames follow the pattern: {nested_column_name}.{file_type} (e.g., data.txt, results.csv).
Files are written using data.table::fwrite() for efficient I/O.
Empty or NULL nested data are silently skipped without interrupting the export process.
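The per-row writing step might look roughly like the sketch below. It assumes the data.table package is installed; the variable names and the subdirectory "export_nest_sketch" are made up for illustration, and the loop is a simplified stand-in rather than the function's real code.

```r
library(data.table)

# Illustrative inputs: one group value and one nested item per "row".
species <- c("setosa", "virginica", "versicolor")
nested_data <- list(head(iris, 2), NULL, iris[0, ])

out_dir <- file.path(tempdir(), "export_nest_sketch")
n_written <- 0L

for (i in seq_along(nested_data)) {
  item <- nested_data[[i]]
  # Skip NULL and zero-row entries without raising an error.
  if (is.null(item) || nrow(item) == 0L) next
  target_dir <- file.path(out_dir, species[i])
  dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
  # Tab-separated output, matching file_type = "txt".
  fwrite(item, file.path(target_dir, "data.txt"), sep = "\t")
  n_written <- n_written + 1L
}

# Record what was written, then clean up the sketch directory.
written <- list.files(out_dir, recursive = TRUE)
unlink(out_dir, recursive = TRUE)
```

Only the first entry produces a file here, because the second is NULL and the third has zero rows.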
Error Handling:
Parameter validation occurs early, with informative error messages for invalid inputs.
Missing group columns trigger warnings but do not halt execution.
Custom object columns are identified and reported when nest_cols = NULL, allowing users to be aware of non-exportable data.
Invalid or non-data.frame nested columns in nest_cols are skipped with warnings.
Individual row export failures generate warnings but continue processing remaining rows.
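The "warn and continue" behaviour for individual rows can be sketched with tryCatch(). Both write_row and export_rows are hypothetical stand-ins; the simulated failure on row 2 exists only to show the pattern and says nothing about how the real function fails.

```r
# Hypothetical stand-in for writing one row; row 2 fails on purpose.
write_row <- function(i) {
  if (i == 2L) stop("simulated write failure")
  invisible(TRUE)
}

# Convert each row-level error into a warning and keep going.
export_rows <- function(n_rows) {
  exported <- 0L
  for (i in seq_len(n_rows)) {
    ok <- tryCatch(
      {
        write_row(i)
        TRUE
      },
      error = function(e) {
        warning(sprintf("Row %d failed: %s", i, conditionMessage(e)),
                call. = FALSE)
        FALSE
      }
    )
    if (ok) exported <- exported + 1L
  }
  invisible(exported)
}

# Rows 1 and 3 succeed; the warning for row 2 is suppressed here.
n_ok <- suppressWarnings(export_rows(3L))
```

The count returned by the sketch covers only the rows that were written, which matches the idea of returning the number of successfully exported files.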
Data.table Requirement:
The data.table package is required. The function automatically checks for its availability and converts input data to data.table format if necessary.
The function does not modify the input nest_dt; it is non-destructive.
Empty input data.frames trigger an error; use if (nrow(nest_dt) > 0) to validate input first.
Custom object columns detected when nest_cols = NULL are reported as informational messages; no error occurs.
Attempting to export custom object columns via nest_cols will skip them with a warning.
All messages and warnings are printed to console; capture output programmatically if needed via capture.output() or similar functions.
File paths are constructed using file.path(), ensuring cross-platform compatibility.
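For capturing messages and warnings programmatically, one option is withCallingHandlers(). Note that capture.output() collects standard output by default, so conditions signalled with message() or warning() generally need handlers like these. The sketch assumes the function reports through message() and warning(), which the source does not state explicitly, and noisy_export is a hypothetical stand-in for the real call.

```r
# Hypothetical stand-in that signals a message and a warning,
# then returns a file count invisibly.
noisy_export <- function() {
  message("Custom object columns detected: splits")
  warning("Column 'splits' skipped", call. = FALSE)
  invisible(3L)
}

captured <- character(0)

n_files <- withCallingHandlers(
  noisy_export(),
  message = function(m) {
    # Record the message text and stop it from printing.
    captured <<- c(captured, paste("message:", trimws(conditionMessage(m))))
    invokeRestart("muffleMessage")
  },
  warning = function(w) {
    # Record the warning text and stop it from printing.
    captured <<- c(captured, paste("warning:", conditionMessage(w)))
    invokeRestart("muffleWarning")
  }
)
```

After the call, captured holds one entry per condition in the order they were signalled, and n_files still holds the return value.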
Requires the data.table package for efficient data manipulation and I/O operations.
Custom object columns (e.g., rsplit from rsample, cross-validation folds) cannot be exported as txt/csv files because they are not standard data structures. These columns are identified automatically and reported to the console. If you need to export rsample split information, consider extracting the indices or data using rsample utility functions first.
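One way to follow that suggestion is to turn a split into plain data frames with rsample::analysis() and rsample::assessment(). The sketch below assumes the rsample package is installed and does nothing otherwise; the three-fold split of iris is only an example.

```r
if (requireNamespace("rsample", quietly = TRUE)) {
  # Build a small cross-validation object and take its first split.
  folds <- rsample::vfold_cv(iris, v = 3)
  first_split <- folds$splits[[1]]

  # Extract the two partitions as ordinary data frames.
  train_df <- rsample::analysis(first_split)
  test_df <- rsample::assessment(first_split)
}
```

The resulting data frames are ordinary data.frame objects, so they can be stored in a nested column that counts as exportable.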
Exporting structured data from tidymodels workflows that also contain cross-validation splits
Batch exporting multiple nested data.frame columns with automatic hierarchical organization
Creating organized file hierarchies based on grouping variables (e.g., by experiment, participant, or time period)
Integration with reproducible research workflows
fwrite for details on file writing.
# Example 1: Basic nested data export workflow
# Step 1: Create nested data structure
dt_nest <- w2l_nest(
  data = iris,      # Input iris dataset
  cols2l = 1:2,     # Columns to be nested
  by = "Species"    # Grouping variable
)
# Step 2: Export nested data to files
export_nest(
  nest_dt = dt_nest,                  # Input nested data.table
  nest_cols = "data",                 # Column containing nested data
  group_cols = c("name", "Species")   # Columns to create directory structure
)
#> Total files exported: 6
#> [1] 6
# Returns the number of files created
# Creates directory structure: tempdir()/name/Species/data.txt
# Check exported files
list.files(
  path = tempdir(),       # Default export directory
  pattern = "\\.txt$",    # Match only files with a .txt extension
  recursive = TRUE        # Search in subdirectories
)
#> [1] "Sepal.Length/setosa/data.txt" "Sepal.Length/versicolor/data.txt"
#> [3] "Sepal.Length/virginica/data.txt" "Sepal.Width/setosa/data.txt"
#> [5] "Sepal.Width/versicolor/data.txt" "Sepal.Width/virginica/data.txt"
# Returns list of created files and their paths
# Clean up exported files
files <- list.files(
  path = tempdir(),       # Default export directory
  pattern = "\\.txt$",    # Match only files with a .txt extension
  recursive = TRUE,       # Search in subdirectories
  full.names = TRUE       # Return full file paths
)
file.remove(files) # Remove all exported files
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE