Export Nested Data Structures with Hierarchical Organization

Exports nested data from a data.frame or data.table to files, using grouping columns to build a hierarchical directory structure and handling multiple nested column types. The function distinguishes between exportable data.frame/data.table columns and non-exportable list columns of custom objects (such as rsample cross-validation splits), and by default processes only the exportable ones.

export_nest(
  nest_dt,
  group_cols = NULL,
  nest_cols = NULL,
  export_path = tempdir(),
  file_type = "txt"
)

Arguments

nest_dt: A data.frame or data.table containing one or more nested columns. Exportable nested columns contain data.frame or data.table objects. Non-exportable columns contain custom objects such as rsplit from the rsample package or other list-based structures. The input cannot be empty.
group_cols: Optional character vector specifying column names to use for hierarchical grouping. These columns determine the directory structure for exported files. If NULL (default), all non-nested columns are used as grouping variables. Names that do not match a valid column in nest_dt trigger a warning and are ignored; execution continues.
nest_cols: Optional character vector specifying which nested columns to export. If NULL (default), automatically processes only columns containing data.frame or data.table objects. Custom object list columns (e.g., rsplit, vfold_split from rsample) are identified and reported but NOT exported. Specifying non-data.frame columns in nest_cols triggers a warning and those columns are skipped.
export_path: Character string specifying the base directory for file export. Defaults to tempdir(). The function creates this directory recursively if it does not exist.
file_type: Character string indicating export format: "txt" for tab-separated values or "csv" for comma-separated values. Defaults to "txt". Case-insensitive.

Value

An invisible integer representing the total number of files successfully exported. Returns 0 if no exportable data.frame/data.table columns are found or if all nested data are empty/NULL.

Details

Nested Column Type Detection: The function automatically detects and categorizes nested columns into two types:

Exportable columns (data.frame/data.table): Columns containing data.frame or data.table objects. These are the only columns exported to files by default.
Non-exportable columns (Custom objects): Columns containing other list-type objects such as rsplit (rsample cross-validation splits), vfold_split, empty lists, or other custom S3/S4 objects. These columns are identified and reported but cannot be exported as txt/csv files.
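
The two categories can be illustrated with a minimal base R sketch. The helper is_exportable_col() and the class "my_split" are hypothetical names invented for this illustration; the actual detection logic inside export_nest() is not shown in this documentation and may differ.

```r
# Hypothetical helper: TRUE if `col` is a list column whose non-NULL
# elements are all data.frame (or data.table) objects.
is_exportable_col <- function(col) {
  if (!is.list(col) || is.data.frame(col)) return(FALSE)
  non_null <- Filter(Negate(is.null), col)
  length(non_null) > 0 &&
    all(vapply(non_null, is.data.frame, logical(1)))
}

# Toy nested table: one data.frame list column, one custom-object list column
nested <- data.frame(Species = c("setosa", "virginica"))
nested$data <- list(head(iris, 2), tail(iris, 2))
nested$split <- list(
  structure(list(id = 1L), class = "my_split"),  # stand-in for an rsplit
  structure(list(id = 2L), class = "my_split")
)

is_list_col <- vapply(nested, is.list, logical(1))
exportable  <- vapply(nested, is_exportable_col, logical(1))

exportable_cols <- names(nested)[exportable]                # "data"
custom_cols     <- names(nested)[is_list_col & !exportable] # "split"
```

Because data.table objects inherit from data.frame, is.data.frame() covers both exportable types.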

Grouping Strategy:

When group_cols = NULL, all non-nested columns automatically become grouping variables.
Grouping columns create a hierarchical directory structure where each unique combination of group values generates a separate subdirectory.
Files are organized as: export_path/group1_value/group2_value/nest_col.ext
If no valid group columns exist, files export to the root export_path.
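
As a rough sketch of the layout described above, the directory for each row can be assembled from its group values with file.path(). The subdirectory name "export_nest_sketch" and the toy values are assumptions made for illustration; this is not the function's own code.

```r
# Toy grouping values, mirroring group_cols = c("name", "Species")
groups <- data.frame(
  name    = c("Sepal.Length", "Sepal.Width"),
  Species = c("setosa", "setosa")
)
export_path <- file.path(tempdir(), "export_nest_sketch")

# One directory per row: export_path/<name>/<Species>.
# With no group columns, the target would simply be export_path.
parts <- unname(lapply(groups, as.character))
dirs  <- do.call(file.path, c(list(export_path), parts))

for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Full path of the file for a nested column named "data" and file_type "txt"
files <- file.path(dirs, paste0("data", ".", "txt"))
```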

File Organization:

One file is generated per exportable nested column per row (e.g., row 1 with 2 data.frame columns generates 2 files).
Only data.frame/data.table nested columns are written; custom object columns are skipped.
Filenames follow the pattern: {nested_column_name}.{file_type} (e.g., data.txt, results.csv).
Files are written using data.table::fwrite() for efficient I/O.
Empty or NULL nested data are silently skipped without interrupting the export process.
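
The rules above can be sketched as a loop over rows and nested columns. To keep the sketch free of package dependencies, utils::write.table() stands in for data.table::fwrite(); the toy table, the directory name "export_nest_files", and the variable names are assumptions for illustration only.

```r
# Toy nested table; the second row has NULL in the `data` column
nested <- data.frame(Species = c("setosa", "virginica"))
nested$data    <- list(head(iris, 3), NULL)
nested$results <- list(data.frame(x = 1:2), data.frame(x = 3:4))

# Case-insensitive file_type mapped to an extension and a separator
file_type <- "TXT"
ext <- tolower(file_type)
sep <- switch(ext,
  txt = "\t",
  csv = ",",
  stop("file_type must be 'txt' or 'csv'")
)

out_dir   <- file.path(tempdir(), "export_nest_files")
n_written <- 0L

for (i in seq_len(nrow(nested))) {
  row_dir <- file.path(out_dir, nested$Species[i])
  for (col in c("data", "results")) {
    x <- nested[[col]][[i]]
    if (is.null(x) || nrow(x) == 0) next   # skip empty or NULL entries
    dir.create(row_dir, recursive = TRUE, showWarnings = FALSE)
    utils::write.table(
      x,
      file = file.path(row_dir, paste0(col, ".", ext)),
      sep = sep, row.names = FALSE, quote = FALSE
    )
    n_written <- n_written + 1L
  }
}

n_written  # 3: two files for setosa, one for virginica
```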

Error Handling:

Parameter validation occurs early, with informative error messages for invalid inputs.
Missing group columns trigger warnings but do not halt execution.
Custom object columns are identified and reported when nest_cols = NULL, allowing users to be aware of non-exportable data.
Invalid or non-data.frame nested columns in nest_cols are skipped with warnings.
Individual row export failures generate warnings but continue processing remaining rows.
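
The "warn and continue" behavior for individual rows can be approximated with tryCatch(). In this sketch write_row() is a hypothetical writer that fails on purpose for one row; it is not part of the package.

```r
# Hypothetical writer that fails for row 2, to exercise the handler
write_row <- function(i) {
  if (i == 2L) stop("disk full")
  TRUE
}

n_ok <- 0L
for (i in 1:3) {
  ok <- tryCatch(
    write_row(i),
    error = function(e) {
      # Downgrade the error to a warning so the loop keeps going
      warning(
        sprintf("Row %d not exported: %s", i, conditionMessage(e)),
        call. = FALSE
      )
      FALSE
    }
  )
  if (ok) n_ok <- n_ok + 1L
}

n_ok  # 2: rows 1 and 3 succeeded, row 2 produced a warning
```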

data.table Requirement: The data.table package is required. The function automatically checks for its availability and converts input data to data.table format if necessary.

Note

The function does not modify the input nest_dt; it is non-destructive.
An empty nest_dt triggers an error; when the input may be empty, guard the call with if (nrow(nest_dt) > 0).
Custom object columns detected when nest_cols = NULL are reported as informational messages; no error occurs.
Attempting to export custom object columns via nest_cols will skip them with a warning.
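All messages and warnings are printed to the console. To capture them programmatically, use withCallingHandlers() or tryCatch() for conditions raised via message() and warning(); capture.output() by default captures only standard output such as that produced by print() or cat().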
File paths are constructed using file.path(), ensuring cross-platform compatibility.
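
One way to collect console messages and warnings programmatically is withCallingHandlers(). In the sketch below, noisy_export() is a hypothetical stand-in for a call such as export_nest(...); whether export_nest() emits its summary via message() or cat() is not stated here, so treat the handler choice as an assumption.

```r
# Hypothetical stand-in for a call such as export_nest(...)
noisy_export <- function() {
  message("Total files exported: 2")
  warning("Column 'split' skipped", call. = FALSE)
  invisible(2L)
}

captured <- character()
n <- withCallingHandlers(
  noisy_export(),
  message = function(m) {
    captured <<- c(captured, paste("message:", trimws(conditionMessage(m))))
    invokeRestart("muffleMessage")   # keep it off the console
  },
  warning = function(w) {
    captured <<- c(captured, paste("warning:", conditionMessage(w)))
    invokeRestart("muffleWarning")
  }
)

captured
n
```

Unlike tryCatch(), calling handlers let the wrapped call run to completion, so the return value is still available after the conditions are recorded.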

Dependencies

Requires the data.table package for efficient data manipulation and I/O operations.

Limitations

Custom object columns (e.g., rsplit objects from rsample, cross-validation folds) cannot be exported as txt/csv files because they are not tabular data. These columns are identified automatically and reported to the console. To export rsample split information, first extract the row indices or the underlying data, for example with rsample::analysis() and rsample::assessment(), and store the result as a data.frame column.
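
A possible workaround, sketched under the assumption that the rsample package is installed: convert each split into plain data.frames before export. The object name exportable and its column names are chosen for illustration only.

```r
if (requireNamespace("rsample", quietly = TRUE)) {
  set.seed(1)
  folds <- rsample::vfold_cv(iris, v = 3)

  # Build a plain nested table: one row per fold, data.frame list columns
  exportable <- data.frame(fold = folds$id)
  exportable$analysis   <- lapply(folds$splits, rsample::analysis)
  exportable$assessment <- lapply(folds$splits, rsample::assessment)

  # Every element is now a data.frame, so the columns are exportable
  all(vapply(exportable$analysis, is.data.frame, logical(1)))
}
```

A table built this way contains only data.frame list columns, so it should be usable as nest_dt with group_cols = "fold".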

Use Cases

Exporting structured data from tidymodels workflows that also contain cross-validation splits
Batch exporting multiple nested data.frame columns with automatic hierarchical organization
Creating organized file hierarchies based on grouping variables (e.g., by experiment, participant, or time period)
Integration with reproducible research workflows

Examples

# Example 1: Basic nested data export workflow
# Step 1: Create nested data structure
dt_nest <- w2l_nest(
  data = iris,              # Input iris dataset
  cols2l = 1:2,             # Columns to be nested
  by = "Species"            # Grouping variable
)

# Step 2: Export nested data to files
export_nest(
  nest_dt = dt_nest,        # Input nested data.table
  nest_cols = "data",       # Column containing nested data
  group_cols = c("name", "Species")  # Columns to create directory structure
)
#> Total files exported: 6
#> [1] 6
# Returns the number of files created
# Creates directory structure: tempdir()/name/Species/data.txt

# Check exported files
list.files(
  path = tempdir(),         # Default export directory
  pattern = "\\.txt$",      # Match only files ending in .txt
  recursive = TRUE          # Search in subdirectories
)
#> [1] "Sepal.Length/setosa/data.txt"     "Sepal.Length/versicolor/data.txt"
#> [3] "Sepal.Length/virginica/data.txt"  "Sepal.Width/setosa/data.txt"     
#> [5] "Sepal.Width/versicolor/data.txt"  "Sepal.Width/virginica/data.txt"  
# Returns list of created files and their paths

# Clean up exported files
files <- list.files(
  path = tempdir(),         # Default export directory
  pattern = "\\.txt$",      # Match only files ending in .txt
  recursive = TRUE,         # Search in subdirectories
  full.names = TRUE         # Return full file paths
)
file.remove(files)          # Remove all exported files
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE