A high-performance function for importing data from one or multiple Excel files into data.table format, with fine-grained control over source tracking columns, sheet selection, and row skipping.

Performance characteristics (v3 baseline, unchanged here):

  • excel_sheets() called exactly once per file (cached).

  • setDT() converts tibbles in-place — zero vector copies.

  • Tracking columns injected via := on small per-sheet tables before the single final rbindlist.

  • data.table OpenMP thread pool optionally widened and always restored on exit.

import_xlsx(
  file,
  rbind = TRUE,
  sheet = NULL,
  skip = 0L,
  show_excel_name = TRUE,
  show_sheet_name = TRUE,
  verbose = FALSE,
  dt_threads = data.table::getDTthreads(),
  ...
)

Arguments

file

Non-empty character vector of paths to existing .xlsx / .xls files.

rbind

logical(1). TRUE (default) binds all sheets into a single data.table. FALSE returns a flat named list keyed as "<filename>_<sheetname>".

sheet

Positive integer vector or NULL (default). NULL imports every sheet. Indices must be valid across all supplied files.

skip

Non-negative integer(1). Number of rows to skip before reading the header. Forwarded directly to read_excel. Default 0L.

show_excel_name

logical(1). When TRUE (default) and rbind = TRUE, prepends an excel_name column recording the source filename (extension stripped). Silently ignored when rbind = FALSE (provenance is already encoded in list-element names).

show_sheet_name

logical(1). When TRUE (default) and rbind = TRUE, prepends a sheet_name column recording the source sheet. Silently ignored when rbind = FALSE.

verbose

logical(1). Emit a message() for every empty sheet encountered. Default FALSE.

dt_threads

integer(1). OpenMP threads for data.table operations (rbindlist, :=, set*). Defaults to the current getDTthreads() value. Always restored on exit.

...

Additional arguments forwarded to read_excel (e.g. col_types, na, trim_ws). Do not pass sheet or skip here; use the dedicated parameters above.

Value

rbind = TRUE

A data.table. Tracking columns excel_name and/or sheet_name are prepended when their respective show_* flags are TRUE.

rbind = FALSE

A named list of data.tables, each element named "<filename>_<sheetname>". The list carries a "source_files" attribute with the original file paths.

Examples

# Example: Excel file import demonstrations

# Setup test files
xlsx_files <- mintyr_example(
  mintyr_examples("xlsx_test")    # Get example Excel files
)

# Example 1: Import and combine all sheets from all files
import_xlsx(
  xlsx_files,                     # Input Excel file paths
  rbind = TRUE                    # Combine all sheets into one data.table
)
#>     excel_name sheet_name  col1   col2   col3
#>         <char>     <char> <num> <char> <lgcl>
#>  1: xlsx_test1     Sheet1     4      d  FALSE
#>  2: xlsx_test1     Sheet1     5      f   TRUE
#>  3: xlsx_test1     Sheet1     6      e   TRUE
#>  4: xlsx_test1     Sheet2     1      a   TRUE
#>  5: xlsx_test1     Sheet2     2      b  FALSE
#>  6: xlsx_test1     Sheet2     3      c   TRUE
#>  7: xlsx_test2     Sheet1    15      o  FALSE
#>  8: xlsx_test2     Sheet1    16      p   TRUE
#>  9: xlsx_test2     Sheet1    17      q  FALSE
#> 10: xlsx_test2          a     7      g  FALSE
#> 11: xlsx_test2          a     9      h   TRUE
#> 12: xlsx_test2          a     8      i  FALSE
#> 13: xlsx_test2          b    10      J  FALSE
#> 14: xlsx_test2          b    11      K   TRUE
#> 15: xlsx_test2          b    12      L  FALSE

# Example 2: Import specific sheets separately
import_xlsx(
  xlsx_files,                     # Input Excel file paths
  rbind = FALSE,                  # Keep sheets as separate data.tables
  sheet = 2                       # Only import first sheet
)
#> $xlsx_test1_Sheet2
#>     col1   col2   col3
#>    <num> <char> <lgcl>
#> 1:     1      a   TRUE
#> 2:     2      b  FALSE
#> 3:     3      c   TRUE
#> 
#> $xlsx_test2_a
#>     col1   col2   col3
#>    <num> <char> <lgcl>
#> 1:     7      g  FALSE
#> 2:     9      h   TRUE
#> 3:     8      i  FALSE
#> 
#> attr(,"source_files")
#> [1] "/home/runner/work/_temp/Library/mintyr/extdata/xlsx_test1.xlsx"
#> [2] "/home/runner/work/_temp/Library/mintyr/extdata/xlsx_test2.xlsx"