A high-performance function for importing data from one or multiple Excel
files into data.table format, with fine-grained control over source
tracking columns, sheet selection, and row skipping.
Performance characteristics (v3 baseline, unchanged here):
excel_sheets() called exactly once per file (cached).
setDT() converts tibbles in-place — zero vector copies.
Tracking columns injected via := on small per-sheet tables
before the single final rbindlist.
data.table OpenMP thread pool optionally widened and always
restored on exit.
import_xlsx(
file,
rbind = TRUE,
sheet = NULL,
skip = 0L,
show_excel_name = TRUE,
show_sheet_name = TRUE,
verbose = FALSE,
dt_threads = data.table::getDTthreads(),
...
)Non-empty character vector of paths to existing
.xlsx / .xls files.
logical(1). TRUE (default) binds all sheets
into a single data.table. FALSE returns a
flat named list keyed as "<filename>_<sheetname>".
Positive integer vector or NULL (default).
NULL imports every sheet. Indices must be valid
across all supplied files.
Non-negative integer(1). Number of rows to skip
before reading the header. Forwarded directly to
read_excel. Default 0L.
logical(1). When TRUE (default) and
rbind = TRUE, prepends an excel_name column
recording the source filename (extension stripped).
Silently ignored when rbind = FALSE (provenance is
already encoded in list-element names).
logical(1). When TRUE (default) and
rbind = TRUE, prepends a sheet_name column
recording the source sheet. Silently ignored when
rbind = FALSE.
logical(1). Emit a message() for every
empty sheet encountered. Default FALSE.
integer(1). OpenMP threads for data.table
operations (rbindlist, :=, set*).
Defaults to the current getDTthreads() value.
Always restored on exit.
Additional arguments forwarded to
read_excel (e.g. col_types,
na, trim_ws). Do not pass
sheet or skip here; use the dedicated
parameters above.
rbind = TRUEA data.table. Tracking columns
excel_name and/or sheet_name are prepended when their
respective show_* flags are TRUE.
rbind = FALSEA named list of data.tables,
each element named "<filename>_<sheetname>". The list carries a
"source_files" attribute with the original file paths.
# Example: Excel file import demonstrations
# Setup test files
xlsx_files <- mintyr_example(
mintyr_examples("xlsx_test") # Get example Excel files
)
# Example 1: Import and combine all sheets from all files
import_xlsx(
xlsx_files, # Input Excel file paths
rbind = TRUE # Combine all sheets into one data.table
)
#> excel_name sheet_name col1 col2 col3
#> <char> <char> <num> <char> <lgcl>
#> 1: xlsx_test1 Sheet1 4 d FALSE
#> 2: xlsx_test1 Sheet1 5 f TRUE
#> 3: xlsx_test1 Sheet1 6 e TRUE
#> 4: xlsx_test1 Sheet2 1 a TRUE
#> 5: xlsx_test1 Sheet2 2 b FALSE
#> 6: xlsx_test1 Sheet2 3 c TRUE
#> 7: xlsx_test2 Sheet1 15 o FALSE
#> 8: xlsx_test2 Sheet1 16 p TRUE
#> 9: xlsx_test2 Sheet1 17 q FALSE
#> 10: xlsx_test2 a 7 g FALSE
#> 11: xlsx_test2 a 9 h TRUE
#> 12: xlsx_test2 a 8 i FALSE
#> 13: xlsx_test2 b 10 J FALSE
#> 14: xlsx_test2 b 11 K TRUE
#> 15: xlsx_test2 b 12 L FALSE
# Example 2: Import specific sheets separately
import_xlsx(
xlsx_files, # Input Excel file paths
rbind = FALSE, # Keep sheets as separate data.tables
sheet = 2 # Only import first sheet
)
#> $xlsx_test1_Sheet2
#> col1 col2 col3
#> <num> <char> <lgcl>
#> 1: 1 a TRUE
#> 2: 2 b FALSE
#> 3: 3 c TRUE
#>
#> $xlsx_test2_a
#> col1 col2 col3
#> <num> <char> <lgcl>
#> 1: 7 g FALSE
#> 2: 9 h TRUE
#> 3: 8 i FALSE
#>
#> attr(,"source_files")
#> [1] "/home/runner/work/_temp/Library/mintyr/extdata/xlsx_test1.xlsx"
#> [2] "/home/runner/work/_temp/Library/mintyr/extdata/xlsx_test2.xlsx"