Extract Path Segments or Filenames from File Paths

get_path_info is a merged, upgraded replacement for get_path_segment and get_filename. It operates in two modes:

Mode A (when n is specified): Extract a specific path segment by position, supporting forward indexing, reverse indexing, and range extraction.
Mode B (when n = NULL): Extract the filename, with optional removal of the file extension and/or the directory prefix.

get_path_info(paths, n = NULL, rm_extension = TRUE, rm_path = TRUE)

Arguments

paths

A character vector of file system paths. Supports mixed separators (/ and \, the latter written "\\" inside an R string) and Windows drive letters (e.g. C:).

n

A numeric segment index. Defaults to NULL (enters filename mode).

Positive integer: forward index from the path start; 1 = first segment.
Negative integer: reverse index from the path end; -1 = last segment (i.e. the filename segment).
Length-2 vector: extract a contiguous range, e.g. c(2, 4) or c(-3, -1).
0 is not allowed.
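
The indexing rules above can be sketched for a single, already-split path as follows. This is an illustration only, not the function's actual implementation: pick_segments is a hypothetical helper, and returning NA for a range that is only partly in bounds is an assumption based on the examples on this page.

```r
# Hypothetical helper: resolve n against one vector of path segments.
pick_segments <- function(segs, n) {
  stopifnot(length(n) %in% 1:2, all(n != 0))  # 0 is not allowed
  if (anyNA(segs)) return(NA_character_)      # NA path stays NA
  len <- length(segs)
  # Negative indices count from the end: -1 maps to len, -2 to len - 1, ...
  idx <- ifelse(n < 0, len + n + 1, n)
  if (any(idx < 1 | idx > len)) return(NA_character_)
  # A length-2 n is treated as the endpoints of a contiguous range.
  if (length(idx) == 2L) idx <- idx[1]:idx[2]
  paste(segs[idx], collapse = "/")
}

segs <- c("Users", "foo", "Documents", "report.xlsx")
pick_segments(segs, 2)           # "foo"
pick_segments(segs, -1)          # "report.xlsx"
pick_segments(segs, c(-3, -1))   # "foo/Documents/report.xlsx"
pick_segments(segs, 5)           # NA
```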

rm_extension

A logical(1) flag controlling extension removal. Defaults to TRUE.

In Mode B (n = NULL): always honoured.
In Mode A: only applied when n == -1 (explicitly targeting the filename segment). Has no effect for intermediate directory segments (e.g. n = 2).

rm_path

A logical(1) flag controlling whether the directory prefix is stripped, keeping only the filename. Defaults to TRUE. Only applies in Mode B (n = NULL); ignored when n is specified.

Value

A character vector of the same length as paths:

Returns the extracted segment string when the segment exists.
Returns NA_character_ when the segment index exceeds the path depth, the input element is NA, or the path reduces to empty after normalisation (e.g. "C:/", "/").

Details

Path normalisation (internal, fully vectorised):

Backslashes are converted to /, and runs of consecutive slashes are collapsed to a single /.
Windows drive letter prefixes (C:, D:, etc.) are stripped.
Leading and trailing / characters are removed.
Paths that are empty after the above steps (e.g. original inputs "C:/", "/", "") are coerced to NA_character_.
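
A minimal sketch of these four steps in base R, for readers who want to see the effect on concrete inputs. The name normalise_paths and the exact regular expressions are assumptions made for illustration; the internal code may differ.

```r
# Hypothetical re-creation of the normalisation steps listed above.
normalise_paths <- function(paths) {
  x <- gsub("\\", "/", paths, fixed = TRUE)  # backslashes -> forward slashes
  x <- gsub("/+", "/", x)                    # collapse runs of slashes
  x <- sub("^[A-Za-z]:", "", x)              # strip a Windows drive prefix
  x <- gsub("^/+|/+$", "", x)                # trim leading/trailing slashes
  x[!is.na(x) & x == ""] <- NA_character_    # empty result -> NA
  x
}

normalise_paths(c("C:\\Users\\\\foo//bar.txt", "/", "C:/", "", NA))
# "Users/foo/bar.txt" NA NA NA NA
```

Each step is a single vectorised call, so no explicit loop over paths is needed.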

Extension-stripping behaviour (internal .strip_ext helper):

Input	Output	Notes
`"report.txt"`	`"report"`	Standard file — last extension removed
`"data.tar.gz"`	`"data.tar"`	Compound extension — only last level removed
`".bashrc"`	`".bashrc"`	Pure dot-file (no second dot) — unchanged
`".report.xlsx"`	`".report"`	Dot-file with extension — extension removed
`"no_ext"`	`"no_ext"`	No extension — returned as-is
`"file."`	`"file."`	Trailing isolated dot — returned as-is
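
The behaviour in the table can be reproduced with a single regular expression. The sketch below is one possible way to obtain it and is not taken from the package source; strip_ext_sketch is a hypothetical stand-in for the internal .strip_ext helper.

```r
# Hypothetical stand-in for .strip_ext: drop the final ".ext" only when
# the dot is preceded by a non-slash character and followed by at least
# one character that is neither a dot nor a slash.
strip_ext_sketch <- function(x) {
  sub("([^/])\\.[^./]+$", "\\1", x)
}

strip_ext_sketch(c("report.txt", "data.tar.gz", ".bashrc",
                   ".report.xlsx", "no_ext", "file."))
# "report" "data.tar" ".bashrc" ".report" "no_ext" "file."
```

Requiring a non-slash character before the dot is what leaves pure dot-files such as ".bashrc" untouched, including when they follow a directory prefix.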

NA safety: strsplit(NA_character_, ...) returns list(NA_character_), so the element for an NA path is a length-1 NA vector rather than character(0). Consequently, every vapply callback guards against NA paths with an explicit anyNA(x) check; a length(x) == 0 test would never fire for them.
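
The strsplit behaviour described here is easy to confirm in base R. The callback below is an illustrative guard written for this page, not the function's own code.

```r
parts <- strsplit(c("a/b/c", NA_character_), "/", fixed = TRUE)
lengths(parts)
# 3 1   (the NA element has length 1, not 0)

# Illustrative callback: take the 2nd segment, guarding against NA input.
second <- vapply(parts, function(x) {
  if (anyNA(x) || length(x) < 2L) NA_character_ else x[2L]
}, character(1))
second
# "b" NA
```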

Examples

paths <- c("C:/Users/foo/Documents/report.xlsx",
           "/home/user/.bashrc",
           "relative/path/to/data.csv",
           ".hidden.tar.gz",
           NA_character_)

# Mode B: filename only, extension stripped (default)
get_path_info(paths)
#> [1] "report"      ".bashrc"     "data"        ".hidden.tar" NA           

# Mode B: filename only, extension preserved
get_path_info(paths, rm_extension = FALSE)
#> [1] "report.xlsx"    ".bashrc"        "data.csv"       ".hidden.tar.gz"
#> [5] NA              

# Mode B: full normalised path, extension stripped
get_path_info(paths, rm_path = FALSE)
#> [1] "Users/foo/Documents/report" "home/user/.bashrc"         
#> [3] "relative/path/to/data"      ".hidden.tar"               
#> [5] NA                          

# Mode A: extract the 2nd path segment
get_path_info(paths, n = 2)
#> [1] "foo"  "user" "path" NA     NA    

# Mode A: extract the last segment; rm_extension takes effect because n = -1
get_path_info(paths, n = -1, rm_extension = TRUE)
#> [1] "report"      ".bashrc"     "data"        ".hidden.tar" NA           

# Mode A: range extraction
get_path_info(paths, n = c(2, 3))
#> [1] "foo/Documents" "user/.bashrc"  "path/to"       NA             
#> [5] NA
