Selects the top (largest) or bottom (smallest) percentage of data based on specified traits. Positive percentages extract the largest values; negative percentages extract the smallest values.
top_perc(data, perc, trait, by = NULL, keep_data = FALSE)
data: A data.frame or data.table.
perc: A numeric vector of values strictly between -1 and 1 (excluding 0). Positive values select the largest values (e.g., 0.05 selects the top 5%). Negative values select the smallest values (e.g., -0.1 selects the bottom 10%).
trait: A character vector of column names to analyse.
by: A character vector of column names to group by. Default is NULL (no grouping).
keep_data: Logical. If TRUE, returns a named list in which each element contains both stat (summary statistics) and data (the selected rows). If FALSE (default), returns a single combined data.frame of statistics for all perc values.
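The sign convention for perc can be illustrated with a small base-R sketch. This is not the package's implementation: select_perc is a hypothetical helper, and the quantile-based cut-off (with ties at the cut-off kept) is an assumption chosen because it is a common way to express "top X%".

```r
# Hypothetical helper (not part of the documented function): keep the rows in
# the top or bottom |perc| share of one numeric column via a quantile cut-off.
select_perc <- function(data, perc, trait) {
  stopifnot(is.numeric(perc), length(perc) == 1, perc != 0, abs(perc) < 1)
  x <- data[[trait]]
  if (perc > 0) {
    # Positive perc: keep values at or above the (1 - perc) quantile.
    cutoff <- quantile(x, probs = 1 - perc, na.rm = TRUE, names = FALSE)
    keep <- x >= cutoff
  } else {
    # Negative perc: keep values at or below the |perc| quantile.
    cutoff <- quantile(x, probs = abs(perc), na.rm = TRUE, names = FALSE)
    keep <- x <= cutoff
  }
  data[!is.na(keep) & keep, , drop = FALSE]
}

top10    <- select_perc(iris,  0.1, "Petal.Width")  # largest values
bottom10 <- select_perc(iris, -0.1, "Petal.Width")  # smallest values
```

Because every value tied with the cut-off is kept under this assumption, the selected subset can contain more rows than an exact share of the data would.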
keep_data = FALSE: a data.frame with one row per by / trait / perc combination, containing the grouping columns (if any) followed by variable, N, Min, Max, Mean, Median, SD, SE, CV and selection.
keep_data = TRUE: a named list (one element per perc value, e.g. perc_0.1) where each element is a list with $stat and $data.
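For readers who want to check the statistics columns by hand, the sketch below computes them for a single numeric vector. summarise_trait is a hypothetical helper, not a function of the package. The formulas SE = SD / sqrt(N) and CV = SD / Mean are inferred from the example output on this page rather than taken from the source code.

```r
# Hypothetical helper: summary statistics for one numeric vector.
# Assumption: SE = SD / sqrt(N) and CV = SD / Mean (inferred from the
# example output, not confirmed against the package source).
summarise_trait <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  s <- sd(x)
  data.frame(
    N = n, Min = min(x), Max = max(x), Mean = mean(x),
    Median = median(x), SD = s, SE = s / sqrt(n), CV = s / mean(x)
  )
}

# The 17 iris rows with Petal.Width >= 2.2 are the subset shown in Example 1.
stats <- summarise_trait(iris$Petal.Width[iris$Petal.Width >= 2.2])
```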
# Example 1: Basic usage with single trait
# This example selects the top 10% of observations based on Petal.Width
# keep_data=TRUE returns both summary statistics and the filtered data
top_perc(iris,
         perc = 0.1,               # Select top 10%
         trait = c("Petal.Width"), # Column to analyze
         keep_data = TRUE)         # Return both stats and filtered data
#> $perc_0.1
#> $perc_0.1$stat
#> variable N Min Max Mean Median SD SE CV
#> 1 Petal.Width 17 2.2 2.5 2.335294 2.3 0.09963167 0.02416423 0.04266344
#> selection
#> 1 top_10%
#>
#> $perc_0.1$data
#> Sepal.Length Sepal.Width Petal.Length Species variable value
#> 1 6.3 3.3 6.0 virginica Petal.Width 2.5
#> 2 6.5 3.0 5.8 virginica Petal.Width 2.2
#> 3 7.2 3.6 6.1 virginica Petal.Width 2.5
#> 4 5.8 2.8 5.1 virginica Petal.Width 2.4
#> 5 6.4 3.2 5.3 virginica Petal.Width 2.3
#> 6 7.7 3.8 6.7 virginica Petal.Width 2.2
#> 7 7.7 2.6 6.9 virginica Petal.Width 2.3
#> 8 6.9 3.2 5.7 virginica Petal.Width 2.3
#> 9 6.4 2.8 5.6 virginica Petal.Width 2.2
#> 10 7.7 3.0 6.1 virginica Petal.Width 2.3
#> 11 6.3 3.4 5.6 virginica Petal.Width 2.4
#> 12 6.7 3.1 5.6 virginica Petal.Width 2.4
#> 13 6.9 3.1 5.1 virginica Petal.Width 2.3
#> 14 6.8 3.2 5.9 virginica Petal.Width 2.3
#> 15 6.7 3.3 5.7 virginica Petal.Width 2.5
#> 16 6.7 3.0 5.2 virginica Petal.Width 2.3
#> 17 6.2 3.4 5.4 virginica Petal.Width 2.3
#>
#>
# Example 2: Using grouping with 'by' parameter
# This example performs the same analysis but separately for each Species
# keep_data defaults to FALSE, so this returns a data.frame of summary
# statistics with one row per group
top_perc(iris,
         perc = 0.1,               # Select top 10%
         trait = c("Petal.Width"), # Column to analyze
         by = "Species")           # Group by Species
#> Species variable N Min Max Mean Median SD SE
#> 1 setosa Petal.Width 9 0.4 0.6 0.4333333 0.40 0.07071068 0.02357023
#> 2 versicolor Petal.Width 5 1.6 1.8 1.6600000 1.60 0.08944272 0.04000000
#> 3 virginica Petal.Width 6 2.4 2.5 2.4500000 2.45 0.05477226 0.02236068
#> CV selection
#> 1 0.16317849 top_10%
#> 2 0.05388116 top_10%
#> 3 0.02235602 top_10%
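As a rough cross-check of the grouped result, the same kind of selection can be sketched in base R with split() and lapply(). This is an illustration under the assumption of a per-group quantile cut-off with ties kept. It is not the package's own code, and the variable names are made up for the example.

```r
# Base-R sketch of a grouped top-10% selection (assumed quantile cut-off,
# applied separately within each Species).
groups <- split(iris, iris$Species)

top_by_species <- lapply(groups, function(g) {
  cutoff <- quantile(g$Petal.Width, probs = 0.9, names = FALSE)
  g[g$Petal.Width >= cutoff, , drop = FALSE]
})

# Number of selected rows per group.
counts <- vapply(top_by_species, nrow, integer(1))
```

The group sizes differ because ties at each group's cut-off are all retained.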