Selects the top (largest) or bottom (smallest) fraction of rows for one or more trait columns and summarises the selected values. Positive perc values extract the largest values; negative perc values extract the smallest.

top_perc(data, perc, trait, by = NULL, keep_data = FALSE)

Arguments

data

A data.frame or data.table.

perc

A numeric vector of proportions, each strictly between -1 and 1 and not equal to 0. Positive values select the largest values (e.g., 0.05 selects the top 5%). Negative values select the smallest values (e.g., -0.1 selects the bottom 10%).

trait

A character vector of column names to analyse.

by

A character vector of column names to group by. Default is NULL.

keep_data

Logical. If TRUE, returns a named list where each element contains both stat (summary statistics) and data (the subset rows). If FALSE (default), returns a single combined data.frame of statistics for all perc values.

Value

  • keep_data = FALSE: a data.frame with one row per by / trait / perc combination, containing any by columns followed by variable, N, Min, Max, Mean, Median, SD, SE, CV, and selection.

  • keep_data = TRUE: a named list with one element per perc value (e.g., perc_0.1), where each element is a list with $stat (the statistics) and $data (the selected rows).
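
The statistic columns can be checked against the Example 1 output further down: there, SE matches SD / sqrt(N) and CV matches SD / Mean (a ratio, not a percentage). A minimal base R sketch under that reading follows; summarise_values is a hypothetical helper written for illustration, not part of the package.

```r
# Hypothetical helper that mirrors the columns shown in the $stat output.
# Assumes SE = SD / sqrt(N) and CV = SD / Mean, inferred from the example values.
summarise_values <- function(x) {
  n <- length(x)
  s <- stats::sd(x)
  data.frame(
    N = n,
    Min = min(x),
    Max = max(x),
    Mean = mean(x),
    Median = stats::median(x),
    SD = s,
    SE = s / sqrt(n),
    CV = s / mean(x)
  )
}

# The 17 Petal.Width values listed in the Example 1 output
x <- c(rep(2.5, 3), rep(2.4, 3), rep(2.3, 8), rep(2.2, 3))
stat <- summarise_values(x)
stat
```

With these inputs the sketch should agree with the $stat row printed in Example 1, which is what motivates the two assumed formulas.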

Examples

# Example 1: Basic usage with single trait
# This example selects the top 10% of observations based on Petal.Width
# keep_data=TRUE returns both summary statistics and the filtered data
top_perc(iris, 
         perc = 0.1,                # Select top 10%
         trait = c("Petal.Width"),  # Column to analyze
         keep_data = TRUE)          # Return both stats and filtered data
#> $perc_0.1
#> $perc_0.1$stat
#>      variable  N Min Max     Mean Median         SD         SE         CV
#> 1 Petal.Width 17 2.2 2.5 2.335294    2.3 0.09963167 0.02416423 0.04266344
#>   selection
#> 1   top_10%
#> 
#> $perc_0.1$data
#>    Sepal.Length Sepal.Width Petal.Length   Species    variable value
#> 1           6.3         3.3          6.0 virginica Petal.Width   2.5
#> 2           6.5         3.0          5.8 virginica Petal.Width   2.2
#> 3           7.2         3.6          6.1 virginica Petal.Width   2.5
#> 4           5.8         2.8          5.1 virginica Petal.Width   2.4
#> 5           6.4         3.2          5.3 virginica Petal.Width   2.3
#> 6           7.7         3.8          6.7 virginica Petal.Width   2.2
#> 7           7.7         2.6          6.9 virginica Petal.Width   2.3
#> 8           6.9         3.2          5.7 virginica Petal.Width   2.3
#> 9           6.4         2.8          5.6 virginica Petal.Width   2.2
#> 10          7.7         3.0          6.1 virginica Petal.Width   2.3
#> 11          6.3         3.4          5.6 virginica Petal.Width   2.4
#> 12          6.7         3.1          5.6 virginica Petal.Width   2.4
#> 13          6.9         3.1          5.1 virginica Petal.Width   2.3
#> 14          6.8         3.2          5.9 virginica Petal.Width   2.3
#> 15          6.7         3.3          5.7 virginica Petal.Width   2.5
#> 16          6.7         3.0          5.2 virginica Petal.Width   2.3
#> 17          6.2         3.4          5.4 virginica Petal.Width   2.3
#> 
#> 
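
Example 1 returns 17 rows rather than 15 (10% of 150), which is consistent with keeping every row at or above the 90th-percentile value, ties included. The base R sketch below follows that reading. select_tail is a hypothetical name, and both the use of stats::quantile with its default type and the mirrored rule for negative perc are assumptions, not the function's documented internals.

```r
# Hypothetical sketch of the selection rule, inferred from the Example 1 output.
select_tail <- function(data, perc, trait) {
  x <- data[[trait]]
  if (perc > 0) {
    # Top tail: keep rows at or above the (1 - perc) quantile
    cutoff <- stats::quantile(x, probs = 1 - perc, names = FALSE)
    data[x >= cutoff, , drop = FALSE]
  } else {
    # Bottom tail (assumed to mirror the top rule): keep rows at or below
    # the |perc| quantile
    cutoff <- stats::quantile(x, probs = abs(perc), names = FALSE)
    data[x <= cutoff, , drop = FALSE]
  }
}

top <- select_tail(iris, 0.1, "Petal.Width")
nrow(top)               # 17, as in Example 1
range(top$Petal.Width)  # 2.2 2.5
```

Because ties at the cutoff are kept, the number of selected rows can exceed perc times the number of rows.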

# Example 2: Using grouping with 'by' parameter
# This example performs the same analysis but separately for each Species
# With keep_data = FALSE (the default), returns one data.frame of statistics with one row per group
top_perc(iris, 
         perc = 0.1,                # Select top 10%
         trait = c("Petal.Width"),  # Column to analyze
         by = "Species")            # Group by Species
#>      Species    variable N Min Max      Mean Median         SD         SE
#> 1     setosa Petal.Width 9 0.4 0.6 0.4333333   0.40 0.07071068 0.02357023
#> 2 versicolor Petal.Width 5 1.6 1.8 1.6600000   1.60 0.08944272 0.04000000
#> 3  virginica Petal.Width 6 2.4 2.5 2.4500000   2.45 0.05477226 0.02236068
#>           CV selection
#> 1 0.16317849   top_10%
#> 2 0.05388116   top_10%
#> 3 0.02235602   top_10%
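
The group sizes in Example 2 (9, 5 and 6 rows out of 50 per species) fit a quantile cutoff computed separately within each group, with ties at the cutoff kept. A base R sketch under that assumption is shown below; the real function may be implemented differently.

```r
# Assumption: the cutoff is the within-group 90th percentile, ties kept.
groups <- split(iris$Petal.Width, iris$Species)
top_by_group <- lapply(groups, function(x) {
  x[x >= stats::quantile(x, probs = 0.9, names = FALSE)]
})

lengths(top_by_group)        # setosa 9, versicolor 5, virginica 6
sapply(top_by_group, mean)   # per-group mean of the selected values
```

Since each group gets its own cutoff, a value that counts as "top 10%" for setosa (0.4) would be far below the cutoff for virginica (2.4).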