Question: Subset top 5% values of data frames stored in list
4 months ago, paolo002140 wrote:

Hi, this is probably a question for Stack Overflow, but I am posting it here because I am not getting many replies there; apologies for this.

I have 24 data frames with different numbers of rows and various columns (for instance, a column of SNP IDs and other columns with the corresponding values for each SNP), and I stored those data frames inside a list. I am doing various operations on all the data frames at the same time. For instance, if I want to sort a column in decreasing order in every data frame in the list, I do:

``````myfiles_ordered <- lapply(myfiles, function(x) { x[order(x$column_name_to_order, decreasing = TRUE), ] })
``````

Now, after ordering by that column, I would like to take the top 5% of its values. I was thinking I could subset each data frame to its first rows, using its own number of rows multiplied by 0.05, and I wrote something like this:

``````myfiles_top5<-lapply(myfiles_ordered, function(x) {x[1:nrow(x)*5/100,]})
``````

However, it does not seem to work. Any help is highly appreciated, thanks.


"I am not getting many replies there"

https://stackoverflow.com/questions/53724885/subset-multiple-data-frames-stored-inside-a-list

In any case, this was the link, but maybe I did not explain my problem very well there.

4 months ago, zx87547.1k (London) wrote:

Try:

``````myfiles_top5 <- lapply(myfiles_ordered, function(x) { x[ 1:round(nrow(x)*5/100), ]})
``````

The original code fails because `:` has higher precedence than `*` and `/`, so `1:nrow(x)*5/100` first creates the sequence 1 to n and then applies 5% to every element of it:

``````1:nrow(mtcars) * 5/100
#  [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80
# [17] 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60
``````

Not what we need...

Instead, we need to compute the 5% first and then build the sequence, using parentheses `()`:

``````1:(nrow(mtcars) * 5/100)
# [1] 1
``````

Again not ideal, because both of the following give 1:

``````1:1.2
# [1] 1
1:1.6
# [1] 1
``````

Whereas we might need `1:2` for `1:1.6`, so we use `round`:

``````1:round(1.2)
# [1] 1
1:round(1.6)
# [1] 1 2
``````
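One caveat with `1:round(...)`, sketched below on a hypothetical small data frame (`small`, the first 8 rows of `mtcars`, is only an illustration and not part of the question): when 5% of the rows rounds to 0, `1:0` counts down to `c(1, 0)` and row 1 is returned by mistake. `seq_len()` returns an empty index in that case, and `ceiling()` is one possible choice if at least one row should always be kept.

```r
# Toy data frame with few rows, so that 5% of the rows rounds to zero.
small <- head(mtcars, 8)
n_top <- round(nrow(small) * 5/100)   # 8 * 0.05 = 0.4, which rounds to 0

# 1:0 is c(1, 0); the 0 index is dropped, so row 1 is returned by mistake.
wrong <- small[1:n_top, ]

# seq_len(0) is an empty vector, so no rows are returned.
safe <- small[seq_len(n_top), ]

# Rounding up keeps at least one row (an assumption about what
# "top 5%" should mean for very small data frames).
at_least_one <- small[seq_len(ceiling(nrow(small) * 5/100)), ]

nrow(wrong)         # 1
nrow(safe)          # 0
nrow(at_least_one)  # 1
```

Whether to round, round up, or round down depends on how the top 5% should be defined for your data.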

Update: We can do ordering and subsetting 5% in one go, e.g.:

``````# Using base
head(mtcars[ order(mtcars$mpg, decreasing = TRUE), ], round(nrow(mtcars) * 5/100))
#                 mpg cyl disp hp drat    wt  qsec vs am gear carb
# Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
# Fiat 128       32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1

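# Using dplyr
# Note: top_n() keeps the original row order, so the result is not sorted by mpg;
# it is superseded by slice_max() since dplyr 1.0.0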
library(dplyr)
top_n(mtcars, round(nrow(mtcars) * 5/100), wt = mpg)
#    mpg cyl disp hp drat    wt  qsec vs am gear carb
# 1 32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1
# 2 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
``````
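To tie this back to the list of data frames in the question, here is a minimal base R sketch. The helper name `top_fraction`, its arguments, and the toy list `myfiles` (built from `mtcars`) are assumptions for illustration; replace `"mpg"` with your own column name.

```r
# Hypothetical stand-in for the list of data frames from the question:
# two data frames with different numbers of rows (32 and 20).
myfiles <- list(a = mtcars, b = head(mtcars, 20))

# Illustrative helper: order by `column` (decreasing) and keep the top `fraction` of rows.
top_fraction <- function(x, column, fraction = 0.05) {
  n_top <- round(nrow(x) * fraction)
  ordered <- x[order(x[[column]], decreasing = TRUE), ]
  head(ordered, n_top)   # head() returns zero rows when n_top is 0
}

# Apply the same ordering and subsetting to every data frame in the list.
myfiles_top5 <- lapply(myfiles, top_fraction, column = "mpg")

sapply(myfiles_top5, nrow)
# a b
# 2 1
```

Using `head()` instead of `1:n` indexing avoids the problem of `1:0` selecting a row when a data frame is too small for 5% to round to at least one row.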