Question: Subset the top 5% of values of data frames stored in a list
paolo002 wrote, 4 months ago:

Hi, this is probably a question for Stack Overflow, but I am posting it here because I am not getting many replies there. Apologies for this.

I have 24 data frames with different numbers of rows and various columns (for instance, a SNP ID column and other columns with the corresponding values for each SNP), and I stored those data frames in a list. I am doing various operations on all the data frames at the same time. For instance, if I want to sort a column in decreasing order in every data frame in the list, I do:

myfiles_ordered <- lapply(myfiles, function(x) { x[order(x$column_name_to_order, decreasing = TRUE), ] })
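For illustration, here is a minimal self-contained version of the same pattern. The list name myfiles matches the question, but the two data frames and their columns SNP and score are made up:

```r
# Two made-up data frames standing in for the real SNP files
myfiles <- list(
  file1 = data.frame(SNP = paste0("rs", 1:5),
                     score = c(0.2, 0.9, 0.1, 0.5, 0.7),
                     stringsAsFactors = FALSE),
  file2 = data.frame(SNP = paste0("rs", 6:8),
                     score = c(3, 1, 2),
                     stringsAsFactors = FALSE)
)

# Sort every data frame in the list by 'score', largest value first
myfiles_ordered <- lapply(myfiles, function(x) {
  x[order(x$score, decreasing = TRUE), ]
})

myfiles_ordered$file2$SNP
# [1] "rs6" "rs8" "rs7"
```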

Now, after ordering that column, I would like to take the top 5% of its values. I was thinking I could subset each data frame using its number of rows multiplied by 0.05, and I wrote something like this:

myfiles_top5<-lapply(myfiles_ordered, function(x) {x[1:nrow(x)*5/100,]})

However, it does not seem to work. Any help is highly appreciated, thanks.


zx8754 commented:

"there I am not getting much replies"

Could you add the link to the Stack Overflow post?

paolo002 replied:

https://stackoverflow.com/questions/53724885/subset-multiple-data-frames-stored-inside-a-list

In any case, this was the link, but maybe I did not explain my problem well enough there.
zx8754 (London) wrote, 4 months ago:

Try:

myfiles_top5 <- lapply(myfiles_ordered, function(x) { x[ 1:round(nrow(x)*5/100), ]})

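Your version fails because of operator precedence: in R the ":" operator binds more tightly than "*" and "/", so 1:nrow(x)*5/100 first creates the sequence 1 to n and then multiplies every element of it by 5/100: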

1:nrow(mtcars) * 5/100
#  [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80
# [17] 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60

Not what we need...
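To see what the question's code actually returns, here is a small sketch that pushes that index vector through the subsetting step on mtcars. R converts fractional indices to integers by truncating toward zero and silently drops zero indices, so the result is the first row repeated, not the top 5%:

```r
# The question's index vector for a 32-row data frame (mtcars)
idx <- 1:nrow(mtcars) * 5/100

# Indices are truncated toward zero: nineteen 0s, then thirteen 1s
table(as.integer(idx))

# Zero indices are dropped, so only row 1 survives, repeated 13 times
wrong <- mtcars[idx, ]
nrow(wrong)
unique(wrong$mpg)   # the mpg of the first row only
```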

Instead, we need to compute 5% of the row count first and then build the sequence, using parentheses:

1:(nrow(mtcars) * 5/100)
# [1] 1

Again, this is not ideal, because both of the following give 1:

1:1.2
# [1] 1
1:1.6
# [1] 1

But for 1.6 we probably want 1:2, so we use round():

1:round(1.2)
# [1] 1
1:round(1.6)
# [1] 1 2
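Two caveats about round() matter for small or evenly sized data frames, shown here as a small sketch (n_top() is a hypothetical helper, not part of the answer). With 10 rows or fewer, round(nrow(x) * 5/100) is 0, and 1:0 is the vector c(1, 0), so x[1:0, ] silently returns the first row; seq_len() gives an empty sequence instead. Also, R's round() rounds halves to even, so 50 rows give round(2.5) = 2; ceiling() rounds up if that is preferred:

```r
# Hypothetical helper: how many rows make up the top `pct` percent
n_top <- function(x, pct = 5) round(nrow(x) * pct / 100)

small <- data.frame(value = c(9, 7, 5, 3, 1))   # 5 rows: 5 * 5/100 = 0.25

n_top(small)             # rounds to 0
1:n_top(small)           # 1:0 is c(1, 0), which still selects row 1
seq_len(n_top(small))    # empty sequence, so no rows are selected

# One option: always keep at least one row (drop = FALSE keeps a data frame)
small[seq_len(max(1, n_top(small))), , drop = FALSE]

# round() rounds halves to even: 50 rows * 5/100 = 2.5
round(50 * 5/100)     # 2
ceiling(50 * 5/100)   # 3
```

Whether an empty result or a single row is the better outcome for tiny data frames depends on the analysis; max(1, ...) is just one choice.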

Update: we can do the ordering and the 5% subsetting in one go, e.g.:

# Using base
head(mtcars[ order(mtcars$mpg, decreasing = TRUE), ], round(nrow(mtcars) * 5/100))
#                 mpg cyl disp hp drat    wt  qsec vs am gear carb
# Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
# Fiat 128       32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1
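The same one-step idea can be wrapped in a small function and applied to the whole list with lapply(). This is a sketch: top_percent() is a hypothetical helper, the list of data frames is simulated by splitting mtcars by cylinder count, and the max(1, ...) guard (always keep at least one row) is an assumption you may not want:

```r
# Hypothetical helper: order by `col` (largest first) and keep the top `pct` percent.
# max(1, ...) is an assumption: it keeps at least one row for small data frames.
top_percent <- function(x, col, pct = 5) {
  n <- max(1, round(nrow(x) * pct / 100))
  head(x[order(x[[col]], decreasing = TRUE), , drop = FALSE], n)
}

# Stand-in for the list of data frames: mtcars split by cylinder count
myfiles <- split(mtcars, mtcars$cyl)

myfiles_top5 <- lapply(myfiles, top_percent, col = "mpg")
sapply(myfiles_top5, nrow)   # one row per data frame here
```

Applied to the full mtcars data frame, top_percent(mtcars, "mpg") returns the same two rows as the head() call above (Toyota Corolla and Fiat 128).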

# Using dplyr
library(dplyr)
top_n(mtcars, round(nrow(mtcars) * 5/100), wt = mpg)
#    mpg cyl disp hp drat    wt  qsec vs am gear carb
# 1 32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1
# 2 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
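Note that the two approaches are not identical: as the output shows, top_n() keeps the original row order, and it also returns more than n rows when there are ties, whereas head() cuts at exactly n rows. (Since dplyr 1.0.0, top_n() is superseded by slice_max().) A base R sketch of the tie-keeping behaviour, using a hypothetical helper top_with_ties():

```r
# Hypothetical helper: keep every row whose value is at least the k-th largest,
# so tied rows are all kept; rows stay in their original order.
top_with_ties <- function(x, col, k) {
  threshold <- sort(x[[col]], decreasing = TRUE)[k]
  x[which(x[[col]] >= threshold), , drop = FALSE]   # which() skips NA values
}

k <- round(nrow(mtcars) * 5/100)            # 2 rows for mtcars
rownames(top_with_ties(mtcars, "mpg", k))   # Fiat 128, then Toyota Corolla

# With a tie at the top, asking for 1 row returns both tied rows
d <- data.frame(id = c("a", "b", "c", "d"), score = c(5, 9, 9, 1),
                stringsAsFactors = FALSE)
top_with_ties(d, "score", 1)$id             # "b" "c"
```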
Powered by Biostar version 2.3.0