Filter data frame with dplyr
3.7 years ago
luca ▴ 70

Hi there, I would like to filter my data frame, which has 5 columns: column 1 contains gene names, column 2 contains fold changes (expressed as logFC), column 3 contains the FDR-adjusted p-value, and the other two columns contain other things.

The thing is that genes can be duplicated in the data frame, so I would like to remove the duplicates. To do that, I keep, for each gene, the row with the lowest FDR:

convertedata2 = convertedata %>% group_by(Geneid) %>% filter(FDR == min(FDR))

The problem is that duplicated rows for a gene can share the same minimum FDR (e.g. if all of them have FDR=1), so those duplicates are not removed. To break these ties I would like to filter on logFC and keep the row with the highest absolute logFC. So I changed the previous command to this:

convertedata2 = convertedata %>% group_by(Geneid) %>% filter(FDR == min(FDR)) %>% filter(logFC == max(abs(logFC)))

but it doesn't work. I suspect it has to do with the abs function, but I am not sure why or what is going on. Any help is much appreciated!

Thanks, Luca

dplyr R filter • 1.7k views
3.7 years ago

Here is some example data.

# logFC is drawn at random, so your values will differ from the output below.
df <- data.frame(Geneid=c("A","A","B","C"), FDR=c(0.01,0.01,0.25,0.025), logFC=rnorm(4,0,3))

> df
  Geneid   FDR     logFC
1      A 0.010  1.970233
2      A 0.010 -2.703701
3      B 0.250  3.957811
4      C 0.025 -2.641965
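
Before the fix, a quick sketch of why the original attempt loses rows. The data frame `demo` below is just a hard-coded copy of the values printed above (the name is made up for this illustration). Because the signed logFC is compared with the largest absolute value, a gene whose strongest logFC is negative never matches.

```r
library(dplyr)

# Hard-coded copy of the example values printed above (illustrative only).
demo <- data.frame(
  Geneid = c("A", "A", "B", "C"),
  FDR    = c(0.01, 0.01, 0.25, 0.025),
  logFC  = c(1.970233, -2.703701, 3.957811, -2.641965)
)

# The original attempt: signed logFC on the left, absolute maximum on the right.
attempt <- demo %>%
  group_by(Geneid) %>%
  filter(FDR == min(FDR)) %>%
  filter(logFC == max(abs(logFC))) %>%
  ungroup()

# Genes A and C are dropped entirely, because their largest |logFC| is negative.
attempt$Geneid
```

Only gene B survives here, since it is the only gene whose largest-magnitude logFC is positive.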

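Here is how you would do the filtering (you were really close). Your version compares the signed logFC with max(abs(logFC)), so a row whose largest-magnitude logFC is negative never matches; take abs() on both sides of the comparison.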

library("dplyr")

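df <- df %>%
  group_by(Geneid) %>%
  # Keep the row(s) with the lowest FDR within each gene.
  filter(FDR == min(FDR)) %>%
  # Among those rows only, keep the largest |logFC| (abs() on both sides).
  filter(abs(logFC) == max(abs(logFC))) %>%
  ungroup()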

> df
# A tibble: 3 x 3
  Geneid   FDR logFC
  <chr>  <dbl> <dbl>
1 A      0.01  -2.70
2 B      0.25   3.96
3 C      0.025 -2.64
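
One caveat: an equality filter still returns several rows for a gene if its duplicates tie on both FDR and |logFC|. If you need exactly one row per gene no matter what, a possible alternative (a sketch, assuming dplyr 1.0.0 or later for `slice_head()`) is to sort within each gene and take the first row. The data frame `genes` below is invented for illustration.

```r
library(dplyr)

# Hypothetical data: gene A ties on both FDR and |logFC|, and gene B's
# largest |logFC| sits on a row that does not have the lowest FDR.
genes <- data.frame(
  Geneid = c("A", "A", "B", "B", "C"),
  FDR    = c(1, 1, 0.01, 0.5, 0.025),
  logFC  = c(2, -2, 1, 5, -2.6)
)

dedup <- genes %>%
  group_by(Geneid) %>%
  # Within each gene: lowest FDR first, then largest |logFC| first.
  arrange(FDR, desc(abs(logFC)), .by_group = TRUE) %>%
  # Take the top row of each gene, so exact ties cannot leave duplicates.
  slice_head(n = 1) %>%
  ungroup()

dedup
```

For gene A this keeps whichever of the tied rows sorts first, so only use it if you do not care which of two fully tied rows is kept.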

Thanks rpolicastro! You are always super helpful!
