Finding number of duplicates in R
8.1 years ago
nkabo ▴ 80

Hello, I have a list of gene names and several features; each row represents a gene and its properties. There are approximately 15000 rows and 11 columns. Some of the genes occur more than once (for example, there are 4 TP53 entries). I want to see how many times each gene name is duplicated, and I want to use that value. Duplicated gene names appear on consecutive rows. As an example:

Gene name   rs_id   aa change
CASP7       xx      yy
TP53        zz      hh
TP53        ff      cc
TP53        bb      gg
WNT         aa      dd
WNT         qq      kk

I want to find the number of duplicates for each gene (4 for TP53 and 2 for WNT), and I also want to check the aa change for each duplicate. Is there a way to do this in R? Thanks in advance.

You can try the plyr library; see my post on the Bioconductor support site:

https://support.bioconductor.org/p/71837/#71839


Thank you for your answer, I used the code below:

library(dplyr)
newdf <- df %>% group_by(ID) %>% mutate(replicate=seq(n()))

However, I want a single number per gene (for example, if a gene is repeated 6 times, the column should read 6,6,6,6,6,6, not 1,2,3,4,5,6). Could you suggest a way to do it?


Try the count function from plyr.

?count
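
The per-row count requested above (6,6,6,6,6,6 rather than 1,2,3,4,5,6) can also be sketched in base R without extra packages. This is only an illustration: the data frame and the column names gene, rs_id, aa_change and n_dup are assumptions modelled on the example in the question, and the toy data has three TP53 rows.

```r
# Toy data mirroring the example in the question (column names are assumed)
df <- data.frame(
  gene      = c("CASP7", "TP53", "TP53", "TP53", "WNT", "WNT"),
  rs_id     = c("xx", "zz", "ff", "bb", "aa", "qq"),
  aa_change = c("yy", "hh", "cc", "gg", "dd", "kk"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each gene group and recycles the result
# so that every row of a group carries the group size
df$n_dup <- ave(seq_along(df$gene), df$gene, FUN = length)

# Roughly equivalent dplyr form, if dplyr is loaded:
#   df %>% group_by(gene) %>% mutate(n_dup = n())
print(df)
```

Here n_dup repeats the group size on every row of a gene, whereas plyr's count returns one summary row per gene.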
8.1 years ago
keith.hughitt ▴ 280

You can use the table function in R to get the count of each duplicated gene.

For example, if the gene IDs are stored in a column gene_id, you could do:

> dat <- data.frame(gene_id=sample(1:3, 20, replace=TRUE), other_col='foo')
> table(dat$gene_id)

1 2 3 
5 6 9 
> as.data.frame(table(dat$gene_id))
  Var1 Freq
1    1    5
2    2    6
3    3    9

This gives you a data.frame of the number of duplicates for each ID.
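
Applied to gene names instead of numeric IDs, the same idea might look like the sketch below. The vector genes and the column names gene and n are made up for illustration, and the filter to genes seen more than once is an extra step that is not in the answer above.

```r
# Hypothetical gene column, following the example in the question
genes <- c("CASP7", "TP53", "TP53", "TP53", "WNT", "WNT")

# table() counts occurrences; as.data.frame() turns the counts into two columns
counts <- as.data.frame(table(genes), stringsAsFactors = FALSE)
names(counts) <- c("gene", "n")

# Keep only the genes that occur more than once
dups <- counts[counts$n > 1, ]
print(dups)
```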

Not sure what you mean by "check the aa change for each duplicate", but presumably you could just get a list of the unique gene IDs, and then use a for-loop to iterate over them, selecting all relevant rows, and performing some operation on each group of duplicates.
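
One possible reading of that suggestion, as a rough sketch: group the aa change values by gene with split(), then loop over the groups. The column names gene and aa_change are assumptions, and printing the values stands in for whatever check is actually needed.

```r
# Toy data following the example in the question (column names are assumed)
df <- data.frame(
  gene      = c("CASP7", "TP53", "TP53", "TP53", "WNT", "WNT"),
  aa_change = c("yy", "hh", "cc", "gg", "dd", "kk"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one character vector of aa changes per gene
aa_by_gene <- split(df$aa_change, df$gene)

# Iterate over the genes and report the aa changes of the duplicated ones
for (g in names(aa_by_gene)) {
  changes <- aa_by_gene[[g]]
  if (length(changes) > 1) {
    cat(g, "(", length(changes), "rows ):", paste(changes, collapse = ", "), "\n")
  }
}
```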


Thank you for your answer. I will try that one as well.
