Question: Duplicated gene symbols in a paper's supplementary table
11 weeks ago by
delacroixed10
delacroixed10 wrote:

Hello everyone,

Recently, I downloaded this table:

https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1001154#s5

But I noticed that 115 of the 18,931 gene symbols are duplicated (some of them appear more than twice).
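
For anyone who wants to reproduce this kind of count, here is a minimal Python sketch. It assumes the gene symbols have already been read into a list; the function name and the placeholder symbols are made up for illustration and are not taken from the table.

```python
from collections import Counter


def duplicated_symbols(symbols):
    """Return {symbol: count} for every symbol that occurs more than once."""
    counts = Counter(symbols)
    return {sym: n for sym, n in counts.items() if n > 1}


# Placeholder symbols standing in for the gene symbol column (hypothetical data).
example = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_A"]
print(duplicated_symbols(example))
```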

I was wondering what the best way to proceed is.

Thank you in advance.

Francisco Requena

haploinsufficiency • 119 views
written 11 weeks ago by delacroixed (10)

What are the chromosomal locations of those repeated gene symbols? Are the copies at the same location or at different ones?

As for your statement "I was wondering what is the best way to proceed", it is impossible to answer unless you explain what you would like to do with those genes.

modified 11 weeks ago • written 11 weeks ago by EagleEye (6.5k)

Hello! Thank you for your fast reply. I have checked their locations and they are distributed across the genome. This score (along with others) will be displayed in a software tool for clinicians. Since some genes are duplicated, a user who searches for any of them would see two rows (with the same information but different HI scores).

written 11 weeks ago by delacroixed (10)

I think EagleEye was asking whether, for any duplicated pair of gene symbols, the two entries have the same genomic coordinates. Also, can you provide an example of such a gene symbol pair?
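
A quick way to check this could look like the sketch below. It assumes each row has been parsed into a (symbol, chromosome, start, end) tuple; that layout, the function names, and the example rows are hypothetical and not taken from the paper's table.

```python
from collections import defaultdict


def coordinates_by_symbol(records):
    """Map each symbol to the set of (chrom, start, end) tuples seen for it."""
    coords = defaultdict(set)
    for symbol, chrom, start, end in records:
        coords[symbol].add((chrom, start, end))
    return coords


def symbols_with_conflicting_coordinates(records):
    """Return the symbols that appear with more than one distinct location."""
    coords = coordinates_by_symbol(records)
    return sorted(sym for sym, locs in coords.items() if len(locs) > 1)


# Hypothetical rows: GENE_A is repeated at two locations, GENE_B at one.
records = [
    ("GENE_A", "chr1", 100, 200),
    ("GENE_A", "chr2", 300, 400),
    ("GENE_B", "chr3", 10, 20),
    ("GENE_B", "chr3", 10, 20),
]
print(symbols_with_conflicting_coordinates(records))
```

Repeated symbols at different locations usually point to distinct genes whose names were merged somewhere, while repeats at the same location are more likely plain duplicate rows.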

written 11 weeks ago by Kevin Blighe (51k)

First, you didn't link to a table but to the list of supplementary material of the paper. Second, there are two tables there and both have fewer than 18000 lines (so presumably fewer gene symbols) and don't appear to have duplicated gene symbols. Could it be that you're talking about another data set or paper?

written 11 weeks ago by Jean-Karim Heriche (21k)
11 weeks ago by
delacroixed10
delacroixed10 wrote:

I checked the raw data again and found the cause of the duplicated symbols. My script was trimming symbols that end in a dash (-) followed by a number (e.g. KRTAP13-1, KRTAP13-2, KRTAP13-3...) down to the part before the dash (e.g. KRTAP13), so distinct genes ended up with the same symbol. I have fixed the script. There is no problem with the dataset. Thank you!
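
To illustrate the kind of mistake involved, here is a minimal Python sketch. The regular expression is only an assumed reconstruction of the trimming step, not the actual script, and both function names are hypothetical.

```python
import re


def trim_symbol_buggy(symbol):
    # Assumed form of the bug: a trailing "-<number>" is removed,
    # so KRTAP13-1, KRTAP13-2 and KRTAP13-3 all become KRTAP13.
    return re.sub(r"-\d+$", "", symbol.strip())


def clean_symbol(symbol):
    # Safer cleanup: only strip surrounding whitespace and keep the suffix.
    return symbol.strip()


symbols = ["KRTAP13-1", "KRTAP13-2", "KRTAP13-3"]
buggy = [trim_symbol_buggy(s) for s in symbols]
fixed = [clean_symbol(s) for s in symbols]

# Number of distinct symbols after each cleanup step.
print(len(set(buggy)), len(set(fixed)))  # prints: 1 3
```

Comparing the number of distinct symbols before and after any cleanup step is a cheap check that would catch this kind of collapse.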

written 11 weeks ago by delacroixed (10)