What to do when there are different versions of Ensembl IDs?
3.1 years ago

This is a general question, but I want to know the best practice here. Sometimes, when I have RNA-seq data, the row names are Ensembl IDs, which are not very meaningful to me. When I try to map the row names to gene symbols, I get an error that row names cannot contain duplicated entries. So many genes have different versions (and I don't fully understand how aligners use multiple versions).

So, what should I do about that? I think that if I kept only one of them, or only the most variable one, before normalization and the clustering check, I would be biasing the analysis because I would be ignoring some counts!

What should I do?

R Bioconductor RNA-Seq • 576 views
3.1 years ago

I keep all gene IDs regardless of whether they map to multiple gene names. If I want to display gene names for some visualization or data presentation (such as a list of DEGs) I'll merge the gene names into the matching gene IDs.
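As a rough illustration of that approach, here is a minimal Python sketch using only the standard library. It assumes the duplicates come from several Ensembl IDs resolving to the same symbol once the `.N` version suffix is dropped. The count values, the version numbers, the placeholder ID, and the helper names `strip_version` and `annotate` are all made up for the example; the same idea should carry over to R with `sub()` and `match()`.

```python
import re

# Hypothetical counts keyed by versioned Ensembl gene IDs (values are made up,
# and the version suffixes are illustrative only).
counts = {
    "ENSG00000141510.17": [120, 98],
    "ENSG00000012048.23": [45, 60],
    "ENSG00000000000.1": [3, 5],  # placeholder ID with no symbol in the lookup
}

# Hypothetical ID-to-symbol lookup, e.g. exported from an annotation resource.
id_to_symbol = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
}

def strip_version(ensembl_id):
    """Drop a trailing '.<digits>' version suffix, if present."""
    return re.sub(r"\.\d+$", "", ensembl_id)

def annotate(counts, id_to_symbol):
    """Keep every gene ID as the row key and attach the symbol as a column."""
    rows = []
    for gene_id, values in counts.items():
        # Look up by unversioned ID; None means no symbol was found.
        symbol = id_to_symbol.get(strip_version(gene_id))
        rows.append({"gene_id": gene_id, "symbol": symbol, "counts": values})
    return rows

annotated = annotate(counts, id_to_symbol)
for row in annotated:
    print(row["gene_id"], row["symbol"], row["counts"])
```

The gene ID stays the unique key throughout, so no counts are dropped or collapsed; the symbol is just an extra annotation that is allowed to repeat or be missing.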


That's logical so far, but what if a gene of interest does not agree across its versions? For example, one version is overexpressed and the other is not, or both are overexpressed but with different values.


You can report the gene IDs alongside the gene names.
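One possible way to do that for plots or tables is to build a combined label, so that two IDs sharing a symbol remain distinguishable. This is only a sketch: the IDs and the symbol `GENE1` are invented, and `display_labels` is a hypothetical helper, not part of any package.

```python
def display_labels(rows):
    """Build one label per row; rows are (gene_id, symbol_or_None) pairs."""
    labels = []
    for gene_id, symbol in rows:
        # Fall back to the bare ID when no symbol is known.
        labels.append(f"{symbol} ({gene_id})" if symbol else gene_id)
    return labels

# Made-up IDs and symbols: two IDs share one symbol, one ID has none.
rows = [
    ("ENSG00000000001", "GENE1"),
    ("ENSG00000000002", "GENE1"),
    ("ENSG00000000003", None),
]

labels = display_labels(rows)
for label in labels:
    print(label)
```

Because each label embeds the ID, the labels are unique as long as the IDs are, so they can be used as row names without triggering the duplicate error.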


Pretty neat idea. Thank you for sharing
