How to analyze the differential expression of a gene that has several Ensembl IDs
2.0 years ago
fuad.fsu • 0

Hi, I have obtained differential expression results for genes (from RNA-seq data) in an Excel file. One gene has several Ensembl IDs, and I guess the gene is from a haplotype region. I am now confused about how to report the DE of this gene, as it has different values under different Ensembl IDs. Can anybody please suggest something? Thanks in advance.

data RNA-seq DE analysis • 977 views

in an excel file

Use R. Excel will mangle gene symbols (for example, it auto-converts symbols such as SEPT1 or MARCH1 into dates).

one gene has several ensembl IDs

You're referring to the fact that one HGNC symbol can map to multiple ENSG IDs. Many of these cases can be resolved by restricting yourself to chr1-22,X,Y and ignoring ENSGs in patch contigs. Paralogs and some repeated genes (such as miRNAs) will be a problem.
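To make the filtering step concrete, here is a minimal sketch in plain Python, assuming you have an annotation table with one row per gene ID. The row layout, the `keep_primary` and `symbols_with_multiple_ids` helpers, and all gene IDs and symbols are made up for illustration; only the idea (keep chr1-22, X, Y, then see which symbols still map to several IDs) comes from the reply above.

```python
# Hypothetical annotation rows: (gene_id, symbol, seq_name).
# All IDs and symbols below are placeholders, not real annotations.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def normalize(seq_name):
    """Strip an optional 'chr' prefix so 'chr1' and '1' compare equal."""
    return seq_name[3:] if seq_name.startswith("chr") else seq_name


def keep_primary(rows):
    """Keep only rows located on chr1-22, X or Y (drops patch/haplotype contigs)."""
    return [row for row in rows if normalize(row[2]) in PRIMARY]


def symbols_with_multiple_ids(rows):
    """Return {symbol: sorted gene IDs} for symbols that map to more than one ID."""
    by_symbol = {}
    for gene_id, symbol, _seq in rows:
        by_symbol.setdefault(symbol, set()).add(gene_id)
    return {s: sorted(ids) for s, ids in by_symbol.items() if len(ids) > 1}


rows = [
    ("ENSG_A1", "GENE_A", "6"),
    ("ENSG_A2", "GENE_A", "CHR_HSCHR6_MHC_COX_CTG1"),  # copy on an alternate contig
    ("ENSG_B1", "GENE_B", "chrX"),
    ("ENSG_C1", "GENE_C", "1"),
    ("ENSG_C2", "GENE_C", "2"),  # a true duplicate on primary chromosomes
]

primary_rows = keep_primary(rows)
still_ambiguous = symbols_with_multiple_ids(primary_rows)
print(still_ambiguous)
```

In this toy input, GENE_A is resolved by the filter, while GENE_C still has two IDs on primary chromosomes; that second kind of case is what the filter cannot fix.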


This is what I meant. How could I report the DE of the gene gsdf now, as it has multiple IDs?


It looks like you're working with goldfish - correct me if I'm wrong. I assumed you were working with human genes as there's no mention of organism in your posts.

My suggestions are valid for human genes, but I cannot help you with other organisms.

In any case, you can report DE per ENSCARG ID until you figure out a way to pick the ENSCARG you want per gene_name.
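One way to report DE per gene ID while keeping the table readable is to label each row with both the gene name and its ID whenever the name is not unique. The sketch below is only an illustration: `make_labels` is a hypothetical helper, and apart from ENSCARG00000064031 (which appears in this thread) the IDs are placeholders.

```python
from collections import Counter


def make_labels(records):
    """Build a display label per gene ID.

    records: list of (gene_id, gene_name) pairs; gene_name may be empty.
    Names shared by several IDs are disambiguated as 'name (gene_id)'.
    """
    name_counts = Counter(name for _gene_id, name in records if name)
    labels = {}
    for gene_id, name in records:
        if not name:
            labels[gene_id] = gene_id  # no symbol available: fall back to the ID
        elif name_counts[name] > 1:
            labels[gene_id] = f"{name} ({gene_id})"
        else:
            labels[gene_id] = name
    return labels


records = [
    ("ENSCARG00000064031", "gsdf"),
    ("ENSCARG_PLACEHOLDER_2", "gsdf"),   # placeholder ID for another gsdf copy
    ("ENSCARG_PLACEHOLDER_3", "actb"),   # placeholder ID, unique name
    ("ENSCARG_PLACEHOLDER_4", ""),       # placeholder ID without a gene name
]

labels = make_labels(records)
for gene_id, label in labels.items():
    print(gene_id, "->", label)
```

Each gene ID keeps its own DE statistics; only the label changes, so nothing is averaged or dropped.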


Thank you very much


Those appear to be 4 copies of the gsdf gene that are close to each other. How did you do your quantitation? Unless you used something like salmon or kallisto, reads would be hard to align to those copies and you would run into multi-mapping issues. If you did use one of the mapping programs, then you may need to report the copies of the gene independently.


Yes, I used Salmon. Thank you for the answer. But won't it be difficult from the interpretation point of view? Wouldn't it be difficult for readers if I report several copies of a gene?


Salmon works with transcriptome data (not the genome). How come you have gene names rather than transcript names such as ENSCART00000137773.1?

For example this gene has two transcripts: http://www.ensembl.org/Carassius_auratus/Gene/Splice?db=core;g=ENSCARG00000064031;r=QPKE01005937.1:38169-40515

There are 3 additional paralogs (with their own transcripts): http://www.ensembl.org/Carassius_auratus/Gene/Compara_Paralog?db=core;g=ENSCARG00000064031;r=QPKE01005937.1:38169-40515
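For context on how transcript-level output usually ends up with gene IDs: the per-transcript estimates are typically summed per gene using a transcript-to-gene map (in R this is what tximport does with its `tx2gene` table). Below is a minimal sketch of that summation in Python; the function `sum_to_gene`, the transcript and gene identifiers, and the counts are all hypothetical, and whether this is how the table in this thread was produced is an assumption.

```python
from collections import defaultdict


def sum_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimates into gene-level totals.

    tx_counts: {transcript_id: estimated count}
    tx2gene:   {transcript_id: gene_id}
    Returns (gene totals, list of transcripts missing from the map).
    """
    gene_counts = defaultdict(float)
    unmapped = []
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            unmapped.append(tx)  # keep track instead of silently dropping
        else:
            gene_counts[gene] += count
    return dict(gene_counts), unmapped


# Placeholder identifiers and counts, for illustration only.
tx_counts = {"txA.1": 10.0, "txA.2": 5.5, "txB.1": 3.0, "txC.1": 1.0}
tx2gene = {"txA.1": "geneA", "txA.2": "geneA", "txB.1": "geneB"}

gene_counts, unmapped = sum_to_gene(tx_counts, tx2gene)
print(gene_counts)
print(unmapped)
```

Note that each paralog has its own gene ID, so summing per gene ID still leaves one row per copy; collapsing the copies into a single gsdf value would be a separate, deliberate decision.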

