Question

Identifying Species Specific Differences In De Novo Transcriptome Data

10.3 years ago

pld 5.1k

I will soon have an assembled transcriptome, and I have a good idea of the downstream analysis I would like to do. However, I am worried that basing the downstream functional annotation on BLAST will obscure species-specific differences. I am less interested in a basic assessment of how many genes in my model species have hits in the different reference organisms and more interested in highlighting the differences. In other words, BLAST will most likely show that more genes are shared than not, but I need to know how the shared genes differ.

First, I was wondering whether it would be worth duplicating the analysis with assembled transcripts versus predicted ORFs, or whether I should go a step further and run the analysis using predicted peptide sequences.

Second, can anyone suggest ways to perform a more fine-grained functional/comparative analysis than just applying annotations based on BLAST results?

gene rna-seq transcriptome • 2.9k views

updated 10.3 years ago by Charles Warden 8.2k • written 10.3 years ago by pld 5.1k

score 1 · Answer 1 · 2013-12-31

I agree that comparing de novo assembly results (especially if you want something quantitative, like differential expression) is likely to be tricky. Actually, I think this is true whether you are working with one species or several, although a multi-species comparison has additional complications. For example, a same-species comparison would still have trouble defining 1:1 relationships between contigs/transcripts, and minor differences in alignment could over-estimate differences in BLAST annotations. In the same species, this could be due to getting a top hit for a homolog in speciesA in one assembly versus a homolog in speciesB in the other, when the two hits actually have very similar E-values, a near-tie that is lost if you only look at top hits.
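To make the near-tie issue concrete, here is a minimal sketch (not something from my own pipeline) that flags queries whose runner-up BLAST hit is almost as good as the top hit. It assumes standard 12-column tabular BLAST output (`-outfmt 6`), and it compares bit scores rather than E-values, because E-values are often reported as 0.0 for strong hits. The function name `ambiguous_top_hits` and the `min_ratio` cutoff of 0.95 are arbitrary choices for illustration.

```python
from collections import defaultdict


def ambiguous_top_hits(blast_lines, min_ratio=0.95):
    """Flag queries whose best hits are nearly tied.

    blast_lines: iterable of lines in 12-column tabular BLAST format
                 (qseqid, sseqid, ..., evalue, bitscore).
    min_ratio:   hypothetical cutoff; a hit counts as "tied" if its bit
                 score is at least min_ratio times the best bit score.
    Returns {query: [(subject, bitscore), ...]} for queries with a near-tie.
    """
    best = defaultdict(dict)  # query -> subject -> best bit score seen
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep only the highest-scoring alignment per query/subject pair.
        if bitscore > best[query].get(subject, 0.0):
            best[query][subject] = bitscore

    flagged = {}
    for query, hits in best.items():
        ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
        top_score = ranked[0][1]
        tied = [hit for hit in ranked if hit[1] >= min_ratio * top_score]
        if len(tied) > 1:
            flagged[query] = tied
    return flagged
```

Contigs returned by this function are the ones where a top-hit-only annotation could flip between homologs for trivial reasons, so they are worth checking by hand before calling them a difference.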

I don't have a great solution for this problem, except to recommend that you qualitatively compare the most highly expressed genes (using an arbitrary cutoff like the top 30 or 40 genes). For example, I found that the CLC Bio contigs for adipose and muscle tissue showed logical differences in tissue-specific expression among these top genes (as well as for unranked positive controls). This actually worked better than both Trinity and Oases (which were designed specifically for RNA-Seq, unlike CLC Bio).
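As a rough sketch of that kind of top-genes comparison: the code below assumes you already have, for each sample, a mapping from contig ID to an expression value and a mapping from contig ID to a gene annotation (for example from BLAST). The function names and the choice to sum expression over contigs sharing an annotation are my assumptions for illustration, not what any of the assemblers do.

```python
from collections import Counter


def top_genes(expression, annotation, n=30):
    """Return the n most highly expressed annotated genes, highest first.

    expression: {contig_id: expression value}
    annotation: {contig_id: gene name}; unannotated contigs are skipped.
    Expression is summed over contigs that share the same gene name
    (an assumption, since several contigs often hit the same gene).
    """
    totals = Counter()
    for contig, value in expression.items():
        gene = annotation.get(contig)
        if gene is not None:
            totals[gene] += value
    return [gene for gene, _ in totals.most_common(n)]


def compare_top_genes(top_a, top_b):
    """Split two top-gene lists into shared and sample-specific genes."""
    a, b = set(top_a), set(top_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }
```

You would call `top_genes` once per tissue or species with the same `n`, then pass both lists to `compare_top_genes` and check whether the sample-specific genes make biological sense.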

Beyond this, I can only recommend papers that may be potentially useful (mostly obtained from a Google search):

How to Compare 2 Differential expressed transcripts from 2 different de novo assembly?

http://www.biomedcentral.com/1471-2164/14/805

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082674

http://genomebiology.com/2013/14/2/R16

http://www.slideshare.net/AustralianBioinformatics/differential-expression-analysis-of-de-novo-assembled-transcriptomes