Question

Combining de novo and genome guided transcriptome assembly for expression analysis?

8.1 years ago

standonn ▴ 20

Dear all,

I am about to do some RNA-seq analysis to find differentially expressed (DE) genes between different conditions. I am studying a nematode which is relatively unknown but for which I have an assembled genome (not published yet).

Now, because I have a genome, I could use the Tuxedo pipeline (TopHat / Cufflinks / Cuffdiff) to find the DE genes. But my concern is that the genome is newly assembled and therefore not very polished. I also read this post (Is There Any Reason To Do De Novo Transcript Assembly If A Reference Is Available?), which made me think that perhaps using both the de novo and the genome-guided strategies would be better.

I then found that it is possible to combine de novo transcriptome assemblies with genome-guided assemblies (http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-2277-7 , http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0091776) by using the tr2aacds pipeline of EvidentialGene (http://arthropods.eugenes.org/genes2/about/EvidentialGene_trassembly_pipe.html).

If I'm understanding correctly, I would then have (I hope) a very good transcriptome to which I could align my reads (both conditions separately) and perform the differential expression analysis.

My question is: is it a good idea / justified to do the RNA-seq analysis using a transcriptome assembled both de novo and genome-guided? Or is the Tuxedo pipeline better?

Any advice is very much appreciated!

Best Wishes, Sophie

RNA-Seq genome-guided de novo expression analysis • 4.0k views


score 0 · Answer 1 · 2016-03-24

You will reconstruct genes more accurately and completely from several multi-kmer de novo RNA assemblies made with different assemblers (including Cufflinks if you want) than from genome-based gene construction or modelling. This is based on my now fairly extensive results with various animals and plants, and is a major reason I want folks to try the http://arthropods.eugenes.org/EvidentialGene/ methods (i.e. to improve our published gene sets). This presumes you have enough high-quality paired-end Illumina RNA-seq.

For recent mosquito gene sets, see the EvidentialGene versus MAKER+Trinity comparisons at http://arthropods.eugenes.org/EvidentialGene/arthropods/mosquito/evg_mosquito_news1603.html

Draft chromosome assemblies add errors to genes (fragmented chromosomes, misassemblies, transposons and long introns all mangle genes), but they sometimes aid gene construction as well. Use both approaches, and use EvidentialGene to pick out the most accurate gene set. This may seem like a lot of effort, but with experience it is less work than genome-based gene modelling. The reward is a significant improvement in complete ortholog genes, as well as in complete non-orthologs, which are often your interesting DE targets.
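As a rough illustration of the "use both ways" step, the sketch below pools transcripts from several assemblies into one FASTA file before any classification or reduction step. It is only a sketch under stated assumptions: the labels (`denovo_k25`, `guided`) and the function names are made up for this example and are not part of EvidentialGene. The one thing it shows is that each transcript ID gets a prefix for its source assembly, so identical IDs from different assemblers do not collide in the pooled file.

```python
from typing import Dict, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first word of the header as the ID.
            header, chunks = line[1:].split()[0], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pool_assemblies(assemblies: Dict[str, str]) -> str:
    """Merge assemblies into one FASTA string, prefixing IDs with a label.

    `assemblies` maps a hypothetical label (e.g. assembler plus k-mer size)
    to the FASTA text of that assembly.
    """
    out: List[str] = []
    for label, fasta_text in assemblies.items():
        for seq_id, seq in read_fasta(fasta_text):
            out.append(f">{label}_{seq_id}\n{seq}")
    return "".join(f"{record}\n" for record in out)


if __name__ == "__main__":
    # Tiny made-up sequences, for demonstration only.
    demo = {
        "denovo_k25": ">t1 len=8\nATGGCCTA\n>t2\nATGAAA\nTTTTAG\n",
        "guided": ">t1\nATGGCCTAA\n",
    }
    print(pool_assemblies(demo), end="")
```

For real data you would read each assembler's output file from disk and write the pooled result to a single file; the in-memory strings here just keep the example self-contained.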

Don Gilbert

score 0 · Answer 2 · 2016-03-24

Yes; however, the popular gene assemblers you note, Trinity and Cufflinks, are not the most capable. Velvet/Oases, idba-trans, and soap-denovo-trans do a better job of complete gene assembly, and using all of them together does better still. This is the McDonald's conundrum of genome informatics: the widely used, popular methods are not the healthiest or best tasting. Multi-kmer assembly (different read shredding) produces more complete genes, because each locus has its own expression level and other qualities that need to be accounted for by varying parameters. See this recent comparison, http://arthropods.eugenes.org/EvidentialGene/evigene/docs/evg_geneassembly_bestmethods1603.html , and the earlier recommendations on how to get the best RNA assemblies at http://arthropods.eugenes.org/EvidentialGene/about/EvidentialGene_trassembly_pipe.html -- Don Gilbert
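To make "different read shredding" concrete, here is a minimal sketch of how one read breaks into overlapping k-mers at several values of k. The read and the k values are made up for illustration and are not recommendations from any assembler. It only shows that the same read yields a different set of k-mers at each k, which is why assemblies at different k can recover different loci.

```python
from typing import List


def kmers(read: str, k: int) -> List[str]:
    """Return all overlapping substrings of length k in a read."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A read shorter than k yields no k-mers (empty range).
    return [read[i:i + k] for i in range(len(read) - k + 1)]


if __name__ == "__main__":
    read = "ATGGCCTAAGGCTTACGATC"  # made-up 20-base read
    for k in (5, 11, 17):  # illustrative k values only
        print(k, len(kmers(read, k)))
```

A read of length L gives L - k + 1 k-mers, so smaller k gives more, shorter k-mers per read and larger k gives fewer, more specific ones.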