Question: Combining de novo and genome guided transcriptome assembly for expression analysis?
standonn wrote (3.5 years ago):

Dear all,

I am about to do some RNA-seq analysis to find differentially expressed (DE) genes between different conditions. I am studying a nematode that is relatively unknown, but for which I have an assembled genome (not published yet).

Because I have a genome, I could use the Tuxedo pipeline (TopHat / Cufflinks / Cuffdiff) to find the DE genes. My concern is that the genome is newly assembled and therefore not very polished. I also read this post (Is There Any Reason To Do De Novo Transcript Assembly If A Reference Is Available?), which made me think that using both the de novo and the genome-guided strategies might be better.

I then found that it is possible to combine de novo transcriptome assemblies with genome-guided assemblies (http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-2277-7 , http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0091776) by using the tr2aacds pipeline of EvidentialGene (http://arthropods.eugenes.org/genes2/about/EvidentialGene_trassembly_pipe.html).

If I'm understanding correctly, I would then have (I hope) a very good transcriptome to which I could align my reads (both conditions separately) and perform the differential expression analysis.

My question is: is it a good idea, and justified, to do the RNA-seq analysis using a transcriptome assembled both de novo and genome-guided? Or is the Tuxedo pipeline better?

Any advice is very much appreciated!

Best Wishes, Sophie

gilbert.bionet wrote (3.5 years ago):

You will reconstruct genes more accurately and completely from several multi-kmer de novo assemblies of RNA, made with different assemblers (including Cufflinks if you want), than from genome-based gene construction or modelling. This is according to my now fairly extensive results with various animals and plants, and it is a major reason I want folks to try the http://arthropods.eugenes.org/EvidentialGene/ methods (i.e. to improve our published gene sets). This presumes you have enough high-quality paired-end Illumina RNA-seq.

For recent mosquito gene sets, see the EvidentialGene versus MAKER+Trinity comparisons at http://arthropods.eugenes.org/EvidentialGene/arthropods/mosquito/evg_mosquito_news1603.html

Draft chromosome assemblies add errors to genes (fragmented chromosomes, misassemblies, transposons and long introns all mangle genes), although they sometimes aid in gene construction as well. Use both approaches, and use EvidentialGene to pick out the most accurate gene set. This may seem like a lot of effort, but with experience it is less work than genome-based gene modelling. The reward is a significant improvement in complete ortholog genes, as well as in complete non-orthologs, which are often your interesting DE targets.
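As a minimal sketch of the "use both ways" step: before handing several assemblies to one selection step, they need to be pooled into a single FASTA file with transcript IDs that cannot collide between assemblers. The function names and assembly labels below are hypothetical, chosen for illustration only; this is not part of EvidentialGene, and the tr2aacds documentation should be checked for any ID conventions it expects.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                # Keep only the first word of the header line as the ID.
                parts = line[1:].split()
                header, chunks = (parts[0] if parts else "unnamed"), []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pool_assemblies(assemblies: Dict[str, Path], out_path: Path) -> int:
    """Write all transcripts to one FASTA, prefixing each ID with its assembly label.

    The labels (dictionary keys) are hypothetical tags such as an
    assembler name plus k-mer size. Returns the number of transcripts written.
    """
    written = 0
    with open(out_path, "w") as out:
        for label, fasta in assemblies.items():
            for seq_id, seq in read_fasta(fasta):
                out.write(f">{label}_{seq_id}\n{seq}\n")
                written += 1
    return written
```

For example, `pool_assemblies({"oases_k31": Path("oases_k31.fa"), "gg_cufflinks": Path("cufflinks.fa")}, Path("pooled.fa"))` would produce one pooled file in which the origin of every transcript stays visible in its ID (the file names here are placeholders).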

  • Don Gilbert

Thanks for your answer! It does help. Yes, I do have good-quality PE reads. So, if I understand correctly, I would build several transcriptome assemblies using different programs (Trinity, Cufflinks) and different strategies (de novo and genome-guided), merge them, and use EvidentialGene/tr2aacds.pl to get the best set of transcripts. I would then map my reads to that "optimized transcriptome" and get the transcript abundances for each of my conditions (using Bowtie2 + Corset, for example). Then I could conduct the statistical analysis (DESeq, edgeR, ...). Does this plan sound reasonable? (Apologies if this is a very basic question; I just want to make sure I'm doing things right.)
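The last step of this plan can be sketched as follows. DESeq and edgeR both take a table of raw integer counts with one row per gene or cluster and one column per sample, so the per-sample abundances have to be merged into one matrix first. This is a minimal sketch assuming the counts are already loaded as dictionaries; the function names, sample names, and cluster IDs are hypothetical and are not the output format of Corset or any other tool.

```python
from typing import Dict, List, Tuple


def build_count_matrix(
    sample_counts: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Combine per-sample counts into one matrix (rows = features, columns = samples).

    Features missing from a sample get a count of 0.
    """
    samples = list(sample_counts)
    features = sorted({f for counts in sample_counts.values() for f in counts})
    matrix = [[sample_counts[s].get(f, 0) for s in samples] for f in features]
    return features, samples, matrix


def to_tsv(features: List[str], samples: List[str], matrix: List[List[int]]) -> str:
    """Render the matrix as tab-separated text with a header row."""
    lines = ["\t".join(["feature"] + samples)]
    for feature, row in zip(features, matrix):
        lines.append("\t".join([feature] + [str(v) for v in row]))
    return "\n".join(lines) + "\n"


# Hypothetical example: two samples, three clusters.
counts = {
    "cond1_rep1": {"cluster1": 10, "cluster2": 3},
    "cond2_rep1": {"cluster2": 7, "cluster3": 1},
}
features, samples, matrix = build_count_matrix(counts)
print(to_tsv(features, samples, matrix), end="")
```

Note that both packages need replicates per condition to estimate dispersion, so a real matrix would have several columns per condition.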

Reply written 3.5 years ago by standonn
gilbert.bionet wrote (3.5 years ago):

Yes. However, the popular gene assemblers you mention, Trinity and Cufflinks, are not the most capable. Velvet/Oases, IDBA-Tran, and SOAPdenovo-Trans do a better job of complete gene assembly, and using all of them together does even better. This is the McDonald's conundrum of genome informatics: the widely used, popular methods are not the healthiest or best tasting. Multi-kmer assembly (different read shredding) produces more complete genes, because each locus has a different expression level and other qualities that varying the parameters needs to account for. See this recent comparison: http://arthropods.eugenes.org/EvidentialGene/evigene/docs/evg_geneassembly_bestmethods1603.html and the earlier recommendations on How to Get Best RNA assemblies at http://arthropods.eugenes.org/EvidentialGene/about/EvidentialGene_trassembly_pipe.html -- Don Gilbert
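To make "different read shredding" concrete, here is a toy illustration of what changing the k-mer size does to a single read. It is only a sketch of the general idea behind de Bruijn graph assemblers, not how Velvet/Oases or any other assembler is implemented, and the function names are hypothetical. A commonly cited rationale is that small k gives more overlapping k-mers per read, which helps connect weakly expressed loci, while large k gives fewer but more specific k-mers.

```python
from typing import Dict, List


def shred(read: str, k: int) -> List[str]:
    """Return all overlapping k-mers of a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def shred_multi_k(read: str, ks: List[int]) -> Dict[int, List[str]]:
    """Shred the same read at several k-mer sizes, as a multi-kmer strategy would."""
    return {k: shred(read, k) for k in ks}


# A read of length L yields L - k + 1 k-mers, so larger k means fewer k-mers.
for k, kmers in shred_multi_k("ACGTACGGTC", [3, 5, 7]).items():
    print(k, len(kmers), kmers)
```

Running one assembly per k value and pooling the results is what lets a later selection step keep the best model for each locus.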
