Question: De novo assembly vs. genome-guided assembly (Trinity): better N50 from de novo than from genome-guided
Yang Li (China) wrote, 4.7 years ago:

Hi there,

I ran a test of virus assembly using Trinity (2.02). The results are as follows:


1. De novo assembly with read normalization
Trinity --normalize_reads --seqType fa --max_memory 5G --single temp.Flavivirus.fa --CPU 6
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity.fasta
stats_assembly    N50    1190
stats_assembly    N75    356
stats_assembly    N90    249
stats_assembly    N95    231

2. De novo assembly
Trinity --seqType fa --max_memory 5G --single temp.Flavivirus.fa --CPU 6
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity.fasta
stats_assembly    N50    3800
stats_assembly    N75    444
stats_assembly    N90    248
stats_assembly    N95    232

3. Reference-guided assembly with read normalization
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity-GG.fasta
stats_assembly    N50    835
stats_assembly    N75    282
stats_assembly    N90    230
stats_assembly    N95    217

4. Reference-guided assembly
Trinity --genome_guided_bam refguided.sam.sort.bam --max_memory 5G --CPU 6 --genome_guided_max_intron 10000
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity-GG.fasta
stats_assembly    N50    1855
stats_assembly    N75    316
stats_assembly    N90    242
stats_assembly    N95    223


Materials:

A clinical sample was sequenced on an Ion Torrent PGM. An in-house pipeline showed that the reads covered 96.41% of the reference genome (gi|428621807|gb|JQ917404.1| Dengue virus 1 isolate RR57).


Discussion:

I expected the best N50 to come from the reference-guided assembly with normalization. In fact, it did not work as well as I had hoped. Could you help me figure out why the N50 from the de novo assembly was better than the N50 from the reference-guided assembly?

 

Thank you

(written 4.7 years ago by Yang Li; modified 4.7 years ago by 5heikki)

N50 is a good statistic for measuring the contiguity of an assembly, but not its quality. Your de novo assembly with normalization is probably the most accurate assembly next to your reference-guided one.

(reply written 4.7 years ago by st.ph.n)
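To illustrate the point about N50, here is a minimal Python sketch of one common definition of the statistic: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name `n_stat` and the contig lengths are hypothetical and are not taken from the assemblies in this thread; prinseq-lite may use a slightly different convention.

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nx statistic for a list of contig lengths.

    With fraction=0.5 this is N50: the length L such that contigs of
    length >= L together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical contig lengths, chosen only for illustration.
few_contigs = [3800, 400, 300, 250, 250, 240, 230]
# Same contigs plus 20 extra short ones; the longest contig is unchanged.
many_short = few_contigs + [230] * 20

print(n_stat(few_contigs))   # 3800
print(n_stat(many_short))    # 240
```

In this toy case the longest contig is identical in both sets, yet N50 drops from 3800 to 240 once many short contigs are added. N50 therefore reflects the length distribution of the output, not whether the contigs are correct.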

Thank you. How should I evaluate the quality of an assembly? I extracted the longest contig from each of the four strategies above and ran BLAST on them. The BLAST results showed no significant differences between them (Coverage: 100%, Identity: 99%, E-value: as low as 0).

(reply written 4.7 years ago by Yang Li)
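One possible way to compare the four assemblies beyond their longest contig is to ask how much of the reference is covered by all contigs together. The sketch below assumes the contigs were aligned to the reference with BLAST in tabular format (`-outfmt 6`, default columns, where `sstart` and `send` are columns 9 and 10). The function names, the example hits, and the reference length of 10000 are hypothetical and are not results from this thread.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def reference_coverage(blast_lines, ref_length):
    """Fraction of the reference covered by at least one BLAST hit.

    Assumes default outfmt 6 columns; minus-strand hits have
    sstart > send, so each pair is reordered before merging.
    """
    intervals = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        intervals.append((min(sstart, send), max(sstart, send)))
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / ref_length


# Hypothetical hits: two overlapping contigs and one minus-strand contig.
example_hits = [
    "contig1\tref\t99.0\t3800\t38\t0\t1\t3800\t101\t3900\t0.0\t6800",
    "contig2\tref\t98.5\t400\t6\t0\t1\t400\t3801\t4200\t0.0\t700",
    "contig3\tref\t99.2\t250\t2\t0\t1\t250\t9250\t9001\t1e-120\t440",
]

print(reference_coverage(example_hits, ref_length=10000))
```

Running this on each assembly's hits would give one covered fraction per strategy, which could be set beside the N50 values. It says nothing about misassemblies or per-base errors, so it is only a first check.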
5heikki (Finland) wrote, 4.7 years ago:

Maybe the genomes have similar gene content but little synteny? However, considering how small Flavivirus genomes are, I'm surprised you didn't get a complete genome from de novo assembly.

(written 4.7 years ago by 5heikki; modified 4.7 years ago)

Thank you. Could you give me some advice on how to generate longer contigs? The PGM sequencing was single-end, and scaffolding tools such as SSPACE were developed for paired-end reads.

(reply written 4.7 years ago by Yang Li)