Question

De novo assembly vs. genome-guided assembly (Trinity): better N50 from de novo than from genome-guided

9.2 years ago

Yang Li ▴ 70

Hi there,

I ran a test virus assembly using Trinity (2.02). The results are as follows:

1. normalize before assembly

Trinity --normalize_reads --seqType fa --max_memory 5G --single temp.Flavivirus.fa --CPU 6
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity.fasta
stats_assembly    N50    1190
stats_assembly    N75    356
stats_assembly    N90    249
stats_assembly    N95    231

2. de novo

Trinity --seqType fa --max_memory 5G --single temp.Flavivirus.fa --CPU 6
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity.fasta
stats_assembly    N50    3800
stats_assembly    N75    444
stats_assembly    N90    248
stats_assembly    N95    232

3. ref_guided and normalize

prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity-GG.fasta
stats_assembly    N50    835
stats_assembly    N75    282
stats_assembly    N90    230
stats_assembly    N95    217

4. ref_guided assembly

Trinity --genome_guided_bam refguided.sam.sort.bam --max_memory 5G --CPU 6 --genome_guided_max_intron 10000
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity-GG.fasta
stats_assembly    N50    1855
stats_assembly    N75    316
stats_assembly    N90    242
stats_assembly    N95    223

Materials:

A clinical sample was sequenced on a PGM. An in-house pipeline showed that the reads covered 96.41% of the reference genome (gi|428621807|gb|JQ917404.1| Dengue virus 1 isolate RR57).

Discussion:

I expected the best N50 to come from the reference-guided assembly with normalization. In fact, it did not work as well as I had hoped. Could you help me figure out why the N50 from the de novo assembly was better than the N50 from the reference-guided assembly?

Thank you

Trinity Virus Assembly RNA-Seq • 5.2k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Yang Li ▴ 70

N50 is a good statistic to measure the quantity of the assembly, but not the quality. Your de novo assembly, with normalization, is probably the most accurate assembly next to your reference-guided one.
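
To make the point concrete, here is a minimal sketch of how N50 is usually computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled bases. The function name `n50` and both contig lists are hypothetical and are not taken from the assemblies in this thread; prinseq-lite may differ in edge cases.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig sets with the same total size (5270 bases).
fragmented = [3800, 450, 300, 250, 240, 230]   # one long contig plus fragments
many_medium = [1200, 1190, 1100, 900, 880]     # several mid-sized contigs

print(n50(fragmented))
print(n50(many_medium))
```

In this toy case a single long contig is enough to give the first set the higher N50, even though the rest of that assembly is short fragments, which is why N50 alone says little about accuracy.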

updated 2.0 years ago by Ram 43k • written 9.2 years ago by st.ph.n ★ 2.7k

Thank you. How can I evaluate the quality of an assembly? I extracted the longest contig from each of the four strategies above and ran BLAST on them. The BLAST results showed no significant differences (coverage: 100%, identity: 99%, E-value: down to 0).
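
The longest-contig extraction can be sketched roughly as follows. This is only an illustration in plain Python, not the exact commands used for the results above; the helper names `read_fasta` and `longest_contig` and the in-memory example records are hypothetical. For a real run the handle would be an opened file such as `Trinity.fasta`.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contig(handle):
    """Return the (header, sequence) pair with the longest sequence."""
    return max(read_fasta(handle), key=lambda record: len(record[1]))


# Hypothetical three-contig FASTA held in memory instead of a file.
example = io.StringIO(">contig_a\nACGT\nACG\n>contig_b\nACGTACGTAC\n>contig_c\nAC\n")
name, seq = longest_contig(example)
print(name, len(seq))
```

The selected sequence can then be written out and used as the BLAST query.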

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Yang Li ▴ 70

Ram · Answer 1 · 2015-02-06

9.2 years ago

5heikki 11k

Maybe the genomes have similar gene content but little synteny? However, considering how small Flavivirus genomes are, I'm surprised you didn't get a complete genome from de novo assembly.

updated 2.0 years ago by Ram 43k • written 9.2 years ago by 5heikki 11k

Thank you. Could you give me some advice on how to generate a longer contig? The PGM run was single-end, whereas tools such as SSPACE were developed for paired-end reads.

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Yang Li ▴ 70