De novo assembly vs. genome-guided assembly (Trinity) -- better N50 from de novo than genome-guided
9.2 years ago
Yang Li ▴ 70

Hi there,

I ran a virus assembly test using Trinity (2.02). The results are as follows:

1. normalize before assembly

Trinity --normalize_reads --seqType fa --max_memory 5G --single temp.Flavivirus.fa --CPU 6
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity.fasta
stats_assembly    N50    1190
stats_assembly    N75    356
stats_assembly    N90    249
stats_assembly    N95    231

2. de novo

Trinity --seqType fa --max_memory 5G --single temp.Flavivirus.fa --CPU 6
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity.fasta
stats_assembly    N50    3800
stats_assembly    N75    444
stats_assembly    N90    248
stats_assembly    N95    232

3. ref_guided and normalize

prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity-GG.fasta
stats_assembly    N50    835
stats_assembly    N75    282
stats_assembly    N90    230
stats_assembly    N95    217

4. ref_guided assembly

Trinity --genome_guided_bam refguided.sam.sort.bam --max_memory 5G --CPU 6 --genome_guided_max_intron 10000
prinseq-lite.pl -stats_assembly -fasta ./trinity_out_dir/Trinity-GG.fasta
stats_assembly    N50    1855
stats_assembly    N75    316
stats_assembly    N90    242
stats_assembly    N95    223

Materials:

A clinical sample was sequenced on a PGM. An in-house pipeline showed that the reads covered 96.41% of the reference genome (gi|428621807|gb|JQ917404.1| Dengue virus 1 isolate RR57).
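
For readers who want to reproduce a coverage figure like the one above, here is a minimal sketch of one common way to compute breadth of coverage. It is not the in-house pipeline mentioned in the post, which is not described. It assumes per-position depths in the three-column, tab-separated format written by `samtools depth -a` (reference name, position, depth); the function names `parse_depth_lines` and `breadth_of_coverage` are made up for illustration.

```python
def parse_depth_lines(lines):
    """Extract the depth column from 'ref<TAB>pos<TAB>depth' lines.

    Assumption: the input has one line per reference position, including
    positions with zero depth (what `samtools depth -a` writes).
    """
    return [int(line.split("\t")[2]) for line in lines if line.strip()]


def breadth_of_coverage(depths, min_depth=1):
    """Return the percentage of positions with depth >= min_depth."""
    if not depths:
        raise ValueError("no positions given")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)
```

Breadth of coverage only says that reads exist over most of the reference. It says nothing about whether the depth is even enough for an assembler to join those regions into one contig.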

Discussion:

I expected the best N50 to come from the reference-guided assembly with normalization, but it did not work as well as I had hoped. Could you help me figure out why the N50 from the de novo assembly was better than the N50 from the reference-guided assembly?

Thank you

Trinity Virus Assembly RNA-Seq • 5.2k views

N50 is a good statistic for measuring the contiguity of an assembly, but not its quality. Your de novo assembly with normalization is probably the most accurate assembly after your reference-guided one.
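
To make concrete what N50 summarizes, here is a minimal sketch using the common definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches 50% of all assembled bases. The contig lengths below are hypothetical and are not taken from the assemblies in the question, and this is not necessarily how prinseq-lite computes its figures.

```python
def nx(lengths, x=50):
    """Return the Nx length for a list of contig lengths.

    Nx is the length of the contig at which the cumulative sum of
    lengths, taken from longest to shortest, first reaches x% of the
    total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Two hypothetical assemblies with the same total size (5000 bases).
one_long_contig = [4000, 500, 300, 200]
even_contigs = [1200, 1100, 1000, 900, 800]

print(nx(one_long_contig), nx(even_contigs))
```

Both lists contain the same number of bases, yet a single long contig dominates the first N50. Nothing in the calculation checks whether that contig is correctly assembled, which is why N50 alone cannot rank the four strategies by accuracy.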


Thank you. How should I evaluate the quality of an assembly? I extracted the longest contig from each of the four strategies above and ran BLAST on them. The BLAST results showed no significant differences (coverage: 100%, identity: 99%, E-value: down to 0).
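
The longest-contig extraction described above could be sketched roughly as follows, using only the Python standard library. This is an illustration, not the poster's actual method; the function names `read_fasta` and `longest_contig` are hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contig(path):
    """Return the (header, sequence) pair with the longest sequence.

    Raises ValueError if the file contains no records.
    """
    return max(read_fasta(path), key=lambda record: len(record[1]))
```

Calling `longest_contig("./trinity_out_dir/Trinity.fasta")` after each run would give one sequence per strategy to compare. Note that BLAST identity for the longest contig only describes that one contig, so it cannot show how much of the genome the remaining contigs recover.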

9.2 years ago
5heikki 11k

Maybe the genomes have similar gene content but little synteny? However, considering how small Flavivirus genomes are, I'm surprised you didn't get a complete genome from de novo assembly.


Thank you. Could you give me some advice on generating longer contigs? The PGM sequencing was single-end, and scaffolding tools such as SSPACE were developed for paired-end reads.
