Hi all,
I am testing out different methods for assembling viral genomes from Illumina data and am getting some surprising results from SPAdes. For comparison, I mapped the reads to a reference genome in Geneious and can see that they cover 99.1% of the 10 kb genome at an average depth of 11,258× (I sequenced very deeply and enriched the library for viral reads). So I assumed there would be more than enough data for SPAdes to output the entire genome.
However, when I run SPAdes (in paired-end mode) two unusual things happen. First, it is not able to assemble the entire viral genome, or even any substantial contigs of it. When I map the scaffolds back to the reference genome, I only get about 46% coverage. I can bring this up to 85% by using the --trusted-contigs option, but this is still well below the 99% I get from mapping all the reads directly. Does anyone have an idea why this might be the case, or where I should start looking for the problem? I know SPAdes works for many people, and the data I am inputting seems like it should be more than sufficient to get back a full genome.
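For reference, my SPAdes calls look roughly like this (file names are placeholders, and the trusted contigs are just the draft contigs I feed back in):

    # plain paired-end run
    spades.py -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz -o spades_default
    # same run, but supplying draft contigs via --trusted-contigs
    spades.py -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz --trusted-contigs draft_contigs.fasta -o spades_trusted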
Second, when I map the scaffolds back to the reference, I can see that many of them overlap each other substantially. Can anyone explain why they wouldn't be joined into a larger contig/scaffold? And are there any options I can pass to SPAdes to join them?
Would appreciate any direction anyone can suggest to figure out what is going wrong. Thanks in advance!
Not answering your question, but you may want to give tadpole.sh from the BBMap suite a try; it works well with viral genomes. Since you have far more coverage than you need, also consider normalizing your data with bbnorm.sh (see the guide above).
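Something along these lines, with the file names and the depth target adjusted to your data (these values are just examples):

    # normalize to ~100x depth, discarding reads below depth 5
    bbnorm.sh in1=reads_R1.fq.gz in2=reads_R2.fq.gz out1=norm_R1.fq.gz out2=norm_R2.fq.gz target=100 min=5
    # assemble the normalized reads (larger k= values can help for a small viral genome)
    tadpole.sh in1=norm_R1.fq.gz in2=norm_R2.fq.gz out=tadpole_contigs.fa mode=contig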
Thanks so much, I will definitely try the normalization!
Is this DNAseq or RNAseq?
Looks like DNAseq, unless these are RNA viruses.
RNASeq; viral RNA that was reverse transcribed and prepped with a Nextera kit
Reduce your coverage. De Bruijn graph assemblers can choke on very high coverage. You can randomly subsample your fastqs with several tools.
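For example, with seqtk (the seed and read count here are just illustrative; reformat.sh from BBMap can do the same):

    # use the same seed (-s) for R1 and R2 so the pairs stay in sync
    seqtk sample -s100 reads_R1.fastq.gz 500000 > sub_R1.fastq
    seqtk sample -s100 reads_R2.fastq.gz 500000 > sub_R2.fastq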
Thanks so much! I tried a few different numbers of reads and got it to work. For anyone else who encounters this problem, I tried 20,000, 500,000, and 1 million reads that had already undergone host subtraction. For my particular data, 500,000 produced the best assembly (for a 10 kb genome), and the assembly was clearly deteriorating by 1 million.
What coverage was that equivalent to in the end?