Question: Poor Output from Spades with High Coverage Input Data
3 months ago, callamartyn (10) wrote:

Hi all,

I am testing out some different methods for assembling viral genomes from Illumina data and am having some surprising results from SPAdes. I have mapped the reads to a reference genome in Geneious for comparison and can see that my reads cover 99.1% of the 10 kb genome with an average depth of 11,258 (I sequenced very deeply and enriched the library for viral reads). So I assumed there should be more than enough data for SPAdes to output the entire genome.

However, when I run SPAdes (in paired-end mode) two unusual things happen. First, it is not able to assemble the entire viral genome, or even any substantial contigs of it. When I map the scaffolds back to the reference genome, they cover only about 46% of it. I can bring this up to 85% by using the "trusted contig" option (--trusted-contigs), but this is still well below the 99% I get from mapping all the reads directly. Does anyone have an idea why this might be the case, or where I should start looking for the problem? I know SPAdes works for many people, and the data I am inputting seems like it should be more than sufficient to get back a full genome.

Second, when I map the scaffolds back to the reference I can see that many of them overlap with each other substantially. Can anyone explain why they wouldn't be joined into a larger contig/scaffold? And are there any options I can add in SPAdes to join them?

Would appreciate any direction anyone can suggest to figure out what is going wrong. Thanks in advance!

Tags: assembly, genome
modified 3 months ago by h.mon (18k) • written 3 months ago by callamartyn (10)

This does not answer your question directly, but you may want to give tadpole.sh from the BBMap suite a try. It works well with viral genomes. Since you have far more coverage than you need, consider normalizing your data using bbnorm.sh (see guide above).

written 3 months ago by genomax (54k)
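For anyone unsure what coverage normalization does, here is a minimal Python sketch of the general idea (often called digital normalization): a read is kept only while the median count of its k-mers, among reads kept so far, is still below a target depth. This is not bbnorm.sh's actual implementation. The function name `normalize_reads`, the parameters `k` and `target`, and the toy reads are all assumptions made up for illustration, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, k=5, target=3):
    """Keep a read only while the median count of its k-mers is below target.

    Hypothetical helper illustrating digital normalization; not BBNorm's algorithm.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to count, skip it
        if median(counts[km] for km in read_kmers) < target:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Toy input: one region sequenced 50 times, another only twice.
deep = ["ACGTACGGTCA"] * 50
shallow = ["TTGACCAGTAG"] * 2
kept = normalize_reads(deep + shallow, k=5, target=3)
print(len(kept))
```

The over-sequenced region is capped near the target depth while the low-coverage region is left untouched, which is the property that matters when an assembler struggles with extreme depth. For real data, use bbnorm.sh itself rather than this sketch.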

Thanks so much, I will definitely try the normalization!

written 3 months ago by callamartyn (10)

Is this DNAseq or RNAseq?

written 3 months ago by h.mon (18k)

"different methods for assembling viral genomes"

Looks like DNAseq, unless these are RNA viruses.

written 3 months ago by genomax (54k)

RNASeq; viral RNA that was reverse transcribed and prepped with a Nextera kit

written 3 months ago by callamartyn (10)

Reduce your coverage. De Bruijn graph assemblers can choke on very high coverage. You can randomly subsample your FASTQ files with several tools.

written 3 months ago by jrj.healey (6.1k)
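Dedicated tools (seqtk, or reformat.sh from the BBMap suite) are the usual choice for this, but the idea is simple enough to sketch in plain Python. The version below uses reservoir sampling and draws R1 and R2 together so that mates stay in sync. The function names and the seed are assumptions for illustration, and it assumes uncompressed FASTQ files with exactly four lines per record.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    while True:
        record = tuple(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, n, seed=42):
    """Write n randomly chosen read pairs using reservoir sampling.

    R1 and R2 are sampled together, so every kept read keeps its mate.
    The n sampled pairs are held in memory until they are written out.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(r1_path) as f1, open(r2_path) as f2:
        for i, pair in enumerate(zip(read_fastq(f1), read_fastq(f2))):
            if i < n:
                reservoir.append(pair)   # fill the reservoir first
            else:
                j = rng.randint(0, i)    # inclusive on both ends
                if j < n:
                    reservoir[j] = pair  # replace with probability n / (i + 1)
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(reservoir)
```

A call might look like `subsample_pairs("R1.fastq", "R2.fastq", "sub_R1.fastq", "sub_R2.fastq", n=500_000)`, where the file names are placeholders. Fixing the seed makes the subsample reproducible, which helps when comparing assemblies across several read counts.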

Thanks so much! I tried a few different amounts of reads and got it to work. For anyone else who encounters this problem, I tried 20,000, 500,000, and 1 million reads that had already undergone host-subtraction. For my particular data 500,000 produced the best assembly (for a 10 kb genome) and it was definitely deteriorating by 1 million.

written 3 months ago by callamartyn (10)

What coverage was that equivalent to in the end?

written 3 months ago by jrj.healey (6.1k)
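The question above comes down to simple arithmetic: mean depth is roughly the number of reads times the read length, divided by the genome length. The thread gives the read counts and the 10 kb genome size but not the read length, so the 150 bp used below is purely an assumption, as is the idea that every subsampled read maps to the virus.

```python
def mean_depth(n_reads, read_length, genome_length):
    """Approximate mean depth, assuming every read maps to the genome."""
    return n_reads * read_length / genome_length


GENOME_LENGTH = 10_000   # 10 kb genome, from the thread
READ_LENGTH = 150        # assumed; the read length is not stated in the thread

for n_reads in (20_000, 500_000, 1_000_000):
    depth = mean_depth(n_reads, READ_LENGTH, GENOME_LENGTH)
    print(f"{n_reads:>9,} reads -> ~{depth:,.0f}x")
```

If the counts in the thread refer to read pairs rather than individual reads, the estimates double. Substitute the real read length and the fraction of reads that actually map to get a usable number.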
3 months ago, h.mon (18k, Brazil) wrote:

For RNAseq of RNA viruses, I had good results (meaning complete viral genomes) with Trinity + CAP3. Indeed, the initial Trinity assembly very often is fragmented, but the second CAP3 assembly step fixes this.

There are several quality control / filtering steps that may increase the quality of the assembly. Have a look at the metaViC pipeline (announced here: Tools for viral metagenomics profiling and abundance estimation using BAM file) for ideas, or use it directly. In particular, I would recommend very aggressive adapter and quality trimming, as there is plenty of coverage to spare.

modified 3 months ago • written 3 months ago by h.mon (18k)

Thanks so much, I finally got SPAdes to work but am curious about metaViC and will try it as well!

written 3 months ago by callamartyn (10)