Question: Extracted mapped shotgun metagenomic reads to reference genome. SPAdes or metaSPAdes for de-novo assembly?
2.0 years ago, O.rka wrote:

I have a reference strain and mapped all of my shotgun metagenomic reads to the reference strain using BBMap.

I extracted the mapped reads and want to create an assembly from this.

I usually use metaSPAdes for this, but would SPAdes be better suited for this task? My PI prefers using SPAdes and metaSPAdes, but I'm wondering which one would be better for this task in particular, since it's only a (semi-)supervised subset of a metagenome.

[Bonus] if there is another assembler that is better suited for this exact task please let me know.

metagenomics assembly de-novo • 891 views
2.0 years ago, h.mon wrote:

To answer your question directly: I think you should use SPAdes, and then probably filter out contigs whose coverage diverges too much from that of the longest contigs. This assumes the longest contigs belong to the strain of interest, which they should, since the reads used for the assembly have been enriched for this strain. Keep in mind that contigs containing rRNA sequences generally have abnormally high coverage. However, if the strain of interest is rare in your metagenomic sample, the resulting assembly will be very fragmented due to low coverage.
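The coverage filter suggested above could be sketched roughly as follows. This is only an illustration, not part of the original answer: it assumes contig names follow the usual SPAdes pattern `NODE_<n>_length_<len>_cov_<cov>` (where `cov` is SPAdes' k-mer coverage, not read depth), and the function names, the `n_longest` reference set, and the `max_fold` threshold are hypothetical choices that would need tuning for real data.

```python
import re
from statistics import median

# SPAdes-style contig name, e.g. "NODE_1_length_250000_cov_30.0"
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_spades_header(header):
    """Return (length, coverage) parsed from a SPAdes-style contig header."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(match.group(2)), float(match.group(3))


def filter_by_coverage(headers, n_longest=10, max_fold=2.0):
    """Keep contigs whose coverage is within max_fold of a reference coverage.

    The reference coverage is the median coverage of the n_longest longest
    contigs (assumed here to belong to the strain of interest).
    """
    parsed = [(h, *parse_spades_header(h)) for h in headers]
    longest = sorted(parsed, key=lambda t: t[1], reverse=True)[:n_longest]
    ref_cov = median(cov for _, _, cov in longest)
    low, high = ref_cov / max_fold, ref_cov * max_fold
    return [h for h, _, cov in parsed if low <= cov <= high]


# Made-up headers for illustration only.
example_headers = [
    "NODE_1_length_250000_cov_30.0",
    "NODE_2_length_180000_cov_28.0",
    "NODE_3_length_90000_cov_33.0",
    "NODE_4_length_5000_cov_400.0",  # abnormally high, e.g. an rRNA contig
    "NODE_5_length_1200_cov_2.5",    # abnormally low, possibly another organism
]
kept = filter_by_coverage(example_headers, n_longest=3)
```

With these made-up headers, the two short contigs with outlying coverage are dropped and the three long contigs are kept. Discarded contigs are still worth a manual look, since plasmids and repeats of the target strain can also show unusual coverage.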

More philosophically:

Looking at some of your recent threads, it seems you have been struggling with the same issue for some days now. From older to more recent:

How to extract reads that match k-mer profiles from a collection of sequences?

How to interpret sam file generated from BBMap?

This thread

Can you assemble with merged paired end reads and unmatched reads as "single ended" reads?

So it seems you have shotgun metagenomic sequencing data, but are interested in only one particular strain. It would be helpful if you described the problem in more detail, along with your motivation for taking this approach. This would help us evaluate whether your approach is sound, or whether a completely different approach would be better.

The approach you have chosen seems to be mapping to a reference strain (a published genome?), and then assembling the genome using just the mapped reads. I wonder whether just mapping to the reference strain and examining the differences (calling SNPs, indels, and structural variants) would be good enough for your purposes. Or assembling the whole metagenome, and then recovering the contigs belonging to the strain of interest?


Yes, that's exactly what I'm doing: I have downloaded all of the reference genomes for a species, and I'm mapping my reads to them with a wide net (BBMap default = 76% identity), extracting the mapped reads, and assembling these. I'm not necessarily looking for closed genomes, but mostly for de-novo assemblies of the organism in the samples I have. Is this method appropriate? Would it be better to use k-mer profiles instead of mapping? I planned on manually binning the contigs afterwards to exclude any false positives.

written 2.0 years ago by O.rka (modified 2.0 years ago)
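For readers following along, the map-then-assemble workflow discussed in this thread might look roughly like the sketch below. It is not taken from the thread: the helper names and file names are hypothetical, and it assumes paired-end input, BBMap's `ref=`, `in=`, `in2=`, `outm=` and `minid=` parameters, and SPAdes' `--12`, `-o` and `--meta` options. The functions only build argument lists; nothing is executed.

```python
def bbmap_command(ref, reads_1, reads_2, mapped_out, minid=0.76):
    """Build a BBMap argument list that writes only mapped reads to mapped_out."""
    return [
        "bbmap.sh",
        f"ref={ref}",
        f"in={reads_1}",
        f"in2={reads_2}",
        # Mapped reads only; assumed to be written interleaved, because the
        # input is paired and a single output file is given.
        f"outm={mapped_out}",
        f"minid={minid}",  # 0.76 is the "wide net" default mentioned above
    ]


def spades_command(interleaved_reads, out_dir, meta=False):
    """Build a SPAdes argument list; meta=True switches to metaSPAdes mode."""
    cmd = ["spades.py", "--12", interleaved_reads, "-o", out_dir]
    if meta:
        cmd.insert(1, "--meta")
    return cmd


# Hypothetical file names, for illustration only.
map_cmd = bbmap_command("reference.fa", "reads_1.fq", "reads_2.fq", "mapped.fq")
asm_cmd = spades_command("mapped.fq", "assembly_out")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Building both the SPAdes and metaSPAdes variants from the same extracted reads makes it easy to assemble with each and compare the resulting contigs.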