
Low mapping percentage (whole metagenome shotgun assembly)


Asked by mewgia, 3.1 years ago

Hello!

I have 4 samples of whole metagenome shotgun sequencing data (soil communities): Illumina, single-end, read length ~76 bp. The raw reads were preprocessed with Trimmomatic: adapters were trimmed, and short and low-quality reads were removed. The remaining reads were assembled with SPAdes (not --meta, because --meta does not work with single-end reads). Now I'm mapping the trimmed reads back to my assemblies (bowtie2, default settings), and the mapping rate is only 8-11%. Why could that be? What did I do wrong?


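bowtie2 already prints an "overall alignment rate" in its end-of-run summary, and `samtools flagstat` reports comparable numbers. To see exactly what that percentage counts, here is a minimal Python sketch that recomputes it from SAM text: each read is counted once through its primary record (secondary 0x100 and supplementary 0x800 records are skipped), and a read counts as mapped unless FLAG bit 0x4 is set. The example records are made up; in practice you would feed it the lines of the SAM file.

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4        # SAM FLAG: read is unmapped
SECONDARY = 0x100     # SAM FLAG: secondary alignment
SUPPLEMENTARY = 0x800 # SAM FLAG: supplementary alignment


def mapping_rate(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (total_reads, mapped_reads, percent_mapped) for single-end SAM text."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and the SAM header
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return total, mapped, percent


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\tSO:unsorted",
        "read1\t0\tcontig_1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "read3\t16\tcontig_2\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
        "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(mapping_rate(example))  # (4, 2, 50.0)
```

A low number here means most reads have no place in the contigs, which points at the assembly rather than at the mapper settings.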


Maybe not much sequence was assembled and SPAdes discarded the rest? You could try other assemblers, e.g. IDBA-UD or MEGAHIT (both work with single-end reads).

Reply by 5heikki, 3.1 years ago
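One way to check the "not much sequence was assembled" idea is to look at the contig count, total assembled length and N50 of each assembly. A minimal Python sketch over FASTA text follows; the contigs in the usage are invented, and the optional `min_len` cutoff is a hypothetical parameter you would set yourself.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in FASTA text."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int], min_len: int = 0) -> Dict[str, int]:
    """Contig count, total length and N50 for contigs of at least min_len bp."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    n50 = 0
    running = 0
    for n in kept:
        running += n
        if 2 * running >= total:  # first contig at which half the total is reached
            n50 = n
            break
    return {"contigs": len(kept), "total_length": total, "n50": n50}


if __name__ == "__main__":
    fasta = [">contig_1", "ACGTACGTAC", ">contig_2", "ACGT", "ACGTAC", ">contig_3", "ACG"]
    print(assembly_stats(read_fasta_lengths(fasta)))
    # {'contigs': 3, 'total_length': 23, 'n50': 10}
```

If the total assembled length is small relative to what ~80 million 76 bp reads could cover, a low mapping rate is expected regardless of the mapper.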


The assembly graph processing in SPAdes was not designed for metagenomic sequencing data. Most of your reads were probably not assembled into contigs because there can be very high intraspecific diversity at the sequence level. I would not expect too much improvement, but I would follow @5heikki's suggestion. Some metagenomes can be very challenging to resolve, even with paired-end reads.

Reply by andres.firrincieli, 3.1 years ago
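To make the point about intraspecific diversity concrete: in a de Bruijn graph, two closely related strains that differ by a single substitution share most k-mers but split into a "bubble" at the variant. The toy Python below (plain sets, k=5, made-up sequences; a simplified illustration, not how SPAdes or MEGAHIT work internally) shows that one strain gives an unbranched path, while mixing two strains creates a node with two successors, where a simple contig walk has to stop unless the assembler resolves the branch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(seqs: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i : i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def branching_nodes(graph: Dict[str, Set[str]]) -> Set[str]:
    """(k-1)-mers with more than one successor."""
    return {node for node, successors in graph.items() if len(successors) > 1}


strain_a = "ACGTTGCATGCCATAGGCTA"
strain_b = "ACGTTGCATGACATAGGCTA"  # same sequence with one substitution (index 10)

if __name__ == "__main__":
    print(branching_nodes(de_bruijn([strain_a], k=5)))            # set()
    print(branching_nodes(de_bruijn([strain_a, strain_b], k=5)))  # {'CATG'}
```

With many co-occurring strains, each at low coverage, such branches multiply, and the graph breaks into many short contigs that leave a large share of reads unplaced.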


Thanks. I tried MEGAHIT and the results are nearly the same (mapping %, metaSPAdes vs. MEGAHIT):

Sample   metaSPAdes (%)   MEGAHIT (%)
1        11.64            6.18
2        15.35            23.96
3        36.37            51.01
4        13.48            6.68

Then I took the unmapped reads and tried to reassemble them, but without success.

Reply by mewgia, 3.1 years ago
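For collecting the unaligned reads, bowtie2 can write them directly with `--un <path>`, and `samtools fastq -f 4` extracts them from an existing SAM/BAM file. If you prefer to script it, here is a minimal Python sketch that turns unmapped primary SAM records (FLAG 0x4) back into FASTQ. It assumes single-end reads with base qualities stored in the QUAL column (not `*`); unmapped reads are stored as sequenced, so no reverse-complementing is needed.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record (a 4-line string) per unmapped primary SAM record."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and the SAM header
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & SECONDARY_OR_SUPPLEMENTARY or not flag & UNMAPPED:
            continue  # keep only primary records with the "unmapped" bit set
        name, seq, qual = fields[0], fields[9], fields[10]
        # Assumption: QUAL holds real qualities (not "*"), so the FASTQ is valid.
        yield f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.0",
        "read1\t0\tcontig_1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tHHHH",
    ]
    print("".join(unmapped_fastq(sam)), end="")
```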


51.01 and 23.96 for a soil metagenome is not that bad. As I said, short reads from metagenomic samples can be very hard to assemble into longer contigs. May I ask how many reads you have for each sample?

Reply by andres.firrincieli, 3.1 years ago


Yes, of course. Sample 1: 79,329,568 reads; sample 2: 81,473,469; sample 3: 78,824,132; sample 4: 92,191,436.
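Combining these read counts with the mapping percentages posted earlier in the thread turns the percentages into approximate numbers of mapped and still-unmapped reads. A quick Python sketch; it assumes these counts are the same trimmed reads that were mapped, which the thread does not state explicitly.

```python
from typing import Dict, Tuple

# Values as posted in this thread (assumption: these are the trimmed reads
# that were mapped back to the assemblies).
READS: Dict[str, int] = {
    "sample 1": 79_329_568,
    "sample 2": 81_473_469,
    "sample 3": 78_824_132,
    "sample 4": 92_191_436,
}
# (metaSPAdes %, MEGAHIT %), in the order the thread lists them.
MAPPING_PCT: Dict[str, Tuple[float, float]] = {
    "sample 1": (11.64, 6.18),
    "sample 2": (15.35, 23.96),
    "sample 3": (36.37, 51.01),
    "sample 4": (13.48, 6.68),
}


def mapped_counts(
    reads: Dict[str, int], pct: Dict[str, Tuple[float, float]]
) -> Dict[str, Tuple[int, ...]]:
    """Convert mapping percentages into approximate numbers of mapped reads."""
    return {s: tuple(round(n * p / 100) for p in pct[s]) for s, n in reads.items()}


if __name__ == "__main__":
    for sample, (spades, megahit) in mapped_counts(READS, MAPPING_PCT).items():
        total = READS[sample]
        best = max(spades, megahit)
        print(
            f"{sample}: {total:,} reads, mapped {spades:,} (metaSPAdes) / "
            f"{megahit:,} (MEGAHIT), unmapped with the better assembly: {total - best:,}"
        )
```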

And what about normalization or subsampling? I ran the khist script from bbnorm, and here is the resulting peaks file:

k                         31
unique_kmers              3079249855
error_kmers               3079247658
genomic_kmers             2197
main_peak                 127
genome_size_in_peaks      6906
genome_size               9775
haploid_genome_size       9775
fold_coverage             127
haploid_fold_coverage     127
ploidy                    1
percent_repeat_in_peaks   71.431
percent_repeat            75.427

start   center   stop    max   volume
25      127      140     39    416
140     144      164     25    384
164     174      188     38    319
188     200      214     19    210
214     222      236     13    109
236     240      251     5     38
251     282      309     6     174
309     341      377     9     116
377     391      402     4     36
402     413      434     3     42
434     453      464     7     37
600     615      621     4     25
621     626      636     5     20
875     882      890     7     16
14924   14934    14944   31    31

Reply by mewgia, 3.1 years ago
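The peaks file above is easier to sanity-check in code. Below is a minimal Python sketch that parses it into summary values and peak rows and computes the share of distinct k-mers that khist classified as errors. The exact file layout is an assumption: the parser expects two-column "key value" lines and five-column peak rows, and tolerates a leading `#`. By the posted numbers, only 2,197 of about 3.08 billion distinct 31-mers land in the "genomic" peaks, so nearly all k-mers sit at low depth. khist's genome_size and ploidy estimates appear to assume reads from a single genome, so the 9,775 bp "genome size" is likely not meaningful for a mixed soil community.

```python
from typing import Dict, List, Tuple

PEAK_COLUMNS = ["start", "center", "stop", "max", "volume"]


def parse_peaks(text: str) -> Tuple[Dict[str, float], List[Dict[str, int]]]:
    """Split khist-style peaks text into summary values and peak rows (assumed layout)."""
    summary: Dict[str, float] = {}
    peaks: List[Dict[str, int]] = []
    for raw in text.splitlines():
        parts = raw.strip().lstrip("#").split()
        if not parts or parts == PEAK_COLUMNS:
            continue  # blank line or the peak-table header
        if len(parts) == 2:
            summary[parts[0]] = float(parts[1])  # e.g. "unique_kmers 3079249855"
        elif len(parts) == len(PEAK_COLUMNS):
            peaks.append(dict(zip(PEAK_COLUMNS, map(int, parts))))
    return summary, peaks


def error_kmer_fraction(summary: Dict[str, float]) -> float:
    """Fraction of distinct k-mers that khist counted as errors."""
    return summary["error_kmers"] / summary["unique_kmers"]


EXAMPLE = """\
k 31
unique_kmers 3079249855
error_kmers 3079247658
genomic_kmers 2197
start center stop max volume
25 127 140 39 416
140 144 164 25 384
"""

if __name__ == "__main__":
    summary, peaks = parse_peaks(EXAMPLE)
    print(f"{error_kmer_fraction(summary):.6%} of distinct 31-mers counted as errors")
    print(f"{len(peaks)} peaks, total volume {sum(p['volume'] for p in peaks)}")
```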