Question

Short contigs after megahit assembly of metagenomic samples

5.8 years ago

Biok • 0

Hi everyone,

I am working with metagenomic samples and I co-assembled the two metagenomic samples with megahit. The problem is that I got a lot of contigs shorter than 1000 bp, which I had to remove because my goal is to reconstruct metagenome-assembled genomes. By doing this, I lost about 93% of the contigs and ~80% of the nucleotides in my assembly, which means that I lost a lot of information. Here are the statistics from filtering the assembly to contigs of at least 1000 bp:

Input: final.contigs.fa
Output: contigs.1000min.fasta
Minimum length: 1,000
Total num contigs: 7,050,789
Total num nucleotides: 3,702,128,215
Contigs removed: 6595321 (93.54% of all)
Nucleotides removed: 2930511989 (79.16% of all)
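
For readers who want to reproduce this kind of report, here is a minimal Python sketch of a length filter with the same summary fields. It is not the tool used in the question: the function names `read_fasta` and `filter_stats` are hypothetical, and it assumes that contigs strictly shorter than the minimum length are the ones removed.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_stats(records: Iterable[Tuple[str, str]], min_length: int = 1000) -> Dict[str, float]:
    """Summarise what a minimum-length filter would remove.

    Assumption: contigs with length < min_length are removed.
    """
    total_contigs = total_nt = removed_contigs = removed_nt = 0
    for _, seq in records:
        total_contigs += 1
        total_nt += len(seq)
        if len(seq) < min_length:
            removed_contigs += 1
            removed_nt += len(seq)
    return {
        "total_contigs": total_contigs,
        "total_nucleotides": total_nt,
        "contigs_removed": removed_contigs,
        "nucleotides_removed": removed_nt,
        "pct_contigs_removed": 100.0 * removed_contigs / total_contigs if total_contigs else 0.0,
        "pct_nucleotides_removed": 100.0 * removed_nt / total_nt if total_nt else 0.0,
    }


# Toy in-memory example; a real run would pass an open file handle instead.
toy_fasta = [">contig_1", "ACGT" * 300, ">contig_2", "ACGT" * 50]
for key, value in filter_stats(read_fasta(toy_fasta)).items():
    print(key, value)
```

On a real assembly, something like `filter_stats(read_fasta(open("final.contigs.fa")))` would stream the file. From the numbers reported above, 7,050,789 - 6,595,321 = 455,468 contigs and 3,702,128,215 - 2,930,511,989 = 771,616,226 nucleotides remain after filtering.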

I am new to metagenomics and I am not sure what I can do to improve the assembly. After the assembly, I mapped my samples against the assembly using bowtie2, but I got a low alignment rate for the reads of my sample (~40%), which makes sense given that I lost a lot of sequences. Do you have any suggestions to improve the assembly?

next-gen assembly sequence • 4.0k views

ADD COMMENT • link updated 5.8 years ago by Biostar 20 • written 5.8 years ago by Biok • 0

Complex metagenomes are tough to assemble, but I would not consider it typical for 93% of assembled contigs to be shorter than 1000 bp. I would rule out simple explanations first. Were the sequencing adapters removed properly? Even if you were told that they were removed, it never hurts to confirm. I recently got a batch where the adapters had supposedly been removed, but almost 1% of the sequences still had them when I tested with AdapterRemoval. Other adapter-removal programs such as trimmomatic will work as well. In my case the small number of untrimmed adapters would not have caused an assembly with your outcome, but it certainly degraded it. Next, make sure that your data are of good quality, which can be quickly assessed with seqtk.
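
As a rough first check before running a dedicated trimmer, the fraction of reads that still contain an adapter can be estimated with a few lines of Python. This is only a sketch under stated assumptions: it looks for exact matches of a single sequence, it expects plain four-line FASTQ records, and the default `ASSUMED_ADAPTER` is the commonly seen start of Illumina TruSeq adapters, which may not be the adapter used in your library. It does not replace AdapterRemoval or trimmomatic.

```python
from typing import Iterable

# Assumption: leading bases shared by common Illumina TruSeq adapters.
# Replace with the adapter actually used in your library prep.
ASSUMED_ADAPTER = "AGATCGGAAGAGC"


def adapter_fraction(fastq_lines: Iterable[str], adapter: str = ASSUMED_ADAPTER) -> float:
    """Return the fraction of reads containing an exact copy of `adapter`.

    Assumes four-line FASTQ records, so the sequence is the second line
    of every group of four.
    """
    total = hits = 0
    for index, raw in enumerate(fastq_lines):
        if index % 4 != 1:
            continue  # skip header, separator, and quality lines
        total += 1
        if adapter in raw.strip().upper():
            hits += 1
    return hits / total if total else 0.0


# Toy example: the first of the two reads carries the assumed adapter.
toy_fastq = [
    "@read_1", "ACGTACGTAGATCGGAAGAGCTTTT", "+", "I" * 25,
    "@read_2", "ACGTACGTACGTACGTACGT", "+", "I" * 20,
]
print(f"{adapter_fraction(toy_fastq):.2%} of reads contain the adapter")
```

A value well above zero on supposedly trimmed reads would be a reason to re-run adapter removal before assembling again.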

If everything checks out, I suggest you try re-assembling in meta-sensitive mode with megahit, or with metaSPAdes as already suggested.
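
A minimal sketch of what those two re-assembly runs could look like, written as Python helpers that only build the command lines. The file names and output directories are hypothetical placeholders. The `--presets meta-sensitive` option and the comma-separated `-1`/`-2` lists are megahit's documented way to select the preset and to co-assemble several samples; the metaSPAdes call below passes a single pair of files, so check the SPAdes manual for how your version handles more than one library.

```python
import shlex
from typing import List, Sequence


def megahit_command(r1: Sequence[str], r2: Sequence[str], out_dir: str) -> List[str]:
    """Build a megahit co-assembly command using the meta-sensitive preset."""
    return [
        "megahit",
        "--presets", "meta-sensitive",
        "-1", ",".join(r1),  # forward reads, one file per sample
        "-2", ",".join(r2),  # reverse reads, same sample order
        "-o", out_dir,
    ]


def metaspades_command(r1: str, r2: str, out_dir: str) -> List[str]:
    """Build a metaSPAdes command for one paired-end library."""
    return ["metaspades.py", "-1", r1, "-2", r2, "-o", out_dir]


# Hypothetical file names, for illustration only.
megahit_argv = megahit_command(
    ["sampleA_R1.fq.gz", "sampleB_R1.fq.gz"],
    ["sampleA_R2.fq.gz", "sampleB_R2.fq.gz"],
    "megahit_sensitive",
)
spades_argv = metaspades_command("pooled_R1.fq.gz", "pooled_R2.fq.gz", "metaspades_out")

# Print the commands for review; pass a list to subprocess.run(..., check=True) to execute.
print(shlex.join(megahit_argv))
print(shlex.join(spades_argv))
```

Building the commands as lists keeps file names with spaces safe and makes it easy to log exactly what was run for each assembly attempt.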

ADD REPLY • link 5.8 years ago by Mensur Dlakic ★ 29k

That is typically the case with metagenomic samples, and it is difficult to pinpoint the reason for such short contigs. It could be anything from degraded genetic material or sequencing library prep to poor de novo assembly, or anything in between. If it is not too much work, I'd suggest trying a different de novo assembly pipeline, e.g. metaSPAdes, to check whether it leads to a better assembly.

ADD REPLY • link 5.8 years ago by Sej Modha 5.3k