Question: Short contigs after megahit assembly of metagenomic samples
Asked 17 months ago by Biok:

Hi everyone,

I am working with metagenomic samples and I co-assembled my two metagenomic samples with megahit. The problem is that I got a lot of contigs shorter than 1000 bp, which I had to remove because my goal is to reconstruct metagenome-assembled genomes. But by doing this, I lost 93% of the contigs and ~80% of the nucleotides from my assembly, which means that I lost a lot of information. Here is some information from filtering the assembly for contigs of at least 1000 bp:

  1. Input: final.contigs.fa
  2. Output: contigs.1000min.fasta
  3. Minimum length: 1,000
  4. Total num contigs: 7,050,789
  5. Total num nucleotides: 3,702,128,215
  6. Contigs removed: 6,595,321 (93.54% of all)
  7. Nucleotides removed: 2,930,511,989 (79.16% of all)
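For anyone who wants to reproduce this kind of report, here is a minimal Python sketch of a length filter plus the arithmetic behind the percentages. It is not the tool used in the post; the function names (`read_fasta`, `filter_stats`, `percent`) are made up for illustration, and it assumes "minimum length" means contigs of exactly 1,000 bp are kept. From the numbers above, 455,468 contigs and 771,616,226 nucleotides remain after filtering.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_stats(records, min_len=1000):
    """Count contigs and bases before and after a minimum-length filter.

    Assumption: contigs with length >= min_len are kept.
    """
    total_contigs = total_bp = kept_contigs = kept_bp = 0
    for _, seq in records:
        total_contigs += 1
        total_bp += len(seq)
        if len(seq) >= min_len:
            kept_contigs += 1
            kept_bp += len(seq)
    return {
        "total_contigs": total_contigs,
        "total_bp": total_bp,
        "kept_contigs": kept_contigs,
        "kept_bp": kept_bp,
    }

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2) if whole else 0.0

# Numbers reported in the post (totals are for the unfiltered assembly).
TOTAL_CONTIGS = 7_050_789
TOTAL_BP = 3_702_128_215
REMOVED_CONTIGS = 6_595_321
REMOVED_BP = 2_930_511_989

if __name__ == "__main__":
    print("contigs kept:", TOTAL_CONTIGS - REMOVED_CONTIGS)
    print("bases kept:", TOTAL_BP - REMOVED_BP)
    print("contigs removed (%):", percent(REMOVED_CONTIGS, TOTAL_CONTIGS))
    print("bases removed (%):", percent(REMOVED_BP, TOTAL_BP))
```

To run it on a real assembly, open the file and pass the handle, e.g. `filter_stats(read_fasta(open("final.contigs.fa")))`.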

I am new to metagenomics and I am not sure what I can do to improve the assembly. After the assembly, I mapped the reads from my samples against the assembly using bowtie2, but I got a low alignment rate (~40%), which makes sense given that I lost a lot of sequence. Do you have any suggestions to improve the assembly?


Complex metagenomes are tough to assemble, but I would not consider it typical that 93% of assembled contigs are shorter than 1000 bp. I would consider simple explanations first. Are the sequencing adapters removed properly? Even if you were told that they were removed, it never hurts to confirm it. I recently got a batch where the adapters had supposedly been removed, but almost 1% of sequences still had them when I tested it with AdapterRemoval. Other adapter removal programs such as Trimmomatic will work as well. In my case the small number of untrimmed adapters would not have caused an assembly with your outcome, but it certainly degraded it. Next, make sure that your data are of good quality, which can be quickly assessed with seqtk.
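As a rough illustration of the "confirm it yourself" step, the sketch below counts what fraction of reads in a FASTQ file still contain an adapter sequence. It is only a quick sanity check, not a replacement for AdapterRemoval or Trimmomatic: it finds exact matches only, so it misses partial adapters at read ends and adapters with sequencing errors. The default adapter, `AGATCGGAAGAGC` (the common prefix of Illumina TruSeq adapters), is an assumption and should be swapped for whatever your library prep used; the function names are hypothetical.

```python
import io

# Assumption: Illumina TruSeq-style adapter prefix; change it for other kits.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def fastq_sequences(handle):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def adapter_fraction(sequences, adapter=ADAPTER_PREFIX):
    """Return the fraction of sequences containing an exact adapter match."""
    total = hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq.upper():
            hits += 1
    return hits / total if total else 0.0

if __name__ == "__main__":
    # Tiny in-memory FASTQ with one contaminated read out of four.
    demo = io.StringIO(
        "@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
        "@r2\nTTTTAGATCGGAAGAGCAC\n+\nIIIIIIIIIIIIIIIIIII\n"
        "@r3\nGGGGCCCCAAAA\n+\nIIIIIIIIIIII\n"
        "@r4\nCCCCGGGGTTTT\n+\nIIIIIIIIIIII\n"
    )
    print(f"{adapter_fraction(fastq_sequences(demo)):.2%} of reads contain the adapter")
```

If the fraction on supposedly trimmed reads is clearly above zero, that is a hint to rerun trimming before assembling.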

If everything checks out, I suggest you try re-assembling in meta-sensitive mode with megahit, or with metaSPAdes as already suggested.
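To make the re-assembly suggestion concrete, here is a small Python sketch that builds the command lines rather than running them. The read file names, output directories, and thread count are placeholders, not values from this thread. The megahit call uses `--presets meta-sensitive` with comma-separated read lists for a co-assembly. The metaSPAdes call is shown with a single paired-end library, on the assumption that the two samples would first have to be merged or assembled separately.

```python
def megahit_cmd(r1_files, r2_files, out_dir, threads=8):
    """Build a megahit co-assembly command using the meta-sensitive preset."""
    return [
        "megahit",
        "--presets", "meta-sensitive",
        "-1", ",".join(r1_files),   # forward reads, one file per sample
        "-2", ",".join(r2_files),   # reverse reads, same order as -1
        "-t", str(threads),
        "-o", out_dir,              # megahit expects a directory that does not exist yet
    ]

def metaspades_cmd(r1, r2, out_dir, threads=8):
    """Build a metaSPAdes command for one paired-end library."""
    return [
        "metaspades.py",
        "-1", r1,
        "-2", r2,
        "-t", str(threads),
        "-o", out_dir,
    ]

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = megahit_cmd(
        ["sampleA_R1.fastq.gz", "sampleB_R1.fastq.gz"],
        ["sampleA_R2.fastq.gz", "sampleB_R2.fastq.gz"],
        "megahit_sensitive_out",
    )
    print(" ".join(cmd))
    print(" ".join(metaspades_cmd("sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "metaspades_out")))
```

The printed commands can be pasted into a shell, or the lists can be passed to `subprocess.run(cmd, check=True)` once the paths are real.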

Written 17 months ago by Mensur Dlakic

That is typically the case with metagenomic samples, and it is difficult to pinpoint the reason for such short contigs. It could be anything from degraded genetic material or sequencing library prep to poor de novo assembly, or anything in between. If it is not too much work, I'd suggest trying a different de novo assembly pipeline, e.g. metaSPAdes, and checking whether it leads to a better assembly.

Written 17 months ago by Sej Modha