metagenome low alignment rate
4 months ago
TP • 0

I have 20 metagenome samples that need to be analyzed. For now I started with one sample, using the tools/pipeline below. After assembling the first sample I mapped its reads back to the contigs, but the alignment rate is very low. What could be the reason for this? Is this workflow correct?

reads -> Trimmomatic (cutoff 30) -> HQ reads -> megahit -> assembly -> bowtie2

After running bowtie2 the alignment rate is low; see the terminal output:

5333859 (100.00%) were paired; of these:
  4888448 (91.65%) aligned concordantly 0 times
  421652 (7.91%) aligned concordantly exactly 1 time
  23759 (0.45%) aligned concordantly >1 times
----
4888448 pairs aligned concordantly 0 times; of these:
  37619 (0.77%) aligned discordantly 1 time
----
4850829 pairs aligned 0 times concordantly or discordantly; of these:
  9701658 mates make up the pairs; of these:
    9400862 (96.90%) aligned 0 times
    264906 (2.73%) aligned exactly 1 time
    35890 (0.37%) aligned >1 times

11.88% overall alignment rate
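
For reference, the overall rate can be reproduced from the counts above: both mates of every concordantly or discordantly aligned pair count as aligned, plus the mates that aligned on their own. A minimal Python sketch of that arithmetic (the variable and function names are illustrative, not part of bowtie2):

```python
# Counts copied from the bowtie2 summary above.
total_pairs = 5333859
concordant_once = 421652
concordant_multi = 23759
discordant_once = 37619
mates_once = 264906
mates_multi = 35890


def overall_alignment_rate(total_pairs, aligned_pairs, aligned_single_mates):
    """Return the percentage of individual mates that aligned at least once."""
    aligned_mates = 2 * aligned_pairs + aligned_single_mates
    return 100.0 * aligned_mates / (2 * total_pairs)


aligned_pairs = concordant_once + concordant_multi + discordant_once
rate = overall_alignment_rate(total_pairs, aligned_pairs, mates_once + mates_multi)
print(f"{rate:.2f}% overall alignment rate")
```

This gives 11.88%, matching the reported figure, so roughly 88% of the individual reads have no place in the assembly.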

Metagenome Megahit Bowtie • 708 views
4 months ago

It sounds like the assembly does not contain most of the reads.

That, in turn, may have many causes.

I would investigate the assembly first: what does it actually contain? How many bases in total, how many contigs, how long are the contigs, and what are the contigs similar to?
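
The basic numbers can be pulled from the contig FASTA with a few lines of code. Below is a minimal Python sketch, assuming a plain (uncompressed) FASTA file; the function names and the toy input are illustrative only, and dedicated tools report the same statistics.

```python
import io


def read_contig_lengths(handle):
    """Yield the length of each sequence in an open FASTA file handle."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for contig_len in ordered:
        running += contig_len
        if running >= half:
            return contig_len
    return 0


def assembly_stats(handle):
    """Summarize contig count, total bases, min/max/average length and N50."""
    lengths = list(read_contig_lengths(handle))
    if not lengths:
        return {"contigs": 0, "total_bp": 0, "min": 0, "max": 0, "avg": 0, "n50": 0}
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "min": min(lengths),
        "max": max(lengths),
        "avg": total // len(lengths),
        "n50": n50(lengths),
    }


# Toy input standing in for a real contig file (three contigs of 10, 6 and 3 bases).
toy_fasta = io.StringIO(">c1\nACGTACGTAC\n>c2\nACGT\nAC\n>c3\nACG\n")
stats = assembly_stats(toy_fasta)
print(stats)
```

For a real assembly, replace the toy handle with an open file, for example `open("final.contigs.fa")` (the path is an assumption; use wherever your assembler wrote its contigs).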


These are soil samples. The total number of assembled contigs is 198216, contig lengths range from 200 to 2807, and k was 141.


At least for the one sample posted above, the reads are not represented in the assembled metagenome. Do other samples fare any better in terms of alignments? If not, then as @Istvan said, you will want to recheck your assemblies and redo them.


Could you confirm that your largest contig is 2807 bases, not 2807 kilobases? If so, that is not a useful assembly. Most people discard short metagenomic contigs, with a cutoff somewhere between 1 and 3 kb. That would likely eliminate most of the contigs you have.
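
To see how much of an assembly survives such a cutoff, a length filter is enough. A minimal Python sketch, assuming a plain FASTA input; the function name, the default `min_len=1000`, and the toy records are illustrative assumptions rather than any tool's behavior:

```python
import io


def filter_fasta_by_length(in_handle, out_handle, min_len=1000):
    """Write records with at least min_len bases; return (kept, dropped) counts."""
    kept = dropped = 0
    header, chunks = None, []

    def flush():
        # Emit or discard the record collected so far.
        nonlocal kept, dropped
        if header is None:
            return
        seq = "".join(chunks)
        if len(seq) >= min_len:
            out_handle.write(f"{header}\n{seq}\n")
            kept += 1
        else:
            dropped += 1

    for line in in_handle:
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, chunks = line, []
        elif line:
            chunks.append(line)
    flush()
    return kept, dropped


# Toy example with a tiny cutoff so the behavior is easy to check by eye.
toy_in = io.StringIO(">short\nACG\n>long\nACGTAC\nGT\n")
toy_out = io.StringIO()
kept, dropped = filter_fasta_by_length(toy_in, toy_out, min_len=5)
print(kept, dropped)
print(toy_out.getvalue(), end="")
```

On real data, pass open file handles for the contig file and the filtered output and keep the cutoff at 1000 or higher. If almost nothing is kept, the assembly itself is the problem.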

As already suggested, re-assembling seems to be the best course of action.


Yes, I confirm it is 2807 bases. I reassembled; here are the stats pasted directly from the log/terminal:

2022-01-05 21:40:44 - 202455 contigs, total 84338781 bp, min 200 bp, max 2488 bp, avg 416 bp, N50 398 bp

This is from only one sample, with one forward and one reverse read file. The lengths are not in kbp.

The alignment rate was still only 11.99%.


A maximum contig length of about 2 kb and an N50 of 398 show that the assembly was unsuccessful.

  1. What are the ideal values of N50 and maximum contig length for deciding that a metagenome assembly is good?
  2. Do these parameters depend on read length and number of reads? If someone had twice as many reads as I do, might they get longer contigs that cover more of the genomes, and therefore a better assembly?
  3. If a metagenome assembly is not good, what troubleshooting steps should I take? Should I try a different assembler?

This is a completely new set of questions that have little to do with the original thread. I think you should start a new thread.

Briefly:

  • There is no ideal N50 value, but it should be higher for more complex genomes. For eukaryotic genomes with a limited number of large chromosomes, it is reasonable to expect an N50 in the millions. For metagenomic assemblies of prokaryotes, good assemblies have an N50 in the tens of thousands.
  • Generally speaking, more reads mean deeper coverage, which typically helps the assembly. However, this relationship doesn't hold indefinitely: there is a point at which adding more reads can hurt the assembly.
  • Impossible to tell with certainty. It appears that you have only about 5 million read pairs, which is likely nowhere near the number required for a proper assembly of soil samples. I suspect that complex soil samples require hundreds of millions of short reads.
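
A back-of-the-envelope depth estimate illustrates the last point. The sketch below is Python; the pair count comes from the bowtie2 summary in this thread, but the read length and the community size are assumptions chosen purely for illustration, since neither is given here.

```python
def expected_depth(n_pairs, read_len_bp, community_bp):
    """Average fold coverage if all reads were spread evenly over community_bp bases."""
    return 2 * n_pairs * read_len_bp / community_bp


n_pairs = 5333859                # read pairs, from the bowtie2 summary in this thread
read_len_bp = 150                # ASSUMPTION: read length is not stated in the thread
community_bp = 1000 * 5_000_000  # ASSUMPTION: 1000 genomes of about 5 Mb each

depth = expected_depth(n_pairs, read_len_bp, community_bp)
print(f"about {depth:.2f}x average depth under these assumptions")
```

Real communities are uneven, so abundant organisms get more than the average and rare ones less. Still, under these assumed numbers the average depth comes out well below 1x, which would be consistent with a fragmented assembly, and scaling the read count up by one or two orders of magnitude changes the picture considerably.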