metagenome low alignment rate
4 months ago
TP • 0

I have 20 metagenome samples to analyze. For now I have started with one sample, using the tools/pipeline below. After assembling the first sample I mapped its reads back to the contigs, but the alignment rate is very low. What could be the reason for this? Is this workflow correct?

reads -> Trimmomatic (cutoff 30) -> HQ reads -> megahit -> assembly -> bowtie2

After running bowtie2, the alignment rate is low. See the terminal output:

5333859 (100.00%) were paired; of these:
  4888448 (91.65%) aligned concordantly 0 times
  421652 (7.91%) aligned concordantly exactly 1 time
  23759 (0.45%) aligned concordantly >1 times
  ----
  4888448 pairs aligned concordantly 0 times; of these:
    37619 (0.77%) aligned discordantly 1 time
  ----
  4850829 pairs aligned 0 times concordantly or discordantly; of these:
    9701658 mates make up the pairs; of these:
      9400862 (96.90%) aligned 0 times
      264906 (2.73%) aligned exactly 1 time
      35890 (0.37%) aligned >1 times
11.88% overall alignment rate
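
As a sanity check, the overall rate can be reproduced from the counts in the summary above. The following is a minimal Python sketch (the function and variable names are illustrative, not part of bowtie2): each pair that aligned concordantly or discordantly contributes two mates, and mates from the remaining pairs are counted individually.

```python
# Counts copied from the bowtie2 summary above; names are illustrative only.
total_pairs = 5333859
concordant_once = 421652
concordant_multi = 23759
discordant_once = 37619
single_mates_once = 264906
single_mates_multi = 35890


def overall_alignment_rate(pairs, conc_once, conc_multi, disc_once,
                           mates_once, mates_multi):
    """Return the percentage of mates that aligned at least once."""
    # Every aligned pair (concordant or discordant) accounts for two mates.
    aligned_mates = 2 * (conc_once + conc_multi + disc_once)
    # Mates that aligned on their own, from pairs that did not align as pairs.
    aligned_mates += mates_once + mates_multi
    return 100.0 * aligned_mates / (2 * pairs)


rate = overall_alignment_rate(total_pairs, concordant_once, concordant_multi,
                              discordant_once, single_mates_once,
                              single_mates_multi)
print(f"{rate:.2f}% overall alignment rate")
```

So roughly 88% of the mates have no place to land in the contigs at all.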

Metagenome Megahit Bowtie • 708 views
4 months ago

It sounds like the assembly does not contain most of the reads.

That, in turn, may have many causes.

I would investigate the assembly first: what does it actually contain? How many bases does it hold in total, how long are the contigs, and what are the contigs similar to?
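
One way to get those basic numbers is to compute them directly from the contig FASTA. Below is a minimal Python sketch, assuming a plain FASTA file (MEGAHIT normally writes `final.contigs.fa` in its output directory; adjust the path to your own run). The function names are my own, and the demo uses a tiny made-up FASTA so it runs as-is.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA given as lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Summarize an assembly: contig count, total, min, max, average, N50."""
    if not lengths:
        return {"contigs": 0, "total_bp": 0, "min_bp": 0,
                "max_bp": 0, "avg_bp": 0, "n50_bp": 0}
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "min_bp": min(lengths),
        "max_bp": max(lengths),
        "avg_bp": sum(lengths) // len(lengths),
        "n50_bp": n50(lengths),
    }


# Tiny made-up FASTA for demonstration; for a real run use something like
#   with open("final.contigs.fa") as handle: stats = assembly_stats(contig_lengths(handle))
demo = [">c1", "ACGTACGTAC", ">c2", "ACGTAC", "GT", ">c3", "ACGT", ">c4", "AC"]
stats = assembly_stats(contig_lengths(demo))
print(stats)
```

To see what the contigs are similar to, you would still need a separate search (for example BLAST against a reference database) on a subset of them.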


These are soil samples. The total number of assembled contigs is 198216, contig lengths range from 200 to 2807, and k is 141.


At least for the one sample posted above, the reads are not represented in the assembled metagenome. Do other samples fare any better in terms of alignment? If not, then as @Istvan said, you will want to recheck your assemblies and redo them.


Could you confirm that your largest contig is 2807 bases, not 2807 kilobases? If so, that is not a useful assembly. Most people discard short metagenomic contigs, using a length cutoff of 1-3 kb. That would likely eliminate most of the contigs you have.

As already suggested, re-assembling seems to be the best course of action.
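
To check how much of an assembly would survive such a cutoff, a short Python sketch like the following could be applied to the list of contig lengths. The function names and the example lengths are made up for illustration; the 1000 bp cutoff is just the lower end of the range mentioned above.

```python
from typing import Dict, List


def filter_by_length(lengths: List[int], cutoff: int) -> List[int]:
    """Keep only contig lengths at or above the cutoff (in bp)."""
    return [length for length in lengths if length >= cutoff]


def survival_summary(lengths: List[int], cutoff: int) -> Dict[str, float]:
    """Report how many contigs and bases remain after the length cutoff."""
    kept = filter_by_length(lengths, cutoff)
    total_bp = sum(lengths)
    return {
        "kept_contigs": len(kept),
        "kept_bp": sum(kept),
        "fraction_contigs": len(kept) / len(lengths) if lengths else 0.0,
        "fraction_bp": sum(kept) / total_bp if total_bp else 0.0,
    }


# Hypothetical contig lengths, not taken from the real assembly.
demo_lengths = [200, 350, 416, 900, 1200, 2488]
kept = filter_by_length(demo_lengths, 1000)
summary = survival_summary(demo_lengths, 1000)
print(kept)
print(summary)
```

If only a small fraction of contigs and bases passes the cutoff, there is little left to bin or annotate, which is another sign that the assembly needs to be redone.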


Yes, I confirm it is 2807 bases. I reassembled; here are the stats pasted directly from the log/terminal:

2022-01-05 21:40:44 - 202455 contigs, total 84338781 bp, min 200 bp, max 2488 bp, avg 416 bp, N50 398 bp

This is from only one sample, with 1 forward and 1 reverse read file. The lengths are in bp, not kbp.

Still, the alignment rate was only 11.99%.


A max contig of about 2 kb and an N50 of 398 show that the assembly was unsuccessful.

1. What are the ideal N50 and max contig values for deciding that a metagenome assembly is good?
2. Do these parameters depend on read length and number of reads? If someone had twice as many reads as I do, might they get longer contigs that cover more of the genome, and therefore a better assembly?
3. If a metagenome assembly is not good, what troubleshooting steps should I follow? Should I try a different assembler?

This is a completely new set of questions that have little to do with the original thread. I think you should start a new thread.

Briefly:

• There is no ideal N50 value, but it should be higher for more complex genomes. For eukaryotic genomes with a limited number of large chromosomes, it is reasonable to expect an N50 in the millions. For metagenomic assemblies of prokaryotes, good assemblies have an N50 in the tens of thousands.
• Generally speaking, more reads means deeper coverage, which typically helps the assembly. However, this relationship doesn't hold indefinitely: there is a point at which adding more reads can hurt the assembly.
• Impossible to tell with certainty. It appears that you only have 4-5 million paired reads, which is likely nowhere near the number required for proper assembly of soil samples. I suspect that complex soil samples require hundreds of millions of short reads.
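
To put the read-count point in rough numbers, here is a back-of-the-envelope Python sketch of the average depth over the assembled contigs. The pair count, assembly size, and alignment rate come from the posts above; the read length of 150 bp is purely an assumption (it was never stated in the thread), so substitute your own value.

```python
# Values taken from the thread.
read_pairs = 5333859          # pairs reported by bowtie2
assembly_bp = 84338781        # total bp reported by the assembler log
alignment_rate = 0.1199       # overall alignment rate after reassembly

# ASSUMPTION: read length is not given in the thread; 150 bp is a placeholder.
assumed_read_length = 150


def mean_depth(pairs: int, read_length: int, mapped_fraction: float,
               target_bp: int) -> float:
    """Average depth: mapped bases divided by the size of the target."""
    sequenced_bases = 2 * pairs * read_length
    mapped_bases = sequenced_bases * mapped_fraction
    return mapped_bases / target_bp


depth = mean_depth(read_pairs, assumed_read_length, alignment_rate, assembly_bp)
print(f"approximate mean depth over contigs: {depth:.1f}x")
```

Under that assumed read length the contigs are covered only a few-fold on average, which would be consistent with the short, fragmented assembly, although this is only a crude estimate.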