Find coverage of assemblled contigs in metagenomic reads
1
0
Entering edit mode
16 months ago
shail.nair05 ▴ 20

Hi Everyone,

I want to calculate the percentage of reads that my assembled contigs cover across the entire metagenome (paired-end). I generated the alignment file of mapped reads using BBmap as:

bbmap.sh ref=contigs.fasta \
in1=metagenome.R1.fastq.gz \
in2=metagenome.R2.fastq.gz \
outm=mapped.sam

Later, I ran

 bbmap/pileup.sh in=mapped.sam \
out=contigs.coverage.txt

The std output shows as:

Note: Coverage capped at 65535; please use the flag 32bit for higher values.
Reads:                                  96961584
Mapped reads:                           94367339
Mapped bases:                           14005704234
Ref scaffolds:                          5768
Ref bases:                              793474333

Percent mapped:                         97.324
Percent proper pairs:                   92.340
**Average coverage:                       17.651**
Average coverage with deletions:        18.535
Standard deviation:                     118.484
Percent scaffolds with any coverage:    72.30
Percent of reference bases covered:     38.75

Is the average coverage (i.e., 17) in the above output is the in percentage?

Also, the contigs.coverage.txt shows some of the contigs coverage as 99.0460 %

head contigs.coverage.txt

#ID Avg_fold    Length  Ref_GC  Covered_percent Covered_bases   Plus_reads  Minus_reads Read_GC Median_fold Std_Dev
c_4080  0.2198  3606646 0.0000  2.2786  82181   2574    2726    0.6356  0   2.94
c_3778  100.1848    3302855 0.0000  99.0460 3271345 1111925 1112085 0.4412  100 22.31

How is it possible that multiple contigs (which are not similar to each other) covers 99% percent of the metagenome?

Kindly help me to understand if my analysis or inference of the results is incorrect in any way.

If this method is wrong, do you have any suggestions for how to accomplish my goal?

Thanks a lot

coverage metagenomic fasta bbmap alignment • 958 views
ADD COMMENT
0
Entering edit mode
16 months ago
Mensur Dlakic ★ 27k

Average coverage of 17.651x means that all bases in the contig have on average 17.651 different reads mapping to them. Some regions of the contig will have greater coverage, and others will have smaller.

Covered_percent of of 99.0460% is 3271345 / 3302855 (Covered_bases / Length). It means that 99+% of bases have at least one read mapping to them,

ADD COMMENT
0
Entering edit mode

Mensur Dlakic Thank you for the helpful explanation. So, This means that from these results I cannot interpret how much percent of the metagenomic reads is covered by my contig/contigs. Correct?

ADD REPLY

Login before adding your answer.

Traffic: 2836 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6