Hi Everyone,
I want to calculate the percentage of reads that my assembled contigs cover across the entire metagenome (paired-end). I generated the alignment file of mapped reads using BBmap as:
bbmap.sh ref=contigs.fasta \
in1=metagenome.R1.fastq.gz \
in2=metagenome.R2.fastq.gz \
outm=mapped.sam
Later, I ran
bbmap/pileup.sh in=mapped.sam \
out=contigs.coverage.txt
The std output shows as:
Note: Coverage capped at 65535; please use the flag 32bit for higher values.
Reads: 96961584
Mapped reads: 94367339
Mapped bases: 14005704234
Ref scaffolds: 5768
Ref bases: 793474333
Percent mapped: 97.324
Percent proper pairs: 92.340
**Average coverage: 17.651**
Average coverage with deletions: 18.535
Standard deviation: 118.484
Percent scaffolds with any coverage: 72.30
Percent of reference bases covered: 38.75
Is the average coverage (i.e., 17) in the above output is the in percentage?
Also, the contigs.coverage.txt shows some of the contigs coverage as 99.0460 %
head contigs.coverage.txt
#ID Avg_fold Length Ref_GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC Median_fold Std_Dev
c_4080 0.2198 3606646 0.0000 2.2786 82181 2574 2726 0.6356 0 2.94
c_3778 100.1848 3302855 0.0000 99.0460 3271345 1111925 1112085 0.4412 100 22.31
How is it possible that multiple contigs (which are not similar to each other) covers 99% percent of the metagenome?
Kindly help me to understand if my analysis or inference of the results is incorrect in any way.
If this method is wrong, do you have any suggestions for how to accomplish my goal?
Thanks a lot
Mensur Dlakic Thank you for the helpful explanation. So, This means that from these results I cannot interpret how much percent of the metagenomic reads is covered by my contig/contigs. Correct?