Entering edit mode
2.9 years ago
Dgg32 ▴ 90
I am using Pysam to calculate how many bases of my reference contig (assembled from PacBio) are covered by the reads (Illumina sequenced from the same sample). I have tried both pileup and count_coverage. But their results are somehow different:
import pysam input_file = mem_pe.bam" samfile = pysam.AlignmentFile(input_file, "rb")
n = 0 for pileupcolumn in samfile.pileup("tig00000003", min_base_quality=0): n += 1 print (n)
Output is 8817
n = 0 ### for 4 bases for i in range (4): n += len([x for x in samfile.count_coverage("tig00000003", quality_threshold = 0)[i] if x != 0]) print (n)
Output is 16620.
So, why the two numbers are different please?
And how can I calculate the amount of bases covered by reads by using samtools please?