Pysam, how to calculate how many bases on the reference are covered by reads?
0
0
Entering edit mode
3.8 years ago
Dgg32 ▴ 90

I am using Pysam to calculate how many bases of my reference contig (assembled from PacBio) are covered by the reads (Illumina sequenced from the same sample). I have tried both pileup and count_coverage. But their results are somehow different:

Preparation:

import pysam

input_file = mem_pe.bam"

samfile = pysam.AlignmentFile(input_file, "rb")

pileup:

n = 0

for pileupcolumn in samfile.pileup("tig00000003", min_base_quality=0):

    n += 1

print (n)

Output is 8817

count_coverage:

n = 0

### for 4 bases
for i in range (4):

    n += len([x  for x in samfile.count_coverage("tig00000003", quality_threshold = 0)[i] if x != 0])

print (n)

Output is 16620.

So, why the two numbers are different please?

And how can I calculate the amount of bases covered by reads by using samtools please?

BAM Pysam samtools • 1.4k views
ADD COMMENT

Login before adding your answer.

Traffic: 3832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6