Discrepancy between samtools bedcov and pysam.count
0
0
Entering edit mode
2.5 years ago
Josh • 0

I'm writing a new version of a tool and I'm trying to implement some functionality using pysam that was previously implemented using samtools bedcov.

For example, here is a sample output of samtools bedcov test.bed tcga_test.bam -Q 50:

chr19   50858094    50858095    64004

chr19   50858128    50858129    63170

chr19   50858162    50858163    51889

To calculate the coverage I've been using the pysam function AlignmentFile.count in the following way:

import pysam

def read_check(read):
    if read.flag & (0x4 | 0x100 | 0x200 | 0x400): # FUNMAP,FSECONDARY,FQCFAIL,FDUP
        return False
    elif read.mapping_quality <= 50:
        return False
    else:
        return True

bamfile = pysam.AlignmentFile('tcga_test.bam')
bamfile.count('chr19', start = 50858094, stop = 50858095, read_callback = read_check)
bamfile.count('chr19', start = 50858128, stop = 50858129, read_callback = read_check)
bamfile.count('chr19', start = 50858162, stop = 50858163, read_callback = read_check)

From this I get

68361
63170
51889

samtools bedcov is filtering the first region in some way I don't see. The difference here seems to be an edge case--I only run into it one or two times processing large BAM files, approx 1% of cases I test. I originally thought the difference may be due to filtering of overlapping read pairs. The code I used to test this was:

read_set = set()
for read in bamfile.fetch('chr19', start, start + 1):
    read_set.add(read.query_name) if read_check(read) else None
len(read_set)

But I get:

67676
62374
50940

So clearly samtools bedcov is not doing this type of filtering, though maybe my code doesn't work. From what I understand, samtools bedcov uses samtools mpileup to count the reads. I tried to test all the ways mpileup could filter reads but I haven't come up with anything that explains the difference in the small number of cases I come across.

BAM example

pysam coverage samtools • 920 views
ADD COMMENT

Login before adding your answer.

Traffic: 1425 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6