In pysam pileup, discrepancy between stepper all and stepper samtools reads
0
0
Entering edit mode
10 months ago
Michael • 0

Dear all,

I've not been able to understand from the documentation the precise differences in behavior between the steppers 'all' and 'samtools' in their default settings, which by the docs seem to indicate that they have the same filters engaged (BAM_FUNMAP | BAM_FSECONDARY | BAM_FQCFAIL | BAM_FDUP). Printed also base quality which is above samtools stepper default minimum of 13

Here's a sample code in which different results are obtained from both: (pysam version 0.21.0, Python 3.6.9, Ubuntu 18.04)

import pysam

# from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00120/alignment/HG00120.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
# in ubuntu executed:
# wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00120/alignment/HG00120.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
# wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00120/alignment/HG00120.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai

BAM_FILENAME = 'HG00120.chrom11.ILLUMINA.bwa.GBR.exome.20120522.bam'

chr = 11
coord = 76999

def print_pileup(pileupcolumn):
    for pileupread in pileupcolumn.pileups:
        query_name = pileupread.alignment.query_name
        base_now = pileupread.alignment.query_sequence[ pileupread.query_position]
        mq_now = pileupread.alignment.mapping_quality
        bq_now = pileupread.alignment.query_qualities[ pileupread.query_position]

        print('query_name: ' + query_name +' base_now: '+ base_now + ' mq_now: ' + str(mq_now) + ' bq_now: ' + str(bq_now))

print('stepper all')
samfile = pysam.AlignmentFile(BAM_FILENAME, "rb" )
for pileupcolumn in samfile.pileup(contig=str(chr),start=coord,stop=coord+1,truncate=True, stepper = 'all'):
    print_pileup(pileupcolumn)

print('stepper samtools')
samfile = pysam.AlignmentFile(BAM_FILENAME, "rb" )
for pileupcolumn in samfile.pileup(contig=str(chr),start=coord,stop=coord+1,truncate=True, stepper = 'samtools'):
    print_pileup(pileupcolumn)

output:

ubuntu@ip-XXX~$ python3 stepper.py
stepper all
query_name: SRR099961.15186690 base_now: A mq_now: 18 bq_now: 21
query_name: SRR099961.45087351 base_now: A mq_now: 36 bq_now: 67
query_name: SRR099961.113296055 base_now: A mq_now: 14 bq_now: 31
stepper samtools
query_name: SRR099961.15186690 base_now: A mq_now: 18 bq_now: 21
query_name: SRR099961.45087351 base_now: A mq_now: 36 bq_now: 67
ubuntu@ip-XXX~$

PS another question- the wording of the documentation is that for stepper samtools, min_base_quality is defined as all greater or equal and min_mapping_quality is defined as only greater- is this the correct behavior? from the docs:

min_base_quality (int) – Minimum base quality. Bases below the minimum quality will not be output. The default is 13. adjust_capq_threshold (int) – adjust mapping quality. The default is 0 for no adjustment. The recommended value for adjustment is 50. min_mapping_quality (int) – only use reads above a minimum mapping quality. The default is 0.

samtools pileup stepper pysam • 591 views
ADD COMMENT

Login before adding your answer.

Traffic: 1803 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6