obtaining a pileup with read names from a bam alignment
1
1
Entering edit mode
8.1 years ago
bernardo1963 ▴ 30

Hi, I would like to obtain a pileup (from a bam file) that also includes the read names. samtools mpileup command only gives the bases: samtools mpileup --positions ctg2.bed -f reference_genomic2.fasta reads.bam ctg2_left_rc 177692 C 53 ,,.,.,..,....,...,,,,.....,.............AAAAAAAAAaaaa ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]O]]]]]]]]]]]]]]]]]

ctg2_left_rc 177721 T 55 ,,,.,.,.,....,...,,,,....,.........*.CCCCCCCCCCcccc ]]]]]]]]]]]K]]]]]]]]]]]R]]]L]]]]]]]]]]]]]]]]]]]]N]]]]]]

Checking in the IGV browser, I saw that samtools mpileup correctly identified the deletions in the reads (with '*')

Since I need the read names and could not find a suitable command in samtools, I wrote a script in python, using pysam (I am a newbie in both). However, it is not correctly handling indels, and instead of reporting them with '*' as samtools mpileup, it gives the previous base. My script follows; any suggestion will be greatly appreciated.

the command line: samtools_view.py --bam reads.bam --fasta reference_genomic2.fasta --chr ctg2_left_rc --start 177721 --end 177722

the script:

#! /usr/local/bin/python2.7 

import pysam

import argparse

parser = argparse.ArgumentParser(description='usage: python samtools_view.py  --bam reads.bam --fasta reference_genomic.fasta  --chr ctg2_right --start 100140 --end 100142 ')

parser.add_argument('--bam', help='Input bam file name',required=True)

parser.add_argument('--fasta', help='Input reference fasta file name',required=True)

parser.add_argument('--chr',help='target contig or chromosome', required=True)

parser.add_argument('--start',help='target region start (bp)', required=True)

parser.add_argument('--end',help='target region end (bp)', required=True)

args = parser.parse_args()

args.start = int(args.start) 

args.end = int(args.end)

samfile = pysam.AlignmentFile(args.bam, "rb")

fastafile = pysam.FastaFile( args.fasta )

for pileupcolumn in samfile.pileup(args.chr, args.start, args.end):

    for pileupread in pileupcolumn.pileups:

        if (pileupcolumn.pos >= args.start) and (pileupcolumn.pos <= args.end):

           print ('%s\t%s\t%s\t%s\t%s\t%s' %

                 (args.chr,

          pileupcolumn.pos,           

          fastafile.fetch(args.chr,pileupcolumn.reference_pos -1,pileupcolumn.reference_pos), # -1 needed because pysam is 0-based and most programs (and human heads) are 1-based

          pileupread.alignment.query_name,

          pileupread.query_position,  

                  pileupread.alignment.query_sequence[pileupread.query_position -1]))

samfile.close()

fastafile.close()
alignment next-gen pysam mpileup samtools • 3.6k views
ADD COMMENT
3
Entering edit mode
8.1 years ago

Internally, samtools mpileup is also getting the wrong base. The difference is that it also checks something of the form:

if not pileupread.is_del and not pileupread.is_refskip:

The deletion/splicing status can be checked and handled in this way. Note that you're not the first pysam user to get confused by this. This isn't really pysam's fault, it's just exposing the underlying C functions which work in this way.

ADD COMMENT
2
Entering edit mode

Thanks a lot, Devon ! I fixed the problem (and some other errors, due to 0-based vs 1-based coordinates) , and everything is fine now. I pasted below the code, for it may be useful to other people. Bernardo

#! /usr/local/bin/python2.7
#samtools_view.py     Bernardo  3mar2016  v. 5mar  5PM
#requires python2.7  
#usage: samtools_view.py   --bam reads.bam  --fasta reference_genomic2.fasta  --chr ctg2_left_rc --start 177718 --end 177722
#output:
#ctg2_left_rc    177718  A       ctg2_100x_PB_L_3933_1_0 11659   A
#ctg2_left_rc    177718  A       ctg2_100x_PB_L_5164_1_0 None    *
#code based on: http://pysam.readthedocs.org/en/latest/
#argparse info: http://www.cyberciti.biz/faq/python-command-line-arguments-argv-example/
import pysam
import argparse
parser = argparse.ArgumentParser(description='usage:  samtools_view.py  --bam reads.bam --fasta reference_genomic.fasta  --chr ctg2_right --start 100140 --end 100142 ')
parser.add_argument('--bam', help='Input bam file name',required=True)
parser.add_argument('--fasta', help='Input reference fasta file name',required=True)
parser.add_argument('--chr',help='target contig or chromosome', required=True)
parser.add_argument('--start',help='target region start (bp)', required=True)
parser.add_argument('--end',help='target region end (bp)', required=True)
args = parser.parse_args()
args.start = int(args.start) -1  #pysam is 0-based, but human minds are not
args.end   = int(args.end)   -1
samfile = pysam.AlignmentFile(args.bam, "rb")
fastafile = pysam.FastaFile( args.fasta )
for pileupcolumn in samfile.pileup(args.chr, args.start , args.end ):
    for pileupread in pileupcolumn.pileups:
        if pileupread.is_del and not pileupread.is_refskip and (pileupcolumn.pos >= args.start ) and (pileupcolumn.pos <= args.end  ):
            print ('%s\t%s\t%s\t%s\t%s\t%s' %
                 (args.chr,
                  pileupcolumn.pos +1,                                                     # +1 needed because pysam is 0-based and most programs (and human heads) are 1-based
                  fastafile.fetch(args.chr,pileupcolumn.reference_pos ,pileupcolumn.reference_pos +1), 
                  pileupread.alignment.query_name,
                  pileupread.query_position,  
                  "*"))     
        if not pileupread.is_del and not pileupread.is_refskip and (pileupcolumn.pos >= args.start ) and (pileupcolumn.pos <= args.end  ):
           print ('%s\t%s\t%s\t%s\t%s\t%s' %
                 (args.chr,
                  pileupcolumn.pos +1,
                  fastafile.fetch(args.chr,pileupcolumn.reference_pos ,pileupcolumn.reference_pos +1),  
                  pileupread.alignment.query_name,
                  pileupread.query_position,  
                  pileupread.alignment.query_sequence[pileupread.query_position]))
samfile.close()
fastafile.close()
ADD REPLY

Login before adding your answer.

Traffic: 2044 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6