Question: obtaining a pileup with read names from a bam alignment
1
gravatar for bernardo1963
4.9 years ago by
bernardo196330
bernardo196330 wrote:

Hi, I would like to obtain a pileup (from a bam file) that also includes the read names. samtools mpileup command only gives the bases: samtools mpileup --positions ctg2.bed -f reference_genomic2.fasta reads.bam ctg2_left_rc 177692 C 53 ,,.,.,..,....,...,,,,.....,.............AAAAAAAAAaaaa ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]O]]]]]]]]]]]]]]]]]

ctg2_left_rc 177721 T 55 ,,,.,.,.,....,...,,,,....,.........*.CCCCCCCCCCcccc ]]]]]]]]]]]K]]]]]]]]]]]R]]]L]]]]]]]]]]]]]]]]]]]]N]]]]]]

Checking in the IGV browser, I saw that samtools mpileup correctly identified the deletions in the reads (with '*')

Since I need the read names and could not find a suitable command in samtools, I wrote a script in python, using pysam (I am a newbie in both). However, it is not correctly handling indels, and instead of reporting them with '*' as samtools mpileup, it gives the previous base. My script follows; any suggestion will be greatly appreciated.

the command line: samtools_view.py --bam reads.bam --fasta reference_genomic2.fasta --chr ctg2_left_rc --start 177721 --end 177722

the script:

#! /usr/local/bin/python2.7 

import pysam

import argparse

parser = argparse.ArgumentParser(description='usage: python samtools_view.py  --bam reads.bam --fasta reference_genomic.fasta  --chr ctg2_right --start 100140 --end 100142 ')

parser.add_argument('--bam', help='Input bam file name',required=True)

parser.add_argument('--fasta', help='Input reference fasta file name',required=True)

parser.add_argument('--chr',help='target contig or chromosome', required=True)

parser.add_argument('--start',help='target region start (bp)', required=True)

parser.add_argument('--end',help='target region end (bp)', required=True)

args = parser.parse_args()

args.start = int(args.start) 

args.end = int(args.end)

samfile = pysam.AlignmentFile(args.bam, "rb")

fastafile = pysam.FastaFile( args.fasta )

for pileupcolumn in samfile.pileup(args.chr, args.start, args.end):

    for pileupread in pileupcolumn.pileups:

        if (pileupcolumn.pos >= args.start) and (pileupcolumn.pos <= args.end):

           print ('%s\t%s\t%s\t%s\t%s\t%s' %

                 (args.chr,

          pileupcolumn.pos,           

          fastafile.fetch(args.chr,pileupcolumn.reference_pos -1,pileupcolumn.reference_pos), # -1 needed because pysam is 0-based and most programs (and human heads) are 1-based

          pileupread.alignment.query_name,

          pileupread.query_position,  

                  pileupread.alignment.query_sequence[pileupread.query_position -1]))

samfile.close()

fastafile.close()
ADD COMMENTlink modified 4.9 years ago by Devon Ryan98k • written 4.9 years ago by bernardo196330
3
gravatar for Devon Ryan
4.9 years ago by
Devon Ryan98k
Freiburg, Germany
Devon Ryan98k wrote:

Internally, samtools mpileup is also getting the wrong base. The difference is that it also checks something of the form:

if not pileupread.is_del and not pileupread.is_refskip:

The deletion/splicing status can be checked and handled in this way. Note that you're not the first pysam user to get confused by this. This isn't really pysam's fault, it's just exposing the underlying C functions which work in this way.

ADD COMMENTlink written 4.9 years ago by Devon Ryan98k
2

Thanks a lot, Devon ! I fixed the problem (and some other errors, due to 0-based vs 1-based coordinates) , and everything is fine now. I pasted below the code, for it may be useful to other people. Bernardo

#! /usr/local/bin/python2.7
#samtools_view.py     Bernardo  3mar2016  v. 5mar  5PM
#requires python2.7  
#usage: samtools_view.py   --bam reads.bam  --fasta reference_genomic2.fasta  --chr ctg2_left_rc --start 177718 --end 177722
#output:
#ctg2_left_rc    177718  A       ctg2_100x_PB_L_3933_1_0 11659   A
#ctg2_left_rc    177718  A       ctg2_100x_PB_L_5164_1_0 None    *
#code based on: http://pysam.readthedocs.org/en/latest/
#argparse info: http://www.cyberciti.biz/faq/python-command-line-arguments-argv-example/
import pysam
import argparse
parser = argparse.ArgumentParser(description='usage:  samtools_view.py  --bam reads.bam --fasta reference_genomic.fasta  --chr ctg2_right --start 100140 --end 100142 ')
parser.add_argument('--bam', help='Input bam file name',required=True)
parser.add_argument('--fasta', help='Input reference fasta file name',required=True)
parser.add_argument('--chr',help='target contig or chromosome', required=True)
parser.add_argument('--start',help='target region start (bp)', required=True)
parser.add_argument('--end',help='target region end (bp)', required=True)
args = parser.parse_args()
args.start = int(args.start) -1  #pysam is 0-based, but human minds are not
args.end   = int(args.end)   -1
samfile = pysam.AlignmentFile(args.bam, "rb")
fastafile = pysam.FastaFile( args.fasta )
for pileupcolumn in samfile.pileup(args.chr, args.start , args.end ):
    for pileupread in pileupcolumn.pileups:
        if pileupread.is_del and not pileupread.is_refskip and (pileupcolumn.pos >= args.start ) and (pileupcolumn.pos <= args.end  ):
            print ('%s\t%s\t%s\t%s\t%s\t%s' %
                 (args.chr,
                  pileupcolumn.pos +1,                                                     # +1 needed because pysam is 0-based and most programs (and human heads) are 1-based
                  fastafile.fetch(args.chr,pileupcolumn.reference_pos ,pileupcolumn.reference_pos +1), 
                  pileupread.alignment.query_name,
                  pileupread.query_position,  
                  "*"))     
        if not pileupread.is_del and not pileupread.is_refskip and (pileupcolumn.pos >= args.start ) and (pileupcolumn.pos <= args.end  ):
           print ('%s\t%s\t%s\t%s\t%s\t%s' %
                 (args.chr,
                  pileupcolumn.pos +1,
                  fastafile.fetch(args.chr,pileupcolumn.reference_pos ,pileupcolumn.reference_pos +1),  
                  pileupread.alignment.query_name,
                  pileupread.query_position,  
                  pileupread.alignment.query_sequence[pileupread.query_position]))
samfile.close()
fastafile.close()
ADD REPLYlink modified 4.9 years ago by Devon Ryan98k • written 4.9 years ago by bernardo196330
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1570 users visited in the last hour
_