Hi, I would like to obtain a pileup (from a bam file) that also includes the read names. samtools mpileup command only gives the bases: samtools mpileup --positions ctg2.bed -f reference_genomic2.fasta reads.bam ctg2_left_rc 177692 C 53 ,,.,.,..,....,...,,,,.....,.............AAAAAAAAAaaaa ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]O]]]]]]]]]]]]]]]]]
ctg2_left_rc 177721 T 55 ,,,.,.,.,....,...,,,,....,.........*.CCCCCCCCCCcccc ]]]]]]]]]]]K]]]]]]]]]]]R]]]L]]]]]]]]]]]]]]]]]]]]N]]]]]]
Checking in the IGV browser, I saw that samtools mpileup correctly identified the deletions in the reads (with '*')
Since I need the read names and could not find a suitable command in samtools, I wrote a script in python, using pysam (I am a newbie in both). However, it is not correctly handling indels, and instead of reporting them with '*' as samtools mpileup, it gives the previous base. My script follows; any suggestion will be greatly appreciated.
the command line: samtools_view.py --bam reads.bam --fasta reference_genomic2.fasta --chr ctg2_left_rc --start 177721 --end 177722
the script:
#! /usr/local/bin/python2.7
import pysam
import argparse
parser = argparse.ArgumentParser(description='usage: python samtools_view.py --bam reads.bam --fasta reference_genomic.fasta --chr ctg2_right --start 100140 --end 100142 ')
parser.add_argument('--bam', help='Input bam file name',required=True)
parser.add_argument('--fasta', help='Input reference fasta file name',required=True)
parser.add_argument('--chr',help='target contig or chromosome', required=True)
parser.add_argument('--start',help='target region start (bp)', required=True)
parser.add_argument('--end',help='target region end (bp)', required=True)
args = parser.parse_args()
args.start = int(args.start)
args.end = int(args.end)
samfile = pysam.AlignmentFile(args.bam, "rb")
fastafile = pysam.FastaFile( args.fasta )
for pileupcolumn in samfile.pileup(args.chr, args.start, args.end):
for pileupread in pileupcolumn.pileups:
if (pileupcolumn.pos >= args.start) and (pileupcolumn.pos <= args.end):
print ('%s\t%s\t%s\t%s\t%s\t%s' %
(args.chr,
pileupcolumn.pos,
fastafile.fetch(args.chr,pileupcolumn.reference_pos -1,pileupcolumn.reference_pos), # -1 needed because pysam is 0-based and most programs (and human heads) are 1-based
pileupread.alignment.query_name,
pileupread.query_position,
pileupread.alignment.query_sequence[pileupread.query_position -1]))
samfile.close()
fastafile.close()
Thanks a lot, Devon ! I fixed the problem (and some other errors, due to 0-based vs 1-based coordinates) , and everything is fine now. I pasted below the code, for it may be useful to other people. Bernardo