I am a beginner in bio and python. Following is my assignment A python program that take following command line inputs:
a)a filename expecting aligned sequencing data in form of a bam file)
b)a genomic coordinate expecting a SNP coordinate in the form of chr:position
c)a minimum quality score as an int value use pysam and investigate the following properties and present them as the output to the command line
d)How many reads in the input bam file a) overlap the genomic coordinate specified as the input value b)?
e)Which nucleotides (A,C,G,T,N) are supported by those reads and how many times each? Provide two values per base. The total number of reads that support the base and have a base quality of atleast c)?
f)What is the strand support of each base(what proportion of reads maps to the forward strand of the reference)? Again total and with the minimum base quality.
My code is below, I wanted to know if i am going right and how shall i proceed further. Thank you.
import pysam
import sys
import os
#Checking command line arguments
if len(sys.argv) < 4:
print ("Files not provided")
sys.exit(0)
#command line arguments sequence
bam_file= sys.argv[1]
chromosomecoordinate=sys.argv[2]
qualityscore=int(sys.argv[3])
try:
#bam file is 1 base, pysam is 0 base
snipposition=int(chromosomecoordinate.split(':')[1])-1
#taking the first part now
chromosomenumber=chromosomecoordinate.split(':')[0]
print chromosomenumber
except IndexError:
print"Error"
bamfile=pysam.AlignmentFile(bam_file,'rb')
#intialising a list and five count variables
base=[]
for aRead in bamfile.fetch(chromosomenumber,int(snipposition),int(snipposition+1)):
#print aRead
start_pos=aRead.reference_start
print start_pos
a_count=0
c_count=0
g_count=0
t_count=0
n_count=0
base=aRead.query_sequence
#print base
for i in base:
if i=="A":
a_count+=1
elif i=="C":
c_count+=1
elif i=="G":
g_count+=1
elif i=="T":
t_count+=1
elif i=="N":
n_count+=1
print(a_count,c_count,g_count,t_count,n_count)
You could make your code slightly more elegant using collections.Counter()
(A dictionary object in which every value is initialized as a integer)
As such you could use something like the following: