Question: Assignment for SNPs
0
gravatar for arushipuri786
3.1 years ago by
arushipuri7860 wrote:

I am a beginner in bio and python. Following is my assignment A python program that take following command line inputs:

a)a filename expecting aligned sequencing data in form of a bam file)

b)a genomic coordinate expecting a SNP coordinate in the form of chr:position

c)a minimum quality score as an int value use pysam and investigate the following properties and present them as the output to the command line

d)How many reads in the input bam file a) overlap the genomic coordinate specified as the input value b)?

e)Which nucleotides (A,C,G,T,N) are supported by those reads and how many times each? Provide two values per base. The total number of reads that support the base and have a base quality of atleast c)?

f)What is the strand support of each base(what proportion of reads maps to the forward strand of the reference)? Again total and with the minimum base quality.

My code is below, I wanted to know if i am going right and how shall i proceed further. Thank you.

import pysam
import sys
import os

#Checking command line arguments
if len(sys.argv) < 4:
    print ("Files not provided")
    sys.exit(0)

#command line arguments sequence
bam_file= sys.argv[1]
chromosomecoordinate=sys.argv[2]
qualityscore=int(sys.argv[3])

try:
#bam file is 1 base, pysam is 0 base
  snipposition=int(chromosomecoordinate.split(':')[1])-1
#taking the first part now
  chromosomenumber=chromosomecoordinate.split(':')[0]
  print chromosomenumber
except IndexError:
  print"Error"

bamfile=pysam.AlignmentFile(bam_file,'rb')

#intialising a list and five count variables
base=[]


for aRead in bamfile.fetch(chromosomenumber,int(snipposition),int(snipposition+1)):    
  #print aRead
  start_pos=aRead.reference_start
  print start_pos

  a_count=0
  c_count=0
  g_count=0
  t_count=0
  n_count=0
  base=aRead.query_sequence
  #print base
  for i in base:
    if i=="A":
      a_count+=1  
    elif i=="C":
      c_count+=1
    elif i=="G":
      g_count+=1
    elif i=="T":
      t_count+=1
    elif i=="N":
      n_count+=1
  print(a_count,c_count,g_count,t_count,n_count)
snp • 911 views
ADD COMMENTlink modified 3.0 years ago by Emily_Ensembl18k • written 3.1 years ago by arushipuri7860
1

You could make your code slightly more elegant using collections.Counter()

(A dictionary object in which every value is initialized as a integer)

As such you could use something like the following:

counts = collections.Counter()
for i in base:
   counts[i] += 1
ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by WouterDeCoster39k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1229 users visited in the last hour