A3 count (count of A occurring in 3rd codon position) in multifasta file
4.3 years ago

Hello everyone,

I am a novice in Python, and I have been trying to compute the number of A's at the third codon position for each gene in a multi-FASTA file. I have written the following code, but it does not work.

input_file = open('MG1655.FAS', 'r')
output_file = open('A3_counts.tsv','w')
output_file.write('Gene\tA3\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
    A3count = 0
    gene_name = cur_record.name
    sequence = cur_record.seq
    for i in range(0, len(sequence), 3):
        if sequence[i:i+3]=='a':
            A3count = A3count+1
            output = '%s\t%i\t' % (gene_name, A3count)
            output_file.write(output)
output_file.close()
input_file.close()



Currently you're checking whether the whole codon equals "a" (and why lowercase?). Instead, you need to check len(sequence[i:i+3]) == 3 and sequence[i:i+3][2].upper() == 'A'.
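A minimal sketch of that check as a standalone function over a plain string (count_a3 and count_a3_slice are hypothetical names, not from the original post). It counts only complete codons and ignores case; the second function shows an equivalent shortcut using a stride slice, since every third base starting at index 2 is a third codon position:

```python
def count_a3(sequence):
    """Count complete codons whose third base is A (case-insensitive).

    Hypothetical helper: accepts a str or a Biopython Seq, since str()
    turns either into a plain string.
    """
    seq = str(sequence).upper()
    count = 0
    # Step through the sequence one codon (3 bases) at a time.
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        # Skip an incomplete trailing codon, then test the 3rd position.
        if len(codon) == 3 and codon[2] == 'A':
            count += 1
    return count


def count_a3_slice(sequence):
    # Hypothetical shortcut: seq[2::3] picks indices 2, 5, 8, ...,
    # i.e. the 3rd base of every complete codon.
    return str(sequence).upper()[2::3].count('A')


# Codons ATG, AAA, CCA, gta plus a partial "TT" at the end.
print(count_a3("ATGAAACCAgtaTT"))        # 3
print(count_a3_slice("ATGAAACCAgtaTT"))  # 3
```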


Also, right now you're writing a line with A3count every time you encounter a codon ending in 'A', instead of once per gene.
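One way to restructure the loop so each gene gets exactly one output line: finish counting the whole sequence first, then write the gene name and total once, ending with a newline. This is a sketch that takes (name, sequence) pairs so it runs without Biopython; in the original script those would be cur_record.name and cur_record.seq from SeqIO.parse. The function name write_a3_table is hypothetical.

```python
import io


def write_a3_table(records, out):
    """Hypothetical helper: write a 'Gene<TAB>A3' table, one row per gene.

    records: iterable of (name, sequence) pairs; out: a writable text file.
    """
    out.write('Gene\tA3\n')
    for gene_name, sequence in records:
        seq = str(sequence).upper()
        a3_count = 0
        for i in range(0, len(seq), 3):
            codon = seq[i:i + 3]
            if len(codon) == 3 and codon[2] == 'A':
                a3_count += 1
        # Written once per gene, after the inner loop has finished,
        # with a newline so each gene lands on its own row.
        out.write('%s\t%i\n' % (gene_name, a3_count))


# In-memory usage; with Biopython you could instead pass
# ((rec.name, rec.seq) for rec in SeqIO.parse('MG1655.FAS', 'fasta'))
# and an open output file.
buffer = io.StringIO()
write_a3_table([('geneA', 'ATGAAATAA'), ('geneB', 'ATGCCCTGA')], buffer)
print(buffer.getvalue())
```

Here geneA (ATG AAA TAA) has two codons ending in A and geneB (ATG CCC TGA) has one, so the table gets one row per gene instead of one row per matching codon.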


Thanks for the reply, but it still does not work. I also tried the following:

DNA = "ATGCCCGTG"
start = 0
codons = [DNA[start:start+3]]
for start in range(0, len(DNA), 3):
    print(codons)

But the output I get is ATG, ATG, ATG. Can you help me with this?


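Do this instead:

[DNA[start:start+3] for start in range(0, len(DNA) - 2, 3)]


That is, put the whole expression, including the for clause, inside the square brackets, not just DNA[start:start+3]. Your version builds codons once, from start = 0, before the loop runs, so the loop just prints that same one-element list three times. Use len(DNA) - 2 as the stop value: with len(DNA) - 3 the last complete codon is skipped.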
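Applied to the example string from this thread, the comprehension can be checked like this (a sketch; third_bases is an illustrative name, not from the thread). The stop value len(DNA) - 2 keeps every complete codon, and taking index 2 of each codon gives the third positions directly:

```python
DNA = "ATGCCCGTG"

# Stop at len(DNA) - 2 so every slice is a full 3-base codon;
# with len(DNA) - 3 the last codon (GTG) would be skipped.
codons = [DNA[start:start + 3] for start in range(0, len(DNA) - 2, 3)]
print(codons)                  # ['ATG', 'CCC', 'GTG']

# Third codon positions, and how many of them are A.
third_bases = [codon[2] for codon in codons]
print(third_bases)             # ['G', 'C', 'G']
print(third_bases.count('A'))  # 0
```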