Question: A3 count (count of A occurring in 3rd codon position) in multifasta file
6 months ago by
annushreekurmi0 wrote:

Hello everyone,

I am a novice in Python and I have been trying to count the number of A's occurring in the 3rd codon position for each sequence in a multifasta file. I have written the following code, but it does not work.

``````
input_file = open('MG1655.FAS', 'r')
output_file = open('A3_counts.tsv','w')
output_file.write('Gene\tA3\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
    A3count = 0
    gene_name = cur_record.name
    sequence = cur_record.seq
    for i in range(0, len(sequence), 3):
        if sequence[i:i+3]=='a':
            A3count = A3count+1
            output = '%s\t%i\t' % (gene_name, A3count)
            output_file.write(output)
output_file.close()
input_file.close()
``````


Currently you're asking whether the whole codon is `"a"` (why lowercase?). You need to ask `len(sequence[i:i+3]) == 3 and sequence[i:i+3][2].upper() == 'A'`.

Also, right now you're printing `A3count` each time you encounter a codon ending with 'A'.
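To make the two points above concrete, here is a minimal sketch of the counting step on a plain string. The helper name `count_a3` is hypothetical (it is not part of Biopython), and it assumes the sequence is already in frame, i.e. the first base is codon position 1.

```python
def count_a3(sequence):
    """Count codons whose third base is A (case-insensitive)."""
    seq = str(sequence).upper()
    count = 0
    # Step one codon at a time; the stop value skips an incomplete trailing codon.
    for i in range(0, len(seq) - 2, 3):
        # Look only at the third base of the codon, not the whole codon.
        if seq[i + 2] == "A":
            count += 1
    return count


# atg cca GTA tt -> the codons "cca" and "GTA" end in A; the trailing "tt" is ignored
print(count_a3("atgccaGTAtt"))  # 2
```

In the original script this would presumably be called once per record, e.g. `count_a3(cur_record.seq)`, with the output line written after the codon loop has finished and ending in `'\n'` rather than `'\t'`, so that each gene gets exactly one row.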

Thanks for the reply, but it does not work. I also tried the following: `DNA = "ATGCCCGTG"`, then `start = 0`, then `codons = [DNA[start:start+3]]`, then `for start in range(0, len(DNA), 3): print(codons)`.

But the output I get is ATG, ATG, ATG. Can you help me with this?

Do

``````
[DNA[start:start+3] for start in range(0, len(DNA) - 2, 3)]
``````

I.e., enclose the whole expression, including the `for` clause, in square brackets, not just the `DNA[start:start+3]`.
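As a rough illustration of the difference, the sketch below contrasts the two forms on the same example string. It uses `len(DNA) - 2` as the stop value so that the last complete codon is included and any incomplete trailing codon is skipped. The names `codons_once` and `third_bases` are only for illustration, and the slice `DNA[2::3]` is an alternative shortcut, not something proposed in the thread.

```python
DNA = "ATGCCCGTG"

# Built once, before any loop runs, so it only ever holds the first codon.
start = 0
codons_once = [DNA[start:start + 3]]

# With the for clause inside the brackets, one entry is built per codon.
codons = [DNA[start:start + 3] for start in range(0, len(DNA) - 2, 3)]

# Every third base starting at index 2, i.e. the third position of each codon.
third_bases = DNA[2::3]

print(codons_once)             # ['ATG']
print(codons)                  # ['ATG', 'CCC', 'GTG']
print(third_bases)             # GCG
print(third_bases.count("A"))  # 0
```

Printing `codons_once` inside a loop repeats the same one-element list on every iteration, which matches the repeated ATG output described above.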