Biopython intron-exon boundary nucleotide frequencies
Hello, I have two different arrays, each consisting of 10-nucleotide sequences: one spanning intron-exon junctions and one spanning exon-intron junctions, which I got from a GenBank file. The arrays look something like this:

Array1 = [Seq("ATTCATTCGG"), Seq("GGCTAGATTG"), Seq("CATGTAATGC")]


How do I calculate the frequency of each nucleotide in each position of the junction?

5 weeks ago
Joe 20k

Eukaryote genomics is not an area of expertise for me, so apologies if I miss some subtlety with the exons/introns here, but if the task is as simple as it appears, you should be able to do something like this:

from collections import Counter
from Bio.Seq import Seq

Array1 = [Seq("ATTCATTCGG"), Seq("GGCTAGATTG"), Seq("CATGTAATGC")]

# zip(*Array1) yields one tuple per position, taken across all sequences
for i in zip(*Array1):
    print(Counter(i))


Result:

Counter({'A': 1, 'G': 1, 'C': 1})
Counter({'T': 1, 'G': 1, 'A': 1})
Counter({'T': 2, 'C': 1})
Counter({'C': 1, 'T': 1, 'G': 1})
Counter({'A': 2, 'T': 1})
Counter({'T': 1, 'G': 1, 'A': 1})
Counter({'A': 2, 'T': 1})
Counter({'T': 2, 'C': 1})
Counter({'G': 2, 'T': 1})
Counter({'G': 2, 'C': 1})


This is just the printed representation. If you want to work with the Counter objects themselves (they are dict subclasses), you can collect them in a list instead of printing them.
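
As a rough sketch of that last step, the snippet below collects one Counter per position and then converts the counts to relative frequencies. The names `junctions`, `position_counts` and `position_freqs` are made up for illustration, and plain strings stand in for the Seq objects (zip should iterate over both the same way). It assumes every sequence has the same length, since zip stops at the shortest one, and it only reports A, C, G and T, so an ambiguous base such as N would be left out.

```python
from collections import Counter

# Plain strings used in place of Seq objects (an assumption made to keep
# the sketch dependency-free); the example sequences are from the question.
junctions = ["ATTCATTCGG", "GGCTAGATTG", "CATGTAATGC"]

# One Counter per position in the junction.
position_counts = [Counter(column) for column in zip(*junctions)]

# Turn counts into relative frequencies; bases absent at a position get 0.0
# because a Counter returns 0 for missing keys.
n_seqs = len(junctions)
position_freqs = [
    {base: counts[base] / n_seqs for base in "ACGT"}
    for counts in position_counts
]

for pos, freqs in enumerate(position_freqs, start=1):
    print(pos, freqs)
```

Each entry of `position_freqs` is an ordinary dict, so something like `position_freqs[2]["T"]` gives the fraction of sequences with a T at the third position.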