You could use the seqinr package for R. First read in your alignment with the read.alignment() function, then use the consensus() function with method="profile" to get a matrix of counts of each nucleotide at each position.
A similar option is Biopython, which can extract a PSSM (Position Specific Score Matrix):
    from Bio import AlignIO
    from Bio.Align import AlignInfo

    # Replace 'format' with whatever format your multiple sequence alignment is in.
    # Note that Python does not expand '~' automatically, so give the real path.
    align = AlignIO.read('~/path/to/alignment.aln', 'format')
    summary_align = AlignInfo.SummaryInfo(align)
    consensus = summary_align.dumb_consensus()
    # Count each residue per column, skipping ambiguous 'N' characters
    my_pssm = summary_align.pos_specific_score_matrix(consensus, chars_to_ignore=['N'])
    print(my_pssm)
Should end up looking something like:
        G  A  T  C
    G   1  1  0  1
    T   0  0  3  0
    A   1  1  0  0
    T   0  0  2  0
    C   0  0  0  3
Instructions from: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc301
Following the method suggested by Healey:
I would really like to output the PSSM with the frequency in each cell, as you mentioned. However, when I follow your method above, it gives me something like 0 7.0 0 7.0 0 0 0 7.0 in the cells, etc., in the output. It seems to be a 70% threshold, but I really want the actual frequencies.
Could anyone share how to output the actual frequencies of the alleles in each cell, assuming that the left side is the position in the alignment?
I would highly appreciate your suggestions. Thank you.
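As far as I can tell, the cells of the Biopython PSSM hold raw counts rather than a thresholded percentage, so a value of 7.0 would most likely mean that seven sequences carry that base at that position. One way to get frequencies is to divide each count by the number of counted bases in that column. Below is a minimal sketch in plain Python that does the counting itself rather than going through Biopython. The function name column_frequencies, its parameters and the toy alignment are all made up for illustration (the toy sequences were chosen to match the example count table earlier in the thread), and it assumes that 'N' and gap characters should be left out of the denominator:

```python
from collections import Counter


def column_frequencies(seqs, alphabet="ACGT", ignore="N-"):
    """Return one dict per alignment column mapping each base to its frequency.

    Hypothetical helper: characters listed in `ignore` are skipped, and the
    denominator is the number of bases from `alphabet` seen in that column.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for column in zip(*seqs):  # iterate over alignment columns
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        total = sum(counts[b] for b in alphabet)
        freqs.append({b: (counts[b] / total if total else 0.0) for b in alphabet})
    return freqs


# Toy alignment, invented to match the example count table above
toy = ["GTATC", "ATGTC", "CTNNC"]
for pos, row in enumerate(column_frequencies(toy), start=1):
    print(pos, " ".join(f"{b}={row[b]:.2f}" for b in "ACGT"))
```

If you would rather divide by the total number of sequences (so that ignored characters lower the frequencies), replace `total` with `len(seqs)`. To use this on a real alignment, you could pass in the sequences as strings, for example `[str(rec.seq) for rec in align]` with the `align` object from the answer above.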