I'm trying to generate a position weight matrix (PWM) from a FASTA file of aligned sequences. To do this, I need to count the occurrences of each base at every position in the alignment. For example, given the following alignment:
ATGG
TTGG
ATTG
CTTG
I need to record that the first column contains 2 A's, 1 T, and 1 C, and then repeat this for every position (column) in the file. At the moment, I'm inclined to read each line, base by base, into a multidimensional array and then parse it by position. However, that seems like it could get computationally demanding for large alignments. Has anyone come across this before and come up with a more streamlined approach? I'm coding it in Perl, but if you have something useful in another language, I'll take it.
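To make the question concrete, here is a minimal sketch of one possible streaming approach, written in Python rather than Perl. It keeps one counter per column and updates the counters as each sequence is read, so the whole alignment is never held in memory. The function name `column_counts` is hypothetical, and the sketch assumes standard FASTA input (header lines starting with `>`, sequences possibly wrapped over several lines) in which every sequence has the same aligned length.

```python
from collections import Counter


def column_counts(lines):
    """Count bases per alignment column from FASTA-formatted lines.

    Hypothetical helper: assumes all sequences have the same length.
    Returns a list with one Counter per column.
    """
    counts = []  # counts[i] holds the base counts for column i
    chunks = []  # pieces of the sequence currently being read

    def flush():
        # Fold the finished sequence into the per-column counters.
        seq = "".join(chunks).upper()
        chunks.clear()
        if not seq:
            return
        if counts and len(seq) != len(counts):
            raise ValueError("sequences are not the same length")
        if not counts:
            counts.extend(Counter() for _ in seq)
        for column, base in zip(counts, seq):
            column[base] += 1

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()  # a header line ends the previous record
        elif line:
            chunks.append(line)
    flush()  # fold in the last record
    return counts


example = [">s1", "ATGG", ">s2", "TTGG", ">s3", "ATTG", ">s4", "CTTG"]
counts = column_counts(example)
print(dict(counts[0]))  # {'A': 2, 'T': 1, 'C': 1}
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines. Memory use then scales with the alignment length times the alphabet size rather than with the number of sequences, and the same idea should carry over to a Perl array of hashes.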