I'm trying to write a Python script that, given a FASTA file with multiple records, counts the number of matches and mismatches between sequences, excluding gaps. The FASTA file was created with clustalw2, run through Python. I have the following, which outputs the number of matches, mismatches, and gaps, but only for two sequences in a list. My FASTA file contains 7 sequences, so I need help turning this into a function that prints the counts for all pairwise alignments.
The following code works if there are only two records in the FASTA file. To compare the 7 sequences, I would need it to do 1:1, 1:2, 1:3, and so on, and then 2:2, 2:3, 2:4, and so on. When I replace seqs[0] and seqs[1] with any other indices, say 2 and 4, to compare those two items in the list, the code does not work.
    from Bio import SeqIO

    # Read every aligned record into a list of character lists
    seqs = []
    for record in SeqIO.parse("ex.fasta", "fasta"):
        rec_seq = record.seq
        seqs.append(list(rec_seq))

    # Build one status character per alignment column:
    # 'G' = gap, 'X' = mismatch, 'Y' = match
    matchstr = ''
    for (i, base) in enumerate(seqs[0]):
        base = seqs[0][i]
        for j in range(1, len(seqs)):
            if base == '-':
                base = 'G'
            elif seqs[0][i] == '-':
                base = 'G'
            elif seqs[1][i] == '-':
                base = 'G'
            elif base != seqs[j][i]:
                base = 'X'
            elif base == seqs[j][i]:
                base = 'Y'
        matchstr = matchstr + base

    print(matchstr)
    D = matchstr.count('X')  # mismatches
    print(D)
    S = matchstr.count('Y')  # matches
    print(S)
    G = matchstr.count('G')  # gaps
    print(G)
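For reference, here is a minimal sketch of the behaviour I'm after, assuming every sequence in the alignment has the same length. The names `count_pair` and `pairwise_counts` and the three example sequences are made up for illustration. They are plain strings rather than the contents of `ex.fasta`, so the snippet runs without Biopython. It uses `itertools.combinations` to enumerate each pair once, which may or may not be the best approach:

```python
from itertools import combinations


def count_pair(seq_a, seq_b):
    """Return (matches, mismatches, gaps) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(seq_a, seq_b):
        if a == '-' or b == '-':
            gaps += 1        # a gap in either sequence counts as a gap column
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def pairwise_counts(seqs):
    """Map each index pair (i, j) with i < j to its (matches, mismatches, gaps)."""
    return {(i, j): count_pair(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


# Hypothetical aligned sequences, not taken from ex.fasta
example = ["ACG-T", "ACGAT", "A-GTT"]

for (i, j), (m, x, g) in sorted(pairwise_counts(example).items()):
    # Report 1-based sequence numbers, as in 1:2, 1:3, 2:3
    print("%d:%d matches=%d mismatches=%d gaps=%d" % (i + 1, j + 1, m, x, g))
```

With the real file, `example` would presumably be replaced by something like `[str(record.seq) for record in SeqIO.parse("ex.fasta", "fasta")]`. Note that `combinations` skips self-comparisons such as 1:1. If those are wanted, `itertools.combinations_with_replacement` yields them as well.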