                    15.1 years ago
        User 5217
        
    
    Hello, I have the 8 sequences below and I would like to calculate a consensus sequence from them.
sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]
for i in range(len(sequences[1])):
  alignment = ""
  for j in range(len(sequences)):
    alignment += sequences[j][i]
  print(alignment)
  print(alignment.count("A"))
  print(alignment.count("C"))
  print(alignment.count("G"))
  print(alignment.count("T"))
  print("----------")
The code above counts, for each position, how often each base occurs (a Position Frequency Matrix). I have found the following rules ( http://www.cisred.org/content/methods/help/pfm ) for calculating the consensus sequence, but unfortunately I do not yet understand them well enough to finish the implementation.
Thank you in advance.
Best regards,
You should look at Brad's suggestion using Biopython in this question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?
Notes: If you want the length of the first sequence, you should use len(sequences[0]) rather than len(sequences[1]). Without modifying the rest of the code, the sequences could also be given in string format, e.g. "CCCATTGTTCTC". Cheers
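For a plain-Python route, here is a minimal sketch of a threshold-based consensus built on the per-position counts. The thresholds are assumptions, not taken from the linked page: a single base is emitted if it occurs in more than 50% of the sequences and more than twice as often as the runner-up; otherwise a two-base IUPAC code is emitted if the top two bases together exceed 75%; otherwise N. Please check these against the rules you want to follow. The function names position_counts and consensus are made up for this example.

```python
from collections import Counter

# IUPAC codes for two-base ambiguities.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def position_counts(sequences):
    """Return one Counter of base counts per alignment column."""
    # zip(*sequences) walks the columns; works for lists of chars or strings.
    return [Counter(column) for column in zip(*sequences)]


def consensus(sequences, single=0.5, pair=0.75):
    """Threshold-based consensus; 'single' and 'pair' are assumed cut-offs."""
    result = []
    for counts in position_counts(sequences):
        total = sum(counts.values())
        ranked = counts.most_common()  # ties keep first-seen order
        base1, n1 = ranked[0]
        base2, n2 = ranked[1] if len(ranked) > 1 else (None, 0)
        if n1 > single * total and n1 > 2 * n2:
            result.append(base1)
        elif base2 is not None and n1 + n2 > pair * total:
            # Fall back to N for symbols outside A/C/G/T (e.g. gaps).
            result.append(IUPAC_PAIRS.get(frozenset((base1, base2)), "N"))
        else:
            result.append("N")
    return "".join(result)


# The eight sequences from the question, written as strings.
sequences = [
    "CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
    "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG",
]
print(consensus(sequences))  # YCNATTGTTCTC
```

With these assumed cut-offs, position 1 (4 C, 4 T) becomes Y and position 3 (4 C, 2 T, 2 A) becomes N. When the second and third most frequent bases are tied, the pair chosen depends on which base appears first in the column, so you may want an explicit tie-break.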