Question: Consensus Sequence
10.3 years ago by
User 521740
User 521740 wrote:

Hello, I have the 8 sequences below and I would like to calculate a consensus sequence from them.

``````
sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
             ]

# Walk through the alignment one column (position) at a time.
for i in range(len(sequences[1])):
    alignment = ""
    # Collect the base found at position i in every sequence.
    for j in range(len(sequences)):
        alignment += sequences[j][i]
    # Print the column followed by the counts of A, C, G and T.
    print(alignment)
    print(alignment.count("A"))
    print(alignment.count("C"))
    print(alignment.count("G"))
    print(alignment.count("T"))
    print("----------")
``````

The code above counts how often each base occurs at each position (a position frequency matrix). I have found the following rules ( http://www.cisred.org/content/methods/help/pfm ) for calculating the consensus sequence, but unfortunately I do not understand them well enough yet to complete the implementation.

Best regards,

consensus python biopython • 6.1k views
modified 10.0 years ago by brentp23k • written 10.3 years ago by User 521740

You should look at Brad's suggestion using Biopython in this question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?

Notes: Python lists are zero-indexed, so if you want the length of the first sequence you should use `len(sequences[0])` instead of `len(sequences[1])`. The sequences could also be written as plain strings such as `"CCCATTGTTCTC"` without modifying the rest of the code. Cheers
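To illustrate both notes, here is a minimal sketch of the per-position counting with the sequences written as strings and the length taken from index 0. The function name `position_counts` is made up for this example, and it assumes all sequences have the same length.

```python
from collections import Counter

# Same data as in the question, written as plain strings.
sequences = [
    "CCCATTGTTCTC",
    "TTTCTGGTTCTC",
    "TCAATTGTTTAG",
    "CTCATTGTTGTC",
    "TCCATTGTTCTC",
    "CCTATTGTTCTC",
    "TCCATTGTTCGT",
    "CCAATTGTTTTG",
]


def position_counts(seqs):
    """Return one {base: count} dict per alignment column (hypothetical helper)."""
    length = len(seqs[0])  # index 0 is the first sequence
    rows = []
    for i in range(length):
        # Count the bases found at position i across all sequences.
        column = Counter(seq[i] for seq in seqs)
        rows.append({base: column[base] for base in "ACGT"})
    return rows


pfm = position_counts(sequences)
for i, counts in enumerate(pfm):
    print(i, counts)
```

Indexing a string and indexing a list of single characters behave the same way here, which is why the loop does not care which of the two formats is used.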

10.3 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

Check out motility which does exactly that:

``````
import motility

sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
             ]

pwm = motility.make_pwm(sequences)
print(pwm.generate_sites_over(pwm.max_score()))
``````

prints

``````
('CCCATTGTTCTC', 'TCCATTGTTCTC')
``````
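Two sequences come back because position 1 is a tie: C and T each occur 4 times there, while every other position has a single most frequent base. If you would rather not install anything, the same result can be sketched in plain Python with a simple majority rule. This is not how motility works internally, just an illustration, and the function name `majority_consensus` is made up for this example.

```python
from collections import Counter
from itertools import product

sequences = [
    "CCCATTGTTCTC",
    "TTTCTGGTTCTC",
    "TCAATTGTTTAG",
    "CTCATTGTTGTC",
    "TCCATTGTTCTC",
    "CCTATTGTTCTC",
    "TCCATTGTTCGT",
    "CCAATTGTTTTG",
]


def majority_consensus(seqs):
    """Return every sequence made of a most frequent base at each position.

    Hypothetical helper: ties at a position are kept, so more than one
    consensus sequence can be returned.
    """
    choices = []
    # zip(*seqs) yields the alignment column by column.
    for column in zip(*seqs):
        counts = Counter(column)
        top = max(counts.values())
        # Keep all bases that reach the highest count in this column.
        choices.append(sorted(b for b, n in counts.items() if n == top))
    # Combine the per-position choices into full sequences.
    return ["".join(bases) for bases in product(*choices)]


print(majority_consensus(sequences))
# prints ['CCCATTGTTCTC', 'TCCATTGTTCTC']
```

Because it relies on `zip(*seqs)`, the sketch accepts either strings or the lists of single characters used in the question.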