Question: Consensus Sequence
Asked 9.3 years ago by User 5217:

Hello, I have the 8 sequences below and would like to calculate a consensus sequence from them.

sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]

for i in range(len(sequences[1])):
  # build the i-th alignment column from the i-th base of every sequence
  alignment = ""
  for j in range(len(sequences)):
    alignment += sequences[j][i]
  print(alignment)
  # how often each base occurs in this column
  print(alignment.count("A"))
  print(alignment.count("C"))
  print(alignment.count("G"))
  print(alignment.count("T"))
  print("----------")

The code above counts, for each position, how often each base occurs (a position frequency matrix). I found the following rules (http://www.cisred.org/content/methods/help/pfm) for calculating the consensus sequence, but unfortunately I do not yet understand them well enough to finish implementing the consensus calculation.
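One way to turn per-position counts into a consensus is a threshold rule that falls back to IUPAC ambiguity codes. The sketch below assumes the following rules, a widely used set of this kind that may not match the cisRED page word for word: use a single base if it makes up more than 50% of the column and more than twice the runner-up; otherwise use the two-base IUPAC code if the top two bases together exceed 75%; otherwise write N. The function name `degenerate_consensus` and the `IUPAC_PAIRS` table are illustrative, and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter

# ASSUMPTION: these thresholds are a common rule set, not necessarily the
# exact rules on the cisRED page. Input must contain only A, C, G, T.

# IUPAC codes for two-base ambiguity, keyed by the alphabetically sorted pair.
IUPAC_PAIRS = {"AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K"}


def degenerate_consensus(seqs):
    """Hypothetical helper: threshold-based consensus with IUPAC codes."""
    n = len(seqs)
    consensus = []
    for column in zip(*seqs):  # one alignment column at a time
        counts = Counter(column)
        # Top two (base, count) pairs; pad so a fully conserved column works.
        ranked = counts.most_common(2) + [("", 0)]
        (b1, c1), (b2, c2) = ranked[0], ranked[1]
        if 2 * c1 > n and c1 > 2 * c2:      # rule 1: > 50% and > 2x runner-up
            consensus.append(b1)
        elif 4 * (c1 + c2) > 3 * n:         # rule 2: top two together > 75%
            consensus.append(IUPAC_PAIRS["".join(sorted(b1 + b2))])
        else:                               # rule 3: no clear signal
            consensus.append("N")
    return "".join(consensus)


# The question's sequences as strings (lists of single characters work too).
sequences = ["CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
             "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG"]

print(degenerate_consensus(sequences))
```

For these eight sequences this gives YCNATTGTTCTC: Y (C or T) at the first position, where C and T tie 4 to 4, and N at the third, where C (4 of 8) is not above 50% and C plus the next most frequent base (6 of 8) is not above 75%.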

Thank you in advance.

Best regards,

Tags: consensus, python, biopython

You should look at Brad's suggestion using Biopython in this question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?

(Comment by Eric Normandeau, 9.3 years ago)

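Note: if you want the length of the first sequence, use len(sequences[0]) rather than len(sequences[1]); Python lists are indexed from 0, so sequences[1] is the second sequence (the original code only works because all sequences have the same length). Without changing the rest of the code, the sequences could also be given as strings such as "CCCATTGTTCTC", because strings can be indexed character by character just like lists. Cheers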
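Following that note, a minimal Python 3 sketch that uses plain strings and len(sequences[0]), and stores the counts as a position frequency matrix instead of printing them. It assumes equal-length sequences over A, C, G and T only; the names `BASES` and `pfm` are illustrative.

```python
# The question's sequences written as plain strings.
sequences = ["CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
             "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG"]

BASES = "ACGT"
length = len(sequences[0])  # length of the first sequence
if any(len(s) != length for s in sequences):
    raise ValueError("all sequences must have the same length")

# Position frequency matrix: pfm[base][i] = how often `base` occurs at position i.
pfm = {base: [0] * length for base in BASES}
for i in range(length):
    column = "".join(seq[i] for seq in sequences)  # i-th alignment column
    for base in BASES:
        pfm[base][i] = column.count(base)

# One row per base, one column per position.
for base in BASES:
    print(base, *pfm[base])
```

For these sequences the C row is `C 4 6 4 1 0 0 0 0 0 5 0 5`, and the first column is a 4/4 tie between C and T, which is why a single "most frequent base" answer is ambiguous there.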

(Comment by Eric Normandeau, 9.3 years ago)
Answer by brentp (Salt Lake City, UT), 9.3 years ago:

Check out motility, which does exactly that:

import motility
sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]

pwm = motility.make_pwm(sequences)
print(pwm.generate_sites_over(pwm.max_score()))

prints

('CCCATTGTTCTC', 'TCCATTGTTCTC')
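For comparison, a plain-Python sketch that does not need motility: keep every base that ties for the highest count at each position and enumerate all combinations with `itertools.product`. This is a simple counts-based stand-in, not motility's own PWM scoring; for the eight sequences here it returns the same two strings, because the only tie is C versus T (4 each) at the first position. The function name `all_majority_consensus` is hypothetical.

```python
from collections import Counter
from itertools import product

sequences = ["CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
             "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG"]


def all_majority_consensus(seqs):
    """Return every sequence built from a most-frequent base at each column."""
    options = []
    for column in zip(*seqs):
        counts = Counter(column)
        top = max(counts.values())
        # Keep every base that reaches the maximum count (ties included).
        options.append(sorted(b for b, c in counts.items() if c == top))
    # Cartesian product over the per-column choices.
    return ["".join(choice) for choice in product(*options)]


print(all_majority_consensus(sequences))
```

Every tied position multiplies the number of results, so for long alignments with many ties, reporting a single IUPAC ambiguity code per position may be more practical.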
Powered by Biostar version 2.3.0