Question: Consensus Sequence
User 5217 wrote (10.3 years ago):

Hello, I have the 8 sequences below and would like to calculate a consensus sequence from them.

sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]

for i in range(len(sequences[1])):      # one iteration per alignment column
  alignment = ""
  for j in range(len(sequences)):       # collect base i of every sequence
    alignment += sequences[j][i]
  print(alignment)
  # count each base in this column
  print(alignment.count("A"))
  print(alignment.count("C"))
  print(alignment.count("G"))
  print(alignment.count("T"))
  print("----------")

The code above counts how often each base occurs at each position (a Position Frequency Matrix). I found the following rules (http://www.cisred.org/content/methods/help/pfm) for deriving a consensus sequence from it, but unfortunately I do not yet understand them well enough to finish the implementation.

Thank you in advance.

Best regards,

Tags: consensus, python, biopython
(modified 10.0 years ago by brentp; written 10.3 years ago by User 5217)
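For reference, here is a minimal Python 3 sketch of turning the per-position counts into a consensus. The cisRED rules are not quoted in the question, so it assumes one commonly used threshold scheme rather than the exact cisRED procedure: use a single base if it makes up more than 50% of a column and has more than twice the count of the runner-up; otherwise use the two-base IUPAC ambiguity code if the top two bases together exceed 75%; otherwise use N. The function names `position_frequency_matrix` and `consensus` are hypothetical, and the sequences are written as strings, which `zip(*sequences)` treats the same as lists of characters.

```python
from collections import Counter

# Two-base IUPAC ambiguity codes, keyed by the unordered pair of bases
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def position_frequency_matrix(sequences):
    """Return one Counter of base counts per alignment column."""
    # zip(*sequences) yields column i as a tuple of the i-th base of each sequence
    return [Counter(column) for column in zip(*sequences)]


def consensus(sequences):
    """Consensus under an ASSUMED threshold scheme (not necessarily cisRED's).

    Expects aligned, equal-length sequences containing only A, C, G and T.
    """
    result = []
    for counts in position_frequency_matrix(sequences):
        total = sum(counts.values())
        # Rank bases by count (highest first), breaking ties alphabetically
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        best_base, best_n = ranked[0]
        second_base, second_n = ranked[1] if len(ranked) > 1 else ("", 0)
        if best_n / total > 0.5 and best_n > 2 * second_n:
            result.append(best_base)  # one clearly dominant base
        elif (best_n + second_n) / total > 0.75:
            # two bases together dominate: emit their IUPAC ambiguity code
            result.append(IUPAC_PAIRS[frozenset(best_base + second_base)])
        else:
            result.append("N")  # no base or pair dominates
    return "".join(result)


sequences = ["CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
             "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG"]
print(consensus(sequences))  # YCNATTGTTCTC
```

Under these assumed thresholds, the first column (C and T, four each) becomes Y, and the third column (C 4, A 2, T 2) becomes N because neither rule is met. If the cisRED page uses different cut-offs, only the two comparisons in `consensus` need to change.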

You should look at Brad's suggestion using Biopython in this question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?

(modified 16 months ago by _r_am; written 10.3 years ago by Eric Normandeau)

Note: if you want the length of the first sequence, use len(sequences[0]) instead of len(sequences[1]); index 1 refers to the second sequence. Also, without modifying the rest of the code, the sequences could be given as plain strings such as "CCCATTGTTCTC". Cheers

(modified 16 months ago by _r_am; written 10.3 years ago by Eric Normandeau)
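Following that note, the counting loop can work directly on plain strings and take the number of columns from the first sequence. A minimal Python 3 sketch; `count_columns` is a hypothetical helper name, and it returns one dict of A/C/G/T counts per position instead of printing them.

```python
sequences = ["CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
             "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG"]


def count_columns(sequences, alphabet="ACGT"):
    """Per-column base counts (a position frequency matrix) as a list of dicts."""
    pfm = []
    for i in range(len(sequences[0])):  # number of columns = length of the first sequence
        # seq[i] works the same on strings and on lists of single characters
        column = "".join(seq[i] for seq in sequences)
        pfm.append({base: column.count(base) for base in alphabet})
    return pfm


for position, counts in enumerate(count_columns(sequences)):
    print(position, counts)
```

Because indexing behaves the same on strings and on lists, passing the original list-of-lists input gives identical counts.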
brentp (Salt Lake City, UT) wrote (10.3 years ago):

Check out motility, which does exactly that:

import motility
sequences = [['C', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'T', 'T', 'C', 'T', 'G', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'A', 'G'],
             ['C', 'T', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'G', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['C', 'C', 'T', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'T', 'C'],
             ['T', 'C', 'C', 'A', 'T', 'T', 'G', 'T', 'T', 'C', 'G', 'T'],
             ['C', 'C', 'A', 'A', 'T', 'T', 'G', 'T', 'T', 'T', 'T', 'G']
            ]

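# Build a position weight matrix from the aligned sequences
pwm = motility.make_pwm(sequences)
# List every site that reaches the PWM's maximum possible score
print(pwm.generate_sites_over(pwm.max_score()))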

prints

('CCCATTGTTCTC', 'TCCATTGTTCTC')
(modified 16 months ago by _r_am; written 10.3 years ago by brentp)
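The two reported sites differ only in the first column, where C and T each occur four times among the eight sequences, so both reach the maximum score. As a dependency-free cross-check, here is a Python 3 sketch that takes the most frequent base(s) in each column and enumerates every combination of tied bases. This is plain majority counting, not motility's PWM scoring, and `majority_consensus_all` is a hypothetical name; for this alignment it returns the same two sequences, but the two approaches are not guaranteed to agree on other data.

```python
from collections import Counter
from itertools import product


def majority_consensus_all(sequences):
    """Return every sequence built from the most frequent base of each column.

    Columns where several bases tie for the highest count contribute each
    tied base, so the result can contain more than one sequence.
    """
    choices = []
    for column in zip(*sequences):
        counts = Counter(column)
        best = max(counts.values())
        # keep all bases that share the top count, in alphabetical order
        choices.append(sorted(base for base, n in counts.items() if n == best))
    # Cartesian product over per-column choices gives every tied consensus
    return ["".join(bases) for bases in product(*choices)]


sequences = ["CCCATTGTTCTC", "TTTCTGGTTCTC", "TCAATTGTTTAG", "CTCATTGTTGTC",
             "TCCATTGTTCTC", "CCTATTGTTCTC", "TCCATTGTTCGT", "CCAATTGTTTTG"]
print(majority_consensus_all(sequences))
# ['CCCATTGTTCTC', 'TCCATTGTTCTC']
```

Enumerating ties explicitly makes the ambiguity visible; with many tied columns the number of sequences grows multiplicatively, which is where a single IUPAC-coded consensus is more practical.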