Question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?
7
9.4 years ago by
Dave Lunt2.0k
Hull, UK
Dave Lunt2.0k wrote:

Problem: I have a multiple alignment of sequence reads. Each individual is represented by two sequences, forward and reverse (e.g. individual1For and individual1Rev) that don't overlap at the ends. I want my alignment to have one sequence per individual that is the consensus of the forward and reverse sequences. NB I don't want to create a consensus across all individuals. I want to recognise sequences that share a name e.g. "individual1" and replace them with a full-length consensus of those two sequences.

Does anybody know of existing script or programs to do this? If it doesn't already exist I shall probably start putting something together with BioPerl, any suggestions of best approach?

modified 9.3 years ago by 2184687-1231-83-5.0k • written 9.4 years ago by Dave Lunt2.0k
1

what is the main reason to do this? I mean you are going to use the output for? I ask this because you can use a logo to display the consensus.

what does it means to calculate the consensus between the forward and reversed sequence? Could you make an example of your input and desired output?

``````forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA
``````
``````forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA
``````
9
9.4 years ago by
Eric Normandeau10k wrote:

I have had to do something very similar recently.

The approach to solving your problem requires 3 steps. Here is my take:

• Use Python with Biopython to pass through your sequence couples, one individual at a time. Reverse complement the second sequence. (use any other preferred approach)
• Use the "dialign2" program repeatedly for each individual to create alignments. In this way, your 2 sequences will have the same length (with "-" characters to complete the missing parts).
• Use Python to pass through each alignment (2 sequences each time) and create a consensus. This should be easy.

Here is an attempt for the third part:

``````def create_consensus(seq1, seq2):
"""Sequences must be strings, have the same length, and be aligned"""
out_seq = ""
for i, nucleotide in enumerate(seq1):
couple = [nucleotide, seq2[i]]
if couple[0] == "-":
out_seq += couple[1]
elif couple[1] == "-":
out_seq += couple[0]
elif couple[0] == couple[1]:
out_seq += couple[0]
elif not couple[0] == couple[1]:
out_seq += "N"
return out_seq

seq1 = "-----CAGATACGCGCACATAGACATAG"
seq2 = "ACTGACAGATACAGACACATA-------"

print seq1
print seq2
print create_consensus(seq1, seq2)
``````

The printed result is:

``````-----CAGATACGCGCACATAGACATAG
ACTGACAGATACAGACACATA-------
ACTGACAGATACNNNCACATAGACATAG
``````

This is admittedly quickly done and there may be both 1) mistakes or special cases 2) a better way to do this, maybe with Biopython.

Hope it helps!

Cheers

Thanks Eric, very useful

8
9.3 years ago by
Boston, MA

Following up on Eric's useful answer, there is functionality to do this in Biopython. The Tutorial provides details on using it:

http://biopython.org/DIST/docs/tutorial/Tutorial.html

• Chapter 6 deals with reading in alignment objects
• Chapter 14.3 provides useful functionality once you have an alignment object.

Specifically the function "dumb_consensus" calculates a consensus along the lines of what you are doing with create_consensus, with a bit of extra functionality like defining when you call something ambiguous. Source code for that is here:

http://github.com/biopython/biopython/blob/master/Bio/Align/AlignInfo.py

2
9.2 years ago by
Steve Moss2.3k
United Kingdom
Steve Moss2.3k wrote:

I worked on a solution to this that is available here - https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py.

Cheers,

Steve

hi, Steve Moss, I cant not connect to that link, could you please send me that script, thank you!

Hi! My apologies... I moved it over to GitHub... I've edited the post accordingly, but please now go here https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

1
8.2 years ago by
2184687-1231-83-5.0k wrote:

Pagan has a consensus option that does what you need:
http://www.ebi.ac.uk/~ari/pagan/