Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?
4
7
Entering edit mode
13.9 years ago
Dave Lunt ★ 2.0k

Problem: I have a multiple alignment of sequence reads. Each individual is represented by two sequences, forward and reverse (e.g. individual1_For and individual1_Rev) that don't overlap at the ends. I want my alignment to have one sequence per individual that is the consensus of the forward and reverse sequences. NB I don't want to create a consensus across all individuals. I want to recognise sequences that share a name e.g. "individual1" and replace them with a full-length consensus of those two sequences.

Does anybody know of existing script or programs to do this? If it doesn't already exist I shall probably start putting something together with BioPerl, any suggestions of best approach?

1
Entering edit mode

what is the main reason to do this? I mean you are going to use the output for? I ask this because you can use a logo to display the consensus.

0
Entering edit mode

what does it means to calculate the consensus between the forward and reversed sequence? Could you make an example of your input and desired output?

0
Entering edit mode
forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA

0
Entering edit mode
forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA

9
Entering edit mode
13.9 years ago

I have had to do something very similar recently.

The approach to solving your problem requires 3 steps. Here is my take:

• Use Python with Biopython to pass through your sequence couples, one individual at a time. Reverse complement the second sequence. (use any other preferred approach)
• Use the "dialign2" program repeatedly for each individual to create alignments. In this way, your 2 sequences will have the same length (with "-" characters to complete the missing parts).
• Use Python to pass through each alignment (2 sequences each time) and create a consensus. This should be easy.

Here is an attempt for the third part:

def create_consensus(seq1, seq2):
"""Sequences must be strings, have the same length, and be aligned"""
out_seq = ""
for i, nucleotide in enumerate(seq1):
couple = [nucleotide, seq2[i]]
if couple[0] == "-":
out_seq += couple[1]
elif couple[1] == "-":
out_seq += couple[0]
elif couple[0] == couple[1]:
out_seq += couple[0]
elif not couple[0] == couple[1]:
out_seq += "N"
return out_seq

seq1 = "-----CAGATACGCGCACATAGACATAG"
seq2 = "ACTGACAGATACAGACACATA-------"

print seq1
print seq2
print create_consensus(seq1, seq2)


The printed result is:

-----CAGATACGCGCACATAGACATAG
ACTGACAGATACAGACACATA-------
ACTGACAGATACNNNCACATAGACATAG


This is admittedly quickly done and there may be both 1) mistakes or special cases 2) a better way to do this, maybe with Biopython.

Hope it helps!

Cheers

0
Entering edit mode

Thanks Eric, very useful

8
Entering edit mode
13.8 years ago

Following up on Eric's useful answer, there is functionality to do this in Biopython. The Tutorial provides details on using it: http://biopython.org/DIST/docs/tutorial/Tutorial.html

• Chapter 6 deals with reading in alignment objects
• Chapter 14.3 provides useful functionality once you have an alignment object.

Specifically the function "dumb_consensus" calculates a consensus along the lines of what you are doing with create_consensus, with a bit of extra functionality like defining when you call something ambiguous. Source code for that is here:

http://github.com/biopython/biopython/blob/master/Bio/Align/AlignInfo.py

2
Entering edit mode
13.7 years ago

I worked on a solution to this that is available here - https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

Cheers,
Steve

0
Entering edit mode

hi, Steve Moss, I cant not connect to that link, could you please send me that script, thank you!

0
Entering edit mode

Hi! My apologies... I moved it over to GitHub... I've edited the post accordingly, but please now go here https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

1
Entering edit mode
12.7 years ago

Pagan has a consensus option that does what you need: http://www.ebi.ac.uk/~ari/pagan/

0
Entering edit mode

Thanks, looks interesting, I'll have to check that out