Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?
4
7
Entering edit mode
13.9 years ago
Dave Lunt ★ 2.0k

Problem: I have a multiple alignment of sequence reads. Each individual is represented by two sequences, forward and reverse (e.g. individual1_For and individual1_Rev) that don't overlap at the ends. I want my alignment to have one sequence per individual that is the consensus of the forward and reverse sequences. NB I don't want to create a consensus across all individuals. I want to recognise sequences that share a name e.g. "individual1" and replace them with a full-length consensus of those two sequences.

Does anybody know of existing script or programs to do this? If it doesn't already exist I shall probably start putting something together with BioPerl, any suggestions of best approach?

consensus phylogenetics fasta multiple-alignment • 15k views
ADD COMMENT
1
Entering edit mode

what is the main reason to do this? I mean you are going to use the output for? I ask this because you can use a logo to display the consensus.

ADD REPLY
0
Entering edit mode

what does it means to calculate the consensus between the forward and reversed sequence? Could you make an example of your input and desired output?

ADD REPLY
0
Entering edit mode
forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA
ADD REPLY
0
Entering edit mode
forward: ACTGGACTAGACTACA----
reverse: -----ACTAGTCTACAGCTA
consens: ACTGGACTAGNCTACAGCTA
ADD REPLY
9
Entering edit mode
13.9 years ago

I have had to do something very similar recently.

The approach to solving your problem requires 3 steps. Here is my take:

  • Use Python with Biopython to pass through your sequence couples, one individual at a time. Reverse complement the second sequence. (use any other preferred approach)
  • Use the "dialign2" program repeatedly for each individual to create alignments. In this way, your 2 sequences will have the same length (with "-" characters to complete the missing parts).
  • Use Python to pass through each alignment (2 sequences each time) and create a consensus. This should be easy.

Here is an attempt for the third part:

def create_consensus(seq1, seq2):
    """Sequences must be strings, have the same length, and be aligned"""
    out_seq = ""
    for i, nucleotide in enumerate(seq1):
        couple = [nucleotide, seq2[i]]
        if couple[0] == "-":
            out_seq += couple[1]
        elif couple[1] == "-":
            out_seq += couple[0]
        elif couple[0] == couple[1]:
            out_seq += couple[0]
        elif not couple[0] == couple[1]:
            out_seq += "N"
    return out_seq

seq1 = "-----CAGATACGCGCACATAGACATAG"
seq2 = "ACTGACAGATACAGACACATA-------"

print seq1
print seq2
print create_consensus(seq1, seq2)

The printed result is:

-----CAGATACGCGCACATAGACATAG
ACTGACAGATACAGACACATA-------
ACTGACAGATACNNNCACATAGACATAG

This is admittedly quickly done and there may be both 1) mistakes or special cases 2) a better way to do this, maybe with Biopython.

Hope it helps!

Cheers

ADD COMMENT
0
Entering edit mode

Thanks Eric, very useful

ADD REPLY
8
Entering edit mode
13.8 years ago

Following up on Eric's useful answer, there is functionality to do this in Biopython. The Tutorial provides details on using it: http://biopython.org/DIST/docs/tutorial/Tutorial.html

  • Chapter 6 deals with reading in alignment objects
  • Chapter 14.3 provides useful functionality once you have an alignment object.

Specifically the function "dumb_consensus" calculates a consensus along the lines of what you are doing with create_consensus, with a bit of extra functionality like defining when you call something ambiguous. Source code for that is here:

http://github.com/biopython/biopython/blob/master/Bio/Align/AlignInfo.py

ADD COMMENT
2
Entering edit mode
13.7 years ago

I worked on a solution to this that is available here - https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

Cheers,
Steve

ADD COMMENT
0
Entering edit mode

hi, Steve Moss, I cant not connect to that link, could you please send me that script, thank you!

ADD REPLY
0
Entering edit mode

Hi! My apologies... I moved it over to GitHub... I've edited the post accordingly, but please now go here https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

ADD REPLY
1
Entering edit mode
12.7 years ago

Pagan has a consensus option that does what you need: http://www.ebi.ac.uk/~ari/pagan/

ADD COMMENT
0
Entering edit mode

Thanks, looks interesting, I'll have to check that out

ADD REPLY

Login before adding your answer.

Traffic: 1875 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6