Question: Create Consensus Sequences For Sequence Pairs Within A Multiple Alignment?
7
gravatar for Dave Lunt
8.1 years ago by
Dave Lunt2.0k
Hull, UK
Dave Lunt2.0k wrote:

Problem: I have a multiple alignment of sequence reads. Each individual is represented by two sequences, forward and reverse (e.g. individual1For and individual1Rev) that don't overlap at the ends. I want my alignment to have one sequence per individual that is the consensus of the forward and reverse sequences. NB I don't want to create a consensus across all individuals. I want to recognise sequences that share a name e.g. "individual1" and replace them with a full-length consensus of those two sequences.

Does anybody know of existing script or programs to do this? If it doesn't already exist I shall probably start putting something together with BioPerl, any suggestions of best approach?

ADD COMMENTlink modified 8.1 years ago by 2184687-1231-83-4.9k • written 8.1 years ago by Dave Lunt2.0k
1

what is the main reason to do this? I mean you are going to use the output for? I ask this because you can use a logo to display the consensus.

ADD REPLYlink written 8.1 years ago by Paulo Nuin3.7k

what does it means to calculate the consensus between the forward and reversed sequence? Could you make an example of your input and desired output?

ADD REPLYlink written 8.1 years ago by Giovanni M Dall'Olio26k

forward: ACTGGACTAGACTACA---- reverse: -----ACTAGTCTACAGCTA consens: ACTGGACTAGNCTACAGCTA

ADD REPLYlink written 8.1 years ago by Eric Normandeau9.8k

forward: ACTGGACTAGACTACA----[?]reverse: -----ACTAGTCTACAGCTA[?]consens: ACTGGACTAGNCTACAGCTA

ADD REPLYlink written 8.1 years ago by Eric Normandeau9.8k

aaa

dff

fgjsdghsdgh

ADD REPLYlink written 8.1 years ago by Eric Normandeau9.8k
9
gravatar for Eric Normandeau
8.1 years ago by
Eric Normandeau9.8k
Quebec, Canada
Eric Normandeau9.8k wrote:

I have had to do something very similar recently.

The approach to solving your problem requires 3 steps. Here is my take:

  • Use Python with Biopython to pass through your sequence couples, one individual at a time. Reverse complement the second sequence. (use any other preferred approach)
  • Use the "dialign2" program repeatedly for each individual to create alignments. In this way, your 2 sequences will have the same length (with "-" characters to complete the missing parts).
  • Use Python to pass through each alignment (2 sequences each time) and create a consensus. This should be easy.

Here is an attempt for the third part:

def create_consensus(seq1, seq2):
    """Sequences must be strings, have the same length, and be aligned"""
    out_seq = ""
    for i, nucleotide in enumerate(seq1):
        couple = [nucleotide, seq2[i]]
        if couple[0] == "-":
            out_seq += couple[1]
        elif couple[1] == "-":
            out_seq += couple[0]
        elif couple[0] == couple[1]:
            out_seq += couple[0]
        elif not couple[0] == couple[1]:
            out_seq += "N"
    return out_seq

seq1 = "-----CAGATACGCGCACATAGACATAG"
seq2 = "ACTGACAGATACAGACACATA-------"

print seq1
print seq2
print create_consensus(seq1, seq2)

The printed result is:

-----CAGATACGCGCACATAGACATAG
ACTGACAGATACAGACACATA-------
ACTGACAGATACNNNCACATAGACATAG

This is admittedly quickly done and there may be both 1) mistakes or special cases 2) a better way to do this, maybe with Biopython.

Hope it helps!

Cheers

ADD COMMENTlink modified 8.1 years ago • written 8.1 years ago by Eric Normandeau9.8k

Thanks Eric, very useful

ADD REPLYlink written 8.1 years ago by Dave Lunt2.0k
8
gravatar for Brad Chapman
8.1 years ago by
Brad Chapman9.2k
Boston, MA
Brad Chapman9.2k wrote:

Following up on Eric's useful answer, there is functionality to do this in Biopython. The Tutorial provides details on using it:

http://biopython.org/DIST/docs/tutorial/Tutorial.html

  • Chapter 6 deals with reading in alignment objects
  • Chapter 14.3 provides useful functionality once you have an alignment object.

Specifically the function "dumb_consensus" calculates a consensus along the lines of what you are doing with create_consensus, with a bit of extra functionality like defining when you call something ambiguous. Source code for that is here:

http://github.com/biopython/biopython/blob/master/Bio/Align/AlignInfo.py

ADD COMMENTlink written 8.1 years ago by Brad Chapman9.2k
2
gravatar for Steve Moss
7.9 years ago by
Steve Moss2.2k
United Kingdom
Steve Moss2.2k wrote:

I worked on a solution to this that is available here - https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py.

Cheers,

Steve

ADD COMMENTlink modified 5.5 years ago • written 7.9 years ago by Steve Moss2.2k

hi, Steve Moss, I cant not connect to that link, could you please send me that script, thank you!

ADD REPLYlink written 5.5 years ago by litiancheng.gansu10

Hi! My apologies... I moved it over to GitHub... I've edited the post accordingly, but please now go here https://raw.github.com/gawbul/bioinformatics-scripts/master/FRconsensus.py

ADD REPLYlink written 5.5 years ago by Steve Moss2.2k
1
gravatar for 2184687-1231-83-
7.0 years ago by
2184687-1231-83-4.9k wrote:

Pagan has a consensus option that does what you need:
http://www.ebi.ac.uk/~ari/pagan/

ADD COMMENTlink written 7.0 years ago by 2184687-1231-83-4.9k

Thanks, looks interesting, I'll have to check that out

ADD REPLYlink written 7.0 years ago by Dave Lunt2.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 588 users visited in the last hour