Logic To Get Consensus Sequence

0


10.5 years ago

SRKR ▴ 180

I have a set of aligned sequences in FASTA format and want to derive a consensus from the alignment. At most sites, one base clearly occurs more often than the others. At sites where two or more bases occur an equal number of times, which base should be taken? An example is given below:

Seq_1: ATGCGA
Seq_2: AT-CGT
Seq_3: AT-CCG
Seq_4: AT-CCC
Seq_5: AA-CT-

By convention, this would be the consensus:

Consensus : A T G C [G/C] N

However, a consensus sequence written this way throws an error when it is aligned with other sequences. What should be done in this scenario, and how should the consensus be called at such sites?
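The convention described in the question can be sketched in Python as follows. This is only an illustration under stated assumptions: gaps (`-`) are ignored when counting, ties are listed in brackets in order of first appearance, and a tie among all four bases is written as `N`. The function names `column_consensus` and `consensus` are hypothetical and do not come from MEGA or any other tool.

```python
from collections import Counter


def column_consensus(column):
    """Return the consensus symbol for one alignment column (a string of bases)."""
    # Count bases only; gap characters are ignored (an assumption of this sketch).
    counts = Counter(base for base in column.upper() if base != "-")
    if not counts:
        return "-"  # the column contains nothing but gaps
    top = max(counts.values())
    # Bases sharing the highest count, in order of first appearance.
    tied = [base for base, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    if set(tied) == set("ACGT"):
        return "N"  # all four bases tie
    return "[" + "/".join(tied) + "]"


def consensus(seqs):
    """Return one consensus symbol per column of equal-length aligned sequences."""
    return [column_consensus("".join(col)) for col in zip(*seqs)]


alignment = ["ATGCGA", "AT-CGT", "AT-CCG", "AT-CCC", "AA-CT-"]
print(" ".join(consensus(alignment)))  # prints: A T G C [G/C] N
```

The bracketed output makes the tie explicit, but it is not a valid sequence character, which is why downstream alignment tools reject it.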

consensus genomics • 2.5k views

updated 14 months ago by Ram 43k • written 10.5 years ago by SRKR ▴ 180

2


Depending on what you want to do downstream, you might be able to use IUPAC codes, such as S for [G/C].
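As a rough sketch of the IUPAC suggestion, the snippet below replaces each tie with the single-letter IUPAC ambiguity code for the tied bases, so the consensus stays a plain one-character-per-site string. It assumes DNA input containing only A, C, G, T and `-`, ignores gaps when counting, and uses the hypothetical names `IUPAC` and `iupac_consensus`; whether a downstream tool honors these codes still has to be checked separately.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(seqs):
    """Return a consensus string, encoding tied bases as IUPAC ambiguity codes."""
    out = []
    for col in zip(*seqs):
        # Gaps are skipped when counting (an assumption of this sketch).
        counts = Counter(base.upper() for base in col if base != "-")
        if not counts:
            out.append("-")  # gap-only column
            continue
        top = max(counts.values())
        tied = frozenset(base for base, n in counts.items() if n == top)
        # Anything outside A/C/G/T falls back to N in this sketch.
        out.append(IUPAC.get(tied, "N"))
    return "".join(out)


alignment = ["ATGCGA", "AT-CGT", "AT-CCG", "AT-CCC", "AA-CT-"]
print(iupac_consensus(alignment))  # prints: ATGCSN
```

For the example in the question, the G/C tie at the fifth site becomes `S` and the four-way tie at the sixth site becomes `N`.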

10.5 years ago by Devon Ryan 104k

0


I can use IUPAC codes, but the application simply ignores them, which affects the alignment. I am using MEGA 4.0. Also, even if the application picked a random base based on the ambiguity letter, that would technically be a glitch.

10.5 years ago by SRKR ▴ 180

0


Ah, you should really update your question to mention MEGA 4.0 and the other details of exactly what you're doing. Otherwise, you'll only ever get a rather generic reply like mine. With more details, hopefully someone familiar with MEGA can provide some insight into this.

10.5 years ago by Devon Ryan 104k
