Question: How to get the consensus sequence and the possible sequences from a multiple sequence alignment
4 weeks ago, wangyu.ashley (0) wrote:

I have a multiple sequence alignment file like this:

   >seq A
   AAACTCAGCTACG
   >seq B
   AAACACTGCTATG
   >seq C
   AAAGACTGCTATC

I want to generate two sequences from the input file:

   > consensus
   AAACACTGCTATG
   >Alt
   AAAGTCAGCTACC

Is there any software that can be used to achieve this task? Any code would be much appreciated! Thank you.

Tags: snp, alignment • 138 views
modified 4 weeks ago by h.mon (30k) • written 4 weeks ago by wangyu.ashley (0)

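Biopython has consensus sequence functionality if you are providing alignments (or sequences which are already the same length): AlignIO reads the alignment, and, depending on the Biopython version, Bio.Align.AlignInfo.SummaryInfo computes a simple consensus from it.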

The alt is a bit more difficult. I don't personally know of any software that could produce exactly what you need, so some custom code is probably the way to go.

How are you proposing the alts be generated? Do you want an alt sequence for every possible combination of the variant positions? This will get unwieldy very quickly...

modified 4 weeks ago • written 4 weeks ago by Joe (17k)

Thanks Joe. I want to directly detect how the SNPs change within this gene family. The consensus and alt sequences can serve as a summary of the SNPs, and I would then use these two sequences to calculate Ka/Ks.

written 4 weeks ago by wangyu.ashley (0)

Do you always have only 3 input sequences? What if there are more than 2 variants at a given position? How do you intend to summarise that position?

written 4 weeks ago by Joe (17k)

There are not always only three input sequences, but most of the groups have only 2 variants.

written 4 weeks ago by wangyu.ashley (0)

OK, but what do you want to do with the subset that has more than 2? This will radically change the code the task needs.

written 4 weeks ago by Joe (17k)
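
To see how often the "more than 2 variants" case actually arises, the variable columns can be listed before deciding how to summarise them. The following is only a minimal sketch in plain Python: the function names `variant_columns` and `multiallelic_positions` are hypothetical, positions are 1-based, the sequences are assumed to be already aligned to the same length, and gap characters are counted like any other symbol.

```python
from collections import Counter


def variant_columns(seqs):
    """Map 1-based column position -> Counter of symbols, for non-constant columns."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return {
        pos: Counter(column)
        for pos, column in enumerate(zip(*seqs), start=1)
        if len(set(column)) > 1
    }


def multiallelic_positions(seqs):
    """Return the positions where more than two distinct symbols occur."""
    return [pos for pos, counts in variant_columns(seqs).items() if len(counts) > 2]


if __name__ == "__main__":
    # The three sequences from the question.
    seqs = ["AAACTCAGCTACG", "AAACACTGCTATG", "AAAGACTGCTATC"]
    for pos, counts in variant_columns(seqs).items():
        print(pos, dict(counts))
    print("positions with more than 2 variants:", multiallelic_positions(seqs))
```

For the three sequences in the question, the variable columns are 4, 5, 7, 12 and 13, and none of them has more than two variants.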

I would keep the variant that occurs more frequently at that position.

written 4 weeks ago by wangyu.ashley (0)
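
The rule settled on in this exchange (most frequent base per column for the consensus, next most frequent base for the alt) can be sketched in plain Python. This is an illustration under stated assumptions, not a tested tool: `read_fasta` and `consensus_and_alt` are hypothetical names, the input is assumed to be an aligned FASTA with equal-length sequences, gaps are counted like any other symbol, ties are broken in favour of the symbol seen first in the file, and a column with no variation repeats the consensus base in the alt.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def consensus_and_alt(seqs):
    """Return (consensus, alt) built column by column from aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus, alt = [], []
    for column in zip(*seqs):
        # most_common() sorts by count; equal counts keep first-seen order.
        ranked = Counter(column).most_common()
        consensus.append(ranked[0][0])
        # Assumption: with no second variant, the alt repeats the consensus base.
        alt.append(ranked[1][0] if len(ranked) > 1 else ranked[0][0])
    return "".join(consensus), "".join(alt)


if __name__ == "__main__":
    alignment = """
    >seq A
    AAACTCAGCTACG
    >seq B
    AAACACTGCTATG
    >seq C
    AAAGACTGCTATC
    """
    cons, alt = consensus_and_alt([seq for _, seq in read_fasta(alignment)])
    print(">consensus")
    print(cons)  # AAACACTGCTATG
    print(">Alt")
    print(alt)   # AAAGTCAGCTACC
```

On the alignment from the question this reproduces the two sequences asked for. Where a column has more than two variants, only the second most frequent one ends up in the alt, so less frequent variants are dropped from the summary.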
4 weeks ago, Mensur Dlakic (6.0k, USA) wrote:

You will need to have a local installation of HMMER. If your alignment is in a file named aln:

hmmbuild --dna aln.hmm aln

Now you can create a consensus sequence:

hmmemit -c aln.hmm

This will print:

>aln-consensus
AAACACTGCTATG

You can sample a random sequence from this model, but in general this will not give you exactly 13 bases as in your alignment, even if you set the expected length from the profile to 13:

hmmemit -p -L 13 aln.hmm

It will produce a different output each time.

written 4 weeks ago by Mensur Dlakic (6.0k)