How to determine consensus sequence and conservation score for each protein?
3.6 years ago
Kian ▴ 50

How can I determine the consensus sequence and conservation score (the percentage of conservation) for some proteins? I have some genes and I want to determine these. I know this can be done in CLC using "Create Alignment", but I want another tool for this work. Thanks

protein consensus sequence conservation score • 3.2k views
If these are known proteins, you could take a look at the HomoloGene database at NCBI. That should give you pre-computed alignments of homologous sequences.

A BLAST search followed by extraction of the related sequences and a multiple sequence alignment (using Clustal Omega, MAFFT, T-Coffee, or MUSCLE) would get you the alignments and the consensus.

3.6 years ago
Joe 19k

I wrote a pretty basic script which is just a wrapper to BioPython's dumb consensus method.

https://github.com/jrjhealey/bioinfo-tools/blob/master/Consensus.py

I'd maybe look at using HMMER (hmmemit) if you want a very sensitive consensus, though I think it tends to work better when there are more sequences in the multiple alignment (but then again, that's true for most MSAs).

Just to be clear, the expected input for your script is a multiple sequence alignment (MSA), so @Kian needs to provide that.

Yep, I assumed that's what they were getting at by saying they have "some sequences" and they are "using create alignment".

I want another tool for this work

Not sure if that wording indicates a desire to not use CLC (or a pre-existing alignment). We will have to wait for OP to clarify.

Thanks for the responses. If I have 3 protein sequences, should I align them to get the consensus sequence? And how about the conservation score?

You will need to elaborate on what you mean by conservation score. Something like average global pairwise sequence identity?
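To make "average pairwise sequence identity" concrete, here is a minimal sketch in plain Python. It assumes the sequences are already aligned (equal length, `-` for gaps) and that identity is counted only over columns where neither sequence has a gap. That denominator is one convention among several, and the function names and example sequences are made up for illustration.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_pairwise_identity(aligned):
    """Mean percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical sequences, not real proteins)
aligned = ["MKTAYIAK", "MKTAHIAK", "MRTA-IAK"]
print(pairwise_identity(aligned[0], aligned[1]))
print(average_pairwise_identity(aligned))
```

With only 3 sequences there are just 3 pairs, so the average is easy to check by hand against whatever your aligner reports.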

Maybe, yes. I want to compare the amino acid sequences of the three proteins that I have, obtain the consensus sequence, and generate homology models for these proteins.

If you align them with a tool like Clustal Omega, it outputs scores for all the pairwise combinations, which you could use. I have done that in the past. That would also generate a decent MSA which you can use to generate the consensus sequence.

I looked at Clustal Omega; it can align the amino acid sequences of the proteins. Should I build the consensus sequence from the amino acids common to the 3 sequences (so that if an amino acid is shared by 2 of the sequences at an aligned position, it goes into the consensus)? And how can I calculate the conservation score?

I just told you, Clustal outputs that information when you do the alignment itself.

You should really use more than 3 sequences if you want a decent consensus though, but yes, I suppose you could use 2 out of 3.
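The 2-out-of-3 rule can be sketched as a simple majority-rule consensus over the alignment columns. This is a plain Python illustration, not what any particular tool does internally: the function name, the `X` placeholder for undecided positions, and the choice to ignore gaps when counting are all assumptions.

```python
from collections import Counter


def majority_consensus(aligned, threshold=2 / 3, ambiguous="X", gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    A residue is used when its share of the non-gap residues in a column
    is at least `threshold`; otherwise `ambiguous` is written.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]  # assumption: gaps ignored
        if not residues:
            consensus.append(gap)  # column made up entirely of gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        if count / len(residues) >= threshold:
            consensus.append(residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Toy alignment (hypothetical): column 2 is 2-of-3, column 5 is all different
aligned = ["MKTAYIAK", "MKTAHIAK", "MRTAQIAK"]
print(majority_consensus(aligned))  # MKTAXIAK
```

Raising `threshold` to 1.0 keeps only fully conserved positions, which with 3 sequences is the only stricter option available.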

The problem you’ll have is: what if all 3 sequences have different amino acids at a position? This is why you should consider using an HMM for this.
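Short of building an HMM, a per-column score at least makes those all-different positions visible. Below is a rough sketch of one possible "percent of conservation": for each column, the percentage of sequences carrying the most common residue. This is only one of many definitions; the function name is hypothetical, and counting gaps against conservation is an assumption.

```python
from collections import Counter


def column_conservation(aligned, gap="-"):
    """Per-column percentage of sequences sharing the most common residue."""
    n = len(aligned)
    if any(len(seq) != len(aligned[0]) for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        # assumption: gaps lower the score because we divide by all n sequences
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(100.0 * top / n)
    return scores


# Toy alignment (hypothetical): column 5 has three different residues
aligned = ["MKTAYIAK", "MKTAHIAK", "MRTAQIAK"]
scores = column_conservation(aligned)
print([round(s, 1) for s in scores])
print(sum(scores) / len(scores))  # mean conservation over the alignment
```

With 3 sequences a column can only score about 33%, 67% or 100%, which is another reason to add more homologues before reading much into the numbers.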