How to determine consensus sequence and conservation score for each protein?
3.6 years ago
Kian ▴ 50

How can I determine the consensus sequence and conservation score ("percent of conservation") for some proteins? I have some genes and I want to determine these. I know this can be done in CLC using Create Alignment, but I want another tool for this work. Thanks.

protein consensus sequence conservation score • 3.2k views

If these are known proteins, you could take a look at the HomoloGene database at NCBI. That should give you pre-computed alignments of homologous sequences.

A BLAST search followed by extraction of the related sequences and a multiple sequence alignment (using Clustal Omega, MAFFT, T-Coffee, or MUSCLE) would get you the alignments/consensus.

3.6 years ago
Joe 19k

I wrote a pretty basic script which is just a wrapper around Biopython's dumb_consensus method.

https://github.com/jrjhealey/bioinfo-tools/blob/master/Consensus.py

I'd maybe look at using HMMER (hmmemit) if you want a very sensitive consensus, though I think it tends to work better when there are more sequences in the multiple alignment (but then again, that's true for most MSAs).
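
For anyone who just wants to see the idea, here is a minimal pure-Python sketch of a threshold-based consensus over an existing alignment. It is not the code from the linked script and it does not call Biopython; the function name `simple_consensus`, the 0.6 threshold, and the `X` placeholder are assumptions chosen for illustration.

```python
from collections import Counter


def simple_consensus(aligned_seqs, threshold=0.6, ambiguous="X"):
    """Majority-rule consensus of equal-length, already aligned sequences.

    threshold and ambiguous are illustrative choices, not defaults taken
    from any particular tool.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")

    consensus = []
    for column in zip(*aligned_seqs):
        # Gaps are ignored when counting residues in a column.
        residues = [r for r in column if r != "-"]
        if not residues:
            consensus.append("-")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Keep the most common residue only if it is frequent enough.
        if count / len(residues) >= threshold:
            consensus.append(residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Toy alignment of three made-up sequences.
alignment = ["MKT-LV", "MKSALV", "MRTALI"]
print(simple_consensus(alignment))  # MKTALV
```

With three sequences, a residue shared by two of them (2/3) clears the 0.6 threshold, while a column with three different residues falls back to `X`.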

Just to be clear, the expected input for your script is a multiple sequence alignment (MSA), so @Kian needs to provide that.

Yep, I assumed that's what they were getting at by saying they have "some sequences" and they are "using create alignment".

"I want another tool for this work"

Not sure if that wording indicates a desire to not use CLC (or a pre-existing alignment). We will have to wait for OP to clarify.

Thanks for the responses. If I have 3 protein sequences, should I align them to get the consensus sequence? And what about the conservation score?

You will need to elaborate on what you mean by conservation score. Something like average global pairwise sequence identity?

Maybe, yes. I want to compare the amino acid sequences of the three proteins I have, obtain the consensus sequence, and generate homology models for these proteins.

If you align them with a tool like Clustal Omega, it outputs scores for all the pairwise combinations, which you could use. I have done that in the past. The alignment would also give you a decent MSA from which you can generate the consensus sequence.
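
If you would rather compute pairwise scores yourself from the finished alignment, a rough sketch is below. It assumes the sequences are already aligned to equal length and counts identity only over columns where both sequences have a residue, which is one of several conventions, so the numbers may not match what an alignment tool reports. The function names are made up for this example.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def average_pairwise_identity(aligned_seqs):
    """Mean percent identity over all pairs of aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    scores = [percent_identity(a, b) for a, b in combinations(aligned_seqs, 2)]
    return sum(scores) / len(scores)


# Toy alignment of three made-up sequences.
alignment = ["MKT-LV", "MKSALV", "MRTALI"]
for a, b in combinations(alignment, 2):
    print(a, b, round(percent_identity(a, b), 1))
print("average:", round(average_pairwise_identity(alignment), 1))
```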

I looked at Clustal Omega; it can align the amino acid sequences of the proteins. Should I build the consensus sequence from the amino acids shared among the 3 sequences (that is, if 2 of the 3 sequences share an amino acid at an aligned position, it goes into the consensus)? And how can I calculate the conservation score?

I just told you, Clustal outputs that information when you do the alignment itself.

You should really use more than 3 sequences if you want a decent consensus, but yes, I suppose you could use 2 out of 3.

The problem you’ll have is: what if all 3 sequences have different amino acids at a position? This is why you should consider using an HMM for this.
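
To make the problem concrete, here is a small sketch that scores each alignment column by the percentage of sequences sharing the most common residue. This is only one possible reading of "percent of conservation", and the function name is hypothetical. Gaps are counted against conservation here, which is also an assumption.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Percent of sequences carrying the most common residue, per column."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")

    scores = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]
        if not residues:
            # A column made only of gaps has nothing to conserve.
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Divide by the number of sequences so gaps lower the score.
        scores.append(100.0 * top_count / len(column))
    return scores


# Toy alignment: column 1 identical, column 2 is 2 out of 3, column 3 all different.
alignment = ["MKA", "MKC", "MRD"]
print([round(s, 1) for s in column_conservation(alignment)])  # [100.0, 66.7, 33.3]
```

With only three sequences the score can take very few values, and a column where all three differ bottoms out at 33.3% with no meaningful majority residue, which is the case described above.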