Considering gaps in calculating conservation score from MSA
10 months ago

Dear all,

I was looking for a good way to calculate conservation scores for the columns of an MSA. I usually use the Kullback-Leibler divergence (kl_divergence) or Shannon entropy. However, I would like to know whether it makes sense to penalize gaps when calculating conservation and, if so, how this could be implemented. So far I have tried a very simple score:

score = kl_divergence * (1 - gap_frequency) 
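A minimal Python sketch of this score, for illustration only. Assumptions not in the post: the alignment columns are plain strings, "-" and "." are the gap characters, gaps are left out of the residue distribution before computing the KL divergence, and the background is uniform over the 20 standard amino acids (real analyses usually use observed amino-acid frequencies instead). The helper names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = {"-", "."}  # ASSUMPTION: gap symbols used in the alignment


def kl_divergence(column: str, background: dict[str, float]) -> float:
    """KL divergence (bits) of a column's residue distribution from `background`.

    Gaps and any symbol not in `background` (e.g. X) are left out of the distribution.
    """
    residues = [c for c in column.upper() if c in background]
    if not residues:
        return 0.0  # no residues to compare against the background
    n = len(residues)
    counts = Counter(residues)
    return sum((k / n) * math.log2((k / n) / background[aa]) for aa, k in counts.items())


def gap_frequency(column: str) -> float:
    """Fraction of sequences that have a gap character in this column."""
    return sum(c in GAP_CHARS for c in column) / len(column)


def gap_penalized_score(column: str, background: dict[str, float]) -> float:
    """score = kl_divergence * (1 - gap_frequency), as described in the post."""
    return kl_divergence(column, background) * (1.0 - gap_frequency(column))


# ASSUMPTION: uniform background over the 20 standard amino acids.
background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
for col in ["WWWW", "WW--", "WAC-"]:
    print(col, round(gap_penalized_score(col, background), 3))
```

With this weighting, "WW--" gets exactly half the score of "WWWW", even though both columns have the same KL divergence over their non-gap residues.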

So I just use gap_frequency to penalize columns with a high proportion of gaps in the alignment. However, I am unsure whether this is, let's say, biologically meaningful. I could not find a good solution to this. Are there established methods for this, in particular in combination with Shannon entropy, KL divergence or similar measures?
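For the Shannon entropy case, a sketch of why some gap handling is needed at all: if gaps are simply excluded before computing entropy, a column that is half gaps looks exactly as conserved as a fully occupied one. The sketch below applies the same (1 - gap_frequency) weight to an entropy-based score. Assumptions not in the post: a 20-letter amino-acid alphabet (use 4 for nucleotides), "-" and "." as gap characters, and the normalization 1 - H / log2(20) so that higher means more conserved. Function names are hypothetical.

```python
import math
from collections import Counter

GAP_CHARS = {"-", "."}  # ASSUMPTION: gap symbols used in the alignment
MAX_ENTROPY = math.log2(20)  # ASSUMPTION: 20-letter amino-acid alphabet


def shannon_entropy(column: str) -> float:
    """Shannon entropy (bits) of one column, with gaps excluded."""
    residues = [c for c in column.upper() if c not in GAP_CHARS]
    if not residues:
        return 0.0  # all-gap column; the gap weight below still drives it to 0
    n = len(residues)
    counts = Counter(residues)
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def conservation(column: str) -> float:
    """Entropy-based conservation in [0, 1], scaled by the non-gap fraction."""
    gap_freq = sum(c in GAP_CHARS for c in column) / len(column)
    normalized = 1.0 - shannon_entropy(column) / MAX_ENTROPY
    return normalized * (1.0 - gap_freq)


def column_scores(alignment: list[str]) -> list[tuple[float, float]]:
    """(entropy, conservation) for every column of equal-length aligned strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    return [(shannon_entropy(col), conservation(col)) for col in columns]


msa = ["ACDE", "ACDF", "AC-G", "A--H"]
for i, (h, c) in enumerate(column_scores(msa)):
    print(f"column {i}: entropy={h:.3f} conservation={c:.3f}")
```

In this toy alignment, columns 0, 1 and 2 all have an entropy of 0, but their gap-weighted conservation drops from 1.0 to 0.75 to 0.5 as the share of gaps grows.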

Any suggestion is appreciated!

Best, Jonathan

multiple alignment sequence conservation python • 322 views
