Considering gaps in calculating conservation score from MSA
4 months ago

Dear all,

I am looking for a good way to calculate per-column conservation scores in an MSA. I usually use the Kullback-Leibler divergence (kl_divergence) or Shannon entropy. However, I would like to know whether it makes sense to penalize gaps when calculating conservation and, if so, how this could be implemented. So far I have only tried a very simple score:

score = kl_divergence * (1 - gap_frequency) 

Here gap_frequency is the share of gaps in a column, so columns with many gaps are penalized. However, I am unsure whether this is biologically meaningful, and I could not find a good solution. Are there established methods for this, in particular in combination with Shannon entropy, KL divergence, or similar measures?
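To make the question concrete, here is a minimal sketch of the score above, not an established method. It rests on several assumptions of mine: the function names are illustrative, the background distribution is uniform over the 20 standard amino acids, gaps are written as "-" or ".", and characters outside the background alphabet (such as X) are ignored in the KL term.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # assumption: characters treated as gaps
# Assumption: uniform background over the 20 standard amino acids.
UNIFORM_BG = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}


def gap_frequency(column):
    """Fraction of gap characters in one alignment column."""
    if not column:
        return 0.0
    return sum(1 for c in column if c in GAP_CHARS) / len(column)


def kl_divergence(column, background=UNIFORM_BG):
    """KL divergence (in bits) of the column's residue distribution
    from the background, computed over non-gap residues only."""
    residues = [c for c in column.upper() if c in background]
    if not residues:
        return 0.0  # all-gap (or all-unknown) column carries no signal
    total = len(residues)
    kl = 0.0
    for aa, n in Counter(residues).items():
        p = n / total
        kl += p * math.log2(p / background[aa])
    return kl


def gap_weighted_score(column, background=UNIFORM_BG):
    """score = kl_divergence * (1 - gap_frequency), as in the post."""
    return kl_divergence(column, background) * (1 - gap_frequency(column))


# Hypothetical toy alignment: one string per sequence, equal lengths.
msa = ["ACD-", "ACE-", "A-DK"]
columns = ["".join(col) for col in zip(*msa)]
scores = [gap_weighted_score(col) for col in columns]
```

With these definitions a fully conserved, gap-free column scores log2(20) under the uniform background, and the same column with half of its rows gapped scores half of that, which is exactly the linear penalty I am unsure about.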

Any suggestion is appreciated!

Best, Jonathan

multiple alignment sequence conservation python • 174 views
