Question

Measure of residue dissimilarity in an alignment (phylogenetic discriminant)


Asked by michau, 5.5 years ago

Hi, is there a probabilistic measure of how dissimilar an alignment position is from the rest of the alignment, i.e. a method to assess which residues are responsible for phylogenetic and functional differences?

I have an alignment of ATP synthases, and I noticed that Mycobacterium is highly divergent from other bacteria, even from other Actinobacteria. I need a measure to statistically discriminate which residues are responsible for this divergence.

Any thoughts?

Thanks in advance

Tags: alignment, sequence, phylogenetics


I'm not sure I understand exactly what you're after, but here goes:

You might be interested in calculating the Shannon entropy of each column of your sequence alignment. High-entropy positions are your more variable (divergent) ones; see, for example: https://gist.github.com/jrjhealey/130d4efc6260dd76821edc8a41d45b6a.
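As a rough illustration of per-column Shannon entropy (not the code from the linked gist), here is a minimal pure-Python sketch. It assumes the alignment is already loaded as a list of equal-length strings, and it assumes that gap characters ("-" and ".") should be ignored by default; the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of one alignment column."""
    # Assumption: '-' and '.' are gap characters and are dropped when ignore_gaps is True.
    residues = [r for r in column if not (ignore_gaps and r in "-.")]
    if not residues:
        return 0.0
    n = len(residues)
    # H = sum over residues of p * log2(1/p); a fully conserved column gives 0.
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def alignment_entropy(seqs, ignore_gaps=True):
    """Entropy of every column of an alignment given as equal-length strings."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([s[i] for s in seqs], ignore_gaps) for i in range(length)]


seqs = ["ACDE", "ACDF", "ACGG", "ACGH"]
print(alignment_entropy(seqs))  # conserved columns score 0, the last column scores 2 bits
```

Sorting columns by this score gives a list of candidate variable positions, but entropy alone does not say whether the variation separates Mycobacterium from the other sequences.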

You may need to take this further and do a dN/dS analysis or similar, since it probably won't be enough just to identify the sites that are variable. You will need to show that they are under meaningful selection (i.e. that the changes are non-synonymous more often than expected).
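For reference, one classic way to estimate dN/dS for a pair of aligned coding sequences is the Nei–Gojobori (1986) counting method with a Jukes–Cantor correction, summarized below. This is only one of several estimators, and note that it needs the underlying codon (nucleotide) alignment, not just the protein alignment. Here $N$ and $S$ are the numbers of non-synonymous and synonymous sites, and $N_d$ and $S_d$ are the observed non-synonymous and synonymous differences.

```latex
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}

d_N = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right), \qquad
d_S = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right), \qquad
\omega = \frac{d_N}{d_S}
```

Roughly, $\omega < 1$ points to purifying selection, $\omega \approx 1$ to neutral evolution, and $\omega > 1$ to positive selection.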

Reply by Joe, 5.5 years ago

First of all, thank you for the answer.

I was thinking of using dN/dS as a next step, but as far as I know it only allows pairwise comparisons. What I was looking for is more like a site-specific comparison of variation inside versus outside the clade, computed column by column, like:

for each column:
    MANOVA where
        independent variable = inside / outside the clade
        dependent variables  = amino-acid frequencies in the column

Or am I talking nonsense? (I started my bioinformatics adventure recently, so I'm still green as a lime and trying to learn.) Either way, I will go with dN/dS, as it will answer my question.
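A per-column MANOVA is arguably an awkward fit, because each sequence contributes a single categorical residue to a column rather than a vector of continuous measurements. As a swapped-in alternative (not MANOVA), the inside-vs-outside idea can be sketched as a permutation test per column: measure how different the residue frequencies are inside and outside the clade (here with total variation distance, an arbitrary choice), then shuffle the clade labels to see how often a difference that large arises by chance. All names are hypothetical, and the in-clade labels (e.g. True for the Mycobacterium sequences) are assumed to be supplied by you.

```python
import random
from collections import Counter


def residue_frequencies(residues):
    """Relative frequency of each residue in a list of residues."""
    n = len(residues)
    return {aa: c / n for aa, c in Counter(residues).items()}


def total_variation(group_a, group_b):
    """Total variation distance (0..1) between two residue frequency profiles."""
    fa, fb = residue_frequencies(group_a), residue_frequencies(group_b)
    return 0.5 * sum(abs(fa.get(aa, 0.0) - fb.get(aa, 0.0)) for aa in set(fa) | set(fb))


def split(column, labels):
    """Split one column into (in-clade, out-of-clade) residues using boolean labels."""
    inside = [r for r, flag in zip(column, labels) if flag]
    outside = [r for r, flag in zip(column, labels) if not flag]
    return inside, outside


def column_test(column, in_clade, n_perm=999, seed=0):
    """Return (observed distance, permutation p-value) for one column."""
    if len(column) != len(in_clade):
        raise ValueError("column and in_clade must have the same length")
    if all(in_clade) or not any(in_clade):
        raise ValueError("need at least one sequence inside and one outside the clade")
    observed = total_variation(*split(column, in_clade))
    rng = random.Random(seed)
    labels = list(in_clade)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between residue and clade membership
        if total_variation(*split(column, labels)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


def scan_alignment(seqs, in_clade, n_perm=999, seed=0):
    """Run column_test on every column; returns a list of (index, distance, p)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    results = []
    for i in range(length):
        dist, p = column_test([s[i] for s in seqs], in_clade, n_perm, seed + i)
        results.append((i, dist, p))
    return results


seqs = ["MK"] * 5 + ["MR"] * 5
in_clade = [True] * 5 + [False] * 5
for index, dist, p in scan_alignment(seqs, in_clade):
    print(index, dist, p)
```

Two caveats: the sequences are not independent (they share ancestry), so shuffling labels ignores the tree and will tend to give optimistic p-values; and with hundreds of columns you would need a multiple-testing correction such as Benjamini–Hochberg.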

Reply by michau, 5.5 years ago

I don’t know enough about the statistics to say whether a MANOVA approach would work.

The one limitation that strikes me about that approach, however, is that objectively defining ‘clades’ (i.e. clustering the sequences into groups) is a very difficult problem. The grouping is often much more obvious to a person than to a computer.

I would think there are approaches that work in a non-pairwise fashion. A quick bit of googling brought this up, which sounds like it might fit the bill:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887424/

Reply by Joe, 5.5 years ago