Measure of residue dissimilarity in an alignment (phylogenetic discriminant)
michau, 5.5 years ago

Hi, is there a probabilistic measure of how dissimilar a position is from the rest of the alignment, i.e. a method to assess which residues are responsible for the phylogenetic and functional differences?

I have an alignment of ATP synthases, and I noticed that Mycobacterium is highly divergent from other bacteria, even from other Actinobacteria. I need a measure to statistically determine which residues are responsible for this divergence.

Any thoughts?

Thanks in advance

Tags: alignment, sequence, Phylogenetics

I'm not sure if I exactly understand what you're after, but here goes:

You might be interested in calculating the Shannon entropy of each column of your sequence alignment. High-entropy positions are the more variable (divergent) ones; see for example: https://gist.github.com/jrjhealey/130d4efc6260dd76821edc8a41d45b6a.
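A minimal standalone sketch of per-column Shannon entropy (in bits), not the code from the linked gist. It assumes the alignment is already loaded as a list of equal-length strings and, as my own choice, skips gap characters ('-' and '.') when counting residues. The function name `column_entropies` is hypothetical.

```python
import math
from collections import Counter


def column_entropies(alignment, gap_chars="-."):
    """Shannon entropy (bits) of each column of a list of aligned sequences.

    Assumption: gap characters are ignored rather than treated as a 21st state.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        # count residues in column i, skipping gaps
        counts = Counter(seq[i] for seq in alignment if seq[i] not in gap_chars)
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        entropies.append(h)  # an all-gap column stays at 0.0
    return entropies
```

For example, `column_entropies(["MKV", "MKI", "MRL", "MRA"])` returns `[0.0, 1.0, 2.0]`: a conserved column scores 0, two equally frequent residues score 1 bit, and four score log2(4) = 2 bits. Note that entropy only says how variable a column is, not which sequences carry which residue, so on its own it cannot tell you whether the variation is specific to Mycobacterium.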

You may need to take this further and do a dN/dS analysis or similar, since it probably won't be enough to just identify the sites that are variable. You will need to demonstrate that the variation at those sites reflects meaningful selection (i.e. non-synonymous changes).
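To show what a pairwise dN/dS comparison involves, here is a simplified Nei-Gojobori-style sketch. Several assumptions here are mine, not from this thread. The input must be two in-frame, codon-aligned nucleotide sequences, because dN/dS cannot be computed from a protein alignment alone. The standard genetic code is assumed. Changes to stop codons are excluded when counting sites. Codons that differ at more than one position are skipped instead of being averaged over mutational pathways as the full method does. All function names are hypothetical. For real analyses an established tool such as PAML's codeml is the safer choice.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites of one codon: per position, the fraction of single-base
    changes that keep the amino acid (changes to a stop codon are excluded)."""
    s = 0.0
    for pos in range(3):
        syn = allowed = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue
            allowed += 1
            if CODE[mutant] == CODE[codon]:
                syn += 1
        if allowed:
            s += syn / allowed
    return s


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple substitutions at a site."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Simplified pairwise (dN, dS). Skips codons containing gaps or ambiguity
    codes, stop codons, and codons that differ at more than one position."""
    sites_s = sites_n = 0.0
    diff_s = diff_n = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs > 1:
            continue  # the full method averages over mutational pathways here
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sites_s += s
        sites_n += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                diff_s += 1
            else:
                diff_n += 1
    p_s = diff_s / sites_s if sites_s else 0.0
    p_n = diff_n / sites_n if sites_n else 0.0
    return jukes_cantor(p_n), jukes_cantor(p_s)
```

The function returns Jukes-Cantor-corrected (dN, dS). A ratio above 1 suggests positive selection and below 1 purifying selection. When dS is 0, the ratio is undefined, so check it before dividing.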


First of all, thank you for the answer.

I was thinking of using dN/dS as a next step, but as far as I know it only allows pairwise comparisons. I was looking for something more like a site-specific measure of inside-clade vs. outside-clade variation, computed column by column, like:

for each column:
    MANOVA where
        independent var = inside / outside the clade
        dependent var   = frequencies of amino acids in the column

Or am I talking bullshit? (I started my bioinformatics adventure recently, so I'm still green as a lime and trying to learn.) Either way, I will go with dN/dS, as it will answer my question.
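Because the per-column inside/outside idea above compares counts of categories (amino acids) rather than continuous measurements, a contingency-table style test is arguably a more natural fit than MANOVA. The sketch below swaps in a Pearson chi-square statistic per column and gets a p-value by permuting the clade labels across sequences. It makes several assumptions. Clade membership is already known and passed in as one boolean per sequence. Gap characters are ignored. Sequences are treated as independent, which they are not on a phylogeny, because closely related sequences share residues by descent. Treat the p-values as a rough screen, and apply a multiple-testing correction when you test many columns. The function names are hypothetical.

```python
import random
from collections import Counter

GAPS = "-."


def composition_statistic(column, labels):
    """Pearson chi-square statistic for residue counts inside vs outside the clade."""
    pairs = [(res, lab) for res, lab in zip(column, labels) if res not in GAPS]
    inside = Counter(res for res, lab in pairs if lab)
    outside = Counter(res for res, lab in pairs if not lab)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    total = n_in + n_out
    if total == 0:
        return 0.0  # all-gap column
    stat = 0.0
    for residue in set(inside) | set(outside):
        residue_total = inside[residue] + outside[residue]
        for observed, group_n in ((inside[residue], n_in), (outside[residue], n_out)):
            expected = group_n * residue_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def column_pvalues(alignment, labels, n_perm=999, seed=0):
    """Permutation p-value per column, shuffling clade labels across sequences.

    alignment: list of equal-length aligned sequences (strings)
    labels: list of booleans, True if the sequence belongs to the clade
    """
    rng = random.Random(seed)
    columns = list(zip(*alignment))  # one tuple of residues per alignment column
    observed = [composition_statistic(col, labels) for col in columns]
    exceed = [0] * len(columns)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for i, col in enumerate(columns):
            # small tolerance so ties with the observed value count as "at least as extreme"
            if composition_statistic(col, shuffled) >= observed[i] - 1e-12:
                exceed[i] += 1
    # (k + 1) / (n + 1) avoids reporting an impossible p = 0
    return [(k + 1) / (n_perm + 1) for k in exceed]
```

For example, take five in-clade and five out-of-clade sequences. A column where every in-clade sequence has K and every other sequence has R gets a small p-value, because only 2 of the 252 possible label splits separate the residues that cleanly. A fully conserved column gets p = 1.0.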


I don’t know enough about the stats to say whether a MANOVA approach would work.

The only limitation that strikes me with that approach, however, is that objectively clustering ‘clades’ is a very difficult problem. It’s often much more obvious to a person than to a computer.

I would think there are approaches that work in a non-pairwise fashion. A quick bit of googling brought this up, which sounds like it might fit the bill:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887424/
