Determining best MCL inflation factor
6.1 years ago
Anand Rao ▴ 440

I am trying to cluster orthologs (and paralogs) at the protein level. The groups I am getting seem to contain very disparate proteins: members differ greatly in length, and the alignments MAFFT returns for them are extremely gappy. So I am considering adjusting the inflation factor of MCL.

Some information about it is at http://micans.org/mcl/man/mcl.html: "A good set of starting values is 1.4, 2, 4, and 6." I understand, in theory, how changing the inflation factor affects the coarseness of the clustering, but how can I determine the best inflation factor for my dataset in practice if I have no extensive a priori information about it? Any thoughts? Thank you!

MCL clustering MarkovChain Inflation • 6.1k views

You could also set more stringent BLAST thresholds.
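As a rough sketch of what stricter thresholds could look like in practice, the snippet below filters BLAST tabular output (assuming the default 12-column `-outfmt 6` layout) by E-value and by a simple coverage requirement before the hits are passed to MCL. The function name, the cutoff values, the use of bit score as edge weight, and the `lengths` dictionary of protein lengths are all assumptions for illustration, not recommendations from this thread.

```python
def filter_blast_hits(lines, lengths, max_evalue=1e-10, min_coverage=0.5):
    """Return (query, subject, bitscore) edges that pass both cutoffs.

    lines   : iterable of BLAST tabular rows (default 12-column -outfmt 6)
    lengths : dict mapping sequence id -> protein length (hypothetical input)
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        aln_len = int(cols[3])      # column 4: alignment length
        evalue = float(cols[10])    # column 11: E-value
        bitscore = float(cols[11])  # column 12: bit score
        if query == subject:
            continue  # drop self hits (a choice made here, not a requirement)
        if evalue > max_evalue:
            continue
        # Rough coverage: aligned length relative to the longer protein,
        # so short domain-only matches to long proteins are removed.
        longer = max(lengths[query], lengths[subject])
        if aln_len / longer < min_coverage:
            continue
        edges.append((query, subject, bitscore))
    return edges
```

Writing each surviving edge as a tab-separated `query subject weight` line gives a label-format file that MCL can read with `--abc`. A coverage cutoff like this targets the length disparity described in the question directly, since hits that only span a small part of the longer protein never become edges.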


OrthoMCL uses an inflation of around 1.5 to balance sensitivity and selectivity based on grouping of enzymes and their E.C. numbers.


Hi Anand, did you find a solid method for identifying the best inflation value for your MCL clustering? I used BMGE for trimming, which improved the MSA and removed many gaps, but I'm still missing many sequences within my orthogroups.

6.1 years ago

Increasing inflation increases granularity; that is, it produces smaller clusters. So you need to try higher values than those you've used so far to break the clusters up into smaller, more homogeneous ones. Also, you do seem to have information you can use to assess clustering quality, since you can tell that your current clustering is not satisfactory.
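To see this effect on a given dataset, one option is to run MCL at several inflation values and compare the cluster size distributions. The sketch below assumes `mcl` is installed and on the PATH, that the input is a label-format (`--abc`) file, and that each line of the output lists the members of one cluster. The helper names and the `out.I<value>` file naming are made up for this example.

```python
import subprocess


def mcl_command(graph_path, inflation, out_path):
    """Build the mcl call for one inflation value (-I) on an --abc input."""
    return ["mcl", graph_path, "--abc", "-I", str(inflation), "-o", out_path]


def parse_clusters(lines):
    """Each non-empty line of the output is one cluster of labels."""
    return [line.split() for line in lines if line.strip()]


def summarize(clusters):
    """Basic granularity summary: cluster count, largest cluster, singletons."""
    sizes = [len(c) for c in clusters]
    return {
        "n_clusters": len(sizes),
        "largest": max(sizes) if sizes else 0,
        "singletons": sum(1 for s in sizes if s == 1),
    }


def sweep(graph_path, inflations=(1.4, 2, 4, 6)):
    """Run mcl once per inflation value and summarize each clustering."""
    results = {}
    for inflation in inflations:
        out_path = f"out.I{inflation}"  # hypothetical naming scheme
        subprocess.run(mcl_command(graph_path, inflation, out_path), check=True)
        with open(out_path) as handle:
            results[inflation] = summarize(parse_clusters(handle))
    return results
```

Calling `sweep("graph.abc")` (a placeholder file name) would return one summary per inflation value. As inflation goes up, the number of clusters should rise and the largest cluster should shrink.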


Thanks for your response, but no: I said "I do not have any extensive information on it a priori". How, then, do I assess in practice what the best inflation factor is, and whether I should check more values? Hope that clarifies it.


What I meant is that you can evidently judge the clustering quality in some way, since you find that what you get is not good enough. If you could quantify that quality, you could measure it for different values of inflation. Alternatively, depending on the cluster structure you're trying to extract, other clustering algorithms may be worth considering. In my hands, MCL tends to produce very unbalanced clusters, so if that's also a problem for you, you should consider another algorithm.
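One way to make that concrete is to turn the complaint from the question (proteins of very different lengths in the same group) into a number. The sketch below scores a clustering by the fraction of proteins that fall in multi-member clusters whose shortest-to-longest length ratio is at least some cutoff. The metric, the 0.5 cutoff, and the function names are assumptions chosen for illustration. Other criteria, such as alignment gappiness or shared domain architecture, could be substituted in the same way.

```python
def length_ratio(cluster, lengths):
    """Shortest / longest protein length in a cluster (1.0 = all equal)."""
    sizes = [lengths[protein] for protein in cluster]
    return min(sizes) / max(sizes)


def score_clustering(clusters, lengths, min_ratio=0.5):
    """Fraction of proteins in multi-member, length-homogeneous clusters.

    Singletons are deliberately not counted as good; otherwise the highest
    inflation value (everything split apart) would always win.
    """
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    good = sum(
        len(c)
        for c in clusters
        if len(c) > 1 and length_ratio(c, lengths) >= min_ratio
    )
    return good / total
```

Computing this score for the clustering obtained at each inflation value and picking the value where it peaks gives a data-driven choice that requires no prior annotation, only the protein lengths.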
