Question: Determining best MCL inflation factor
Asked 3.4 years ago by Anand Rao (United States):

I am trying to cluster orthologs (and paralogs) at the protein level. Some of the resulting groups contain very disparate proteins: the members differ greatly in length, and the alignments returned by MAFFT are extremely gappy. So I am considering adjusting the MCL inflation factor.

Some information on this is at http://micans.org/mcl/man/mcl.html: "A good set of starting values is 1.4, 2, 4, and 6." While I understand, in theory, the effect that changing the inflation factor has on the coarseness of the clustering, how can I determine in practice the best inflation factor for my dataset when I have no extensive a priori information about it? Any thoughts? Thank you!


You could also set more stringent BLAST thresholds.
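
As a minimal sketch of what tightening the thresholds could look like, the snippet below filters tabular BLAST hits before they are passed to MCL. It assumes the standard 12-column tabular output (`-outfmt 6`), where column 3 is percent identity and column 11 is the E-value. The function name `filter_hits` and the cutoffs `1e-10` and `30.0` are illustrative assumptions, not recommended values.

```python
def filter_hits(lines, max_evalue=1e-10, min_identity=30.0):
    """Yield (query, subject, evalue) for hits that pass both cutoffs.

    Assumes BLAST tabular format (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. The default cutoffs are hypothetical placeholders.
    """
    for line in lines:
        # Skip blank lines and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue and identity >= min_identity:
            yield query, subject, evalue


example = [
    "protA\tprotB\t85.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t300",
    "protA\tprotC\t25.0\t50\t30\t2\t1\t50\t1\t50\t1e-3\t40",
]
for query, subject, evalue in filter_hits(example):
    print(query, subject, evalue)
```

Only the first example hit survives, because the second fails both the E-value and the identity cutoff. The surviving pairs still need an edge weight before being written out as MCL input.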

Comment written 3.4 years ago by h.mon

OrthoMCL uses an inflation value of around 1.5, chosen to balance sensitivity and selectivity based on how enzymes grouped with respect to their E.C. numbers.

Comment written 3.4 years ago by a.zielezinski

Answer written 3.4 years ago by Jean-Karim Heriche (EMBL Heidelberg, Germany):

Increasing inflation increases granularity, that is, it produces smaller clusters. So you need to try higher values than you have used so far to break the clusters up into smaller, more homogeneous ones. Also, you seem to have some information you can use to assess clustering quality, since you can tell that your current clustering is not satisfactory.
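
One way to try several inflation values is to run MCL once per value and keep each result in its own file. The sketch below only builds the command lines. It assumes the similarity graph is stored in MCL's label ("abc") format, and the file names `hits.abc` and `out.I<value>.txt` are hypothetical. The inflation values are the starting set quoted from the MCL manual in the question.

```python
def mcl_sweep_commands(graph_file, inflations=(1.4, 2, 4, 6)):
    """Build one mcl command line per inflation value.

    Uses the mcl options --abc (label input), -I (inflation) and -o (output).
    The output file naming scheme is an assumption made for illustration.
    """
    commands = []
    for inflation in inflations:
        out_file = f"out.I{inflation}.txt"
        commands.append(
            ["mcl", graph_file, "--abc", "-I", str(inflation), "-o", out_file]
        )
    return commands


for cmd in mcl_sweep_commands("hits.abc"):
    print(" ".join(cmd))
```

Each command can then be executed, for example with `subprocess.run(cmd, check=True)`, provided `mcl` is installed and on the `PATH`.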


Thanks for your response, but no: I said "I do not have any extensive information on it a priori". How, then, do I assess in practice what the best inflation factor is, and whether I should check more values? I hope that clarifies it.

Comment written 3.4 years ago by Anand Rao

What I meant is that you evidently have some way of judging clustering quality, since you find that what you get is not good enough. If you could quantify this quality, you could measure it for different inflation values. Alternatively, depending on the cluster structure you're trying to extract, other clustering algorithms may be worth considering. In my hands, MCL tends to produce very unbalanced clusters, so if that is also a problem for you, you should consider another algorithm.
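
As one possible way to quantify quality, the sketch below scores a clustering by how much protein lengths vary within clusters, since disparate lengths were the original complaint. This is an illustrative metric rather than an established standard, and the helper names `read_clusters`, `length_spread` and `summarize` are hypothetical. It assumes MCL's label output (one cluster per line, members separated by tabs) and a dictionary of protein lengths supplied by the user.

```python
from statistics import mean, pstdev


def read_clusters(lines):
    """Parse label-mode MCL output: one cluster per line, tab-separated members."""
    return [line.split() for line in lines if line.strip()]


def length_spread(clusters, lengths):
    """Size-weighted mean coefficient of variation of member lengths.

    Singleton clusters contribute 0. Lower values mean more homogeneous lengths.
    """
    total = 0
    weighted = 0.0
    for members in clusters:
        values = [lengths[m] for m in members]
        cv = pstdev(values) / mean(values) if len(values) > 1 else 0.0
        weighted += cv * len(values)
        total += len(values)
    return weighted / total if total else 0.0


def summarize(clusters, lengths):
    """Report the spread together with cluster count and largest cluster size."""
    return {
        "n_clusters": len(clusters),
        "largest": max((len(c) for c in clusters), default=0),
        "length_cv": length_spread(clusters, lengths),
    }


# Toy data: in practice, read one MCL output file per inflation value.
toy_lengths = {"p1": 100, "p2": 300, "p3": 50}
toy_clusters = read_clusters(["p1\tp2", "p3"])
print(summarize(toy_clusters, toy_lengths))
```

The spread alone is trivially minimized by splitting everything into singletons, so it should be read alongside the cluster count and the largest cluster size. The latter also gives a rough view of how unbalanced the clusters are at each inflation value.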

Comment written 3.4 years ago by Jean-Karim Heriche