Question: How to choose the threshold of co-expression for gene expression networks
elb (Torino) wrote, 24 months ago:

Hi guys, I have a question about co-expression networks. I have a marker gene and a list of genes co-expressed with it, based on mutual information (MI). This list (the neighbourhood) contains around 1000 genes, which is of course not a manageable number. Is there a way to choose the best or most representative set of neighbouring genes, for example with a threshold on the MI value? I tried ranking the genes from most to least correlated, but at some point I have to stop and choose a final number of genes. Is there a way to choose a cut-off point that is "robust"? I know it depends on the final goal, but in my case no experiments are feasible with such a large number of genes.

Could you help me please?

Thanks in advance

Last modified 24 months ago by sandeep.amberkar18

You imply that follow-up experiments are the limiting step, so why not rank the genes by a criterion relevant to the experiments (or to the question you want to address) and take the top n, with n being whatever is feasible to follow up? You can also use the old elbow-rule trick: plot the relevant values in decreasing order and see whether there is an elbow. In many real-life datasets there is a sharp initial decrease followed by a flat part. The point at which the curve flattens, though not always well defined, is usually a good practical cut-off, but it may still leave you with too many candidates to follow up.

Reply written 24 months ago by Jean-Karim Heriche
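The ranking and elbow rule described in the reply above could be sketched as follows. This is only an illustration under assumptions: the scores are taken to be precomputed per-gene MI values in a plain dict, and the elbow is located with a simple chord-distance heuristic (the point farthest below the line joining the first and last points of the sorted curve), which is one of several ways to make "where the curve flattens" concrete. The function names and toy values are hypothetical.

```python
import numpy as np


def elbow_index(scores):
    """Index of the elbow in `scores` once sorted in decreasing order.

    Chord-distance heuristic: rescale the sorted curve to the unit square
    and pick the point lying farthest below the straight line that joins
    the first and last points.
    """
    y = np.sort(np.asarray(scores, dtype=float))[::-1]
    n = len(y)
    if n < 3:
        return n - 1
    x = np.arange(n) / (n - 1)                    # ranks rescaled to [0, 1]
    span = y[0] - y[-1]
    y = (y - y[-1]) / span if span > 0 else np.zeros(n)  # scores rescaled to [0, 1]
    # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. x + y = 1;
    # 1 - x - y is proportional to how far a point lies below that line.
    return int(np.argmax(1.0 - x - y))


def top_genes(score_by_gene, n=None):
    """Rank genes by score (e.g. MI with the marker) and keep the top n.

    If n is None, keep genes up to and including the elbow point.
    """
    ranked = sorted(score_by_gene.items(), key=lambda kv: kv[1], reverse=True)
    if n is None:
        n = elbow_index([s for _, s in ranked]) + 1
    return [gene for gene, _ in ranked[:n]]


if __name__ == "__main__":
    # Made-up MI values: a sharp drop after g3, then a flat tail.
    mi = {"g1": 0.9, "g2": 0.85, "g3": 0.8, "g4": 0.3,
          "g5": 0.28, "g6": 0.27, "g7": 0.26, "g8": 0.25}
    print(top_genes(mi))       # ['g1', 'g2', 'g3', 'g4']
    print(top_genes(mi, n=2))  # ['g1', 'g2']
```

Here the elbow point itself (g4, the first value of the flat tail) is kept; dropping the `+ 1` stops just before it. When the elbow still leaves too many genes, passing an explicit `n` gives the "top n feasible for follow-up" variant.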

Thank you very much for your answer. The problem is always the same: there is no clear question, and I was simply asked to make inferences...

Reply written 24 months ago by elb

If you have the input expression dataset that was used to compute MI, you may want to consider a randomization test to estimate the type I error rate (false positive rate) as a function of the MI threshold. You would recompute MI in randomly re-assorted input datasets (e.g. with expression values shuffled across samples) to determine the false positive rate at a given MI or correlation cutoff under the null hypothesis of no association among expression profiles. You would then pick a cutoff whose false positive rate is low enough for your application. In large datasets, high MI values will occur by chance: the more gene pairs you test, the more of them will exceed any fixed MI cutoff by chance alone. This approach says nothing about biological significance, but it does control your false positive rate.

Reply written 24 months ago by Ahill
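A minimal sketch of the randomization test described above, assuming an expression matrix with samples as rows and genes as columns, and a marker profile over the same samples. Two choices here are assumptions, not necessarily what produced the original MI values: MI is estimated with a simple plug-in estimator on an equal-width 2-D histogram (5 bins per axis), and the null is built by shuffling the marker profile across samples. In practice the null must use the same MI estimator and settings as the observed values. All function names are hypothetical.

```python
import numpy as np


def mutual_info(x, y, bins=5):
    """Plug-in MI estimate (in nats) from an equal-width 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def null_mi(marker, expr, n_perm=100, bins=5, seed=0):
    """MI values under the null hypothesis of no marker-gene association.

    Shuffling the marker across samples destroys any dependence on the
    genes while keeping each profile's own distribution.
    `expr` is a samples x genes array.
    """
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_perm):
        shuffled = rng.permutation(marker)
        null.extend(mutual_info(shuffled, expr[:, j], bins)
                    for j in range(expr.shape[1]))
    return np.asarray(null)


def false_positive_rate(null, cutoff):
    """Fraction of null MI values at or above `cutoff`."""
    return float(np.mean(null >= cutoff))


def cutoff_for_fpr(null, alpha=0.01):
    """(1 - alpha) quantile of the null MI values.

    Roughly a fraction `alpha` of null values reach this cutoff; ties in
    the histogram-based estimates can push the realized rate above alpha.
    """
    return float(np.quantile(null, 1.0 - alpha))
```

Each gene's observed MI with the marker is then compared with `cutoff_for_fpr(null, alpha)`. Because about 1000 tests at alpha = 0.01 still let through roughly 10 false positives on average, alpha is best chosen with the number of genes in mind. The double loop is slow for large matrices, but it keeps the logic explicit.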

Thank you, Ahill. In the end I performed the randomization, which seems to be the only satisfactory criterion for choosing a threshold; the threshold is ultimately a compromise between false-positive and false-negative findings.

Reply written 24 months ago by elb
sandeep.amberkar18 wrote, 24 months ago:

Broadly, what is it that you wish to achieve?

There is no universal rule for choosing a cutoff, be it on correlation or on p-value. However, for Spearman correlation there are commonly used rule-of-thumb ranges (applied to the absolute value of the coefficient):

  1. Low correlation: 0.2 to 0.4
  2. Medium correlation: 0.4 to 0.6
  3. High correlation: 0.7 to 0.9
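As an illustration of applying these ranges, here is a sketch that computes Spearman's rho between the marker and each gene and keeps the genes in the "high" band. Assumptions: expression is a samples x genes numpy array; because the quoted ranges leave gaps (e.g. 0.6 to 0.7), each range's lower bound is treated as its threshold, so values in a gap fall into the band below; the "negligible" label for |rho| < 0.2 and all function names are hypothetical. Spearman's rho is computed as the Pearson correlation of average ranks, which handles ties.

```python
import numpy as np


def average_ranks(v):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    v = np.asarray(v, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, len(v) + 1)
    _, inv = np.unique(v, return_inverse=True)
    inv = inv.ravel()
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def strength(rho):
    """Label |rho| using the lower bounds of the quoted ranges."""
    r = abs(rho)
    if r >= 0.7:
        return "high"
    if r >= 0.4:
        return "medium"
    if r >= 0.2:
        return "low"
    return "negligible"


def correlated_genes(marker, expr, names, min_abs_rho=0.7):
    """Names of genes (columns of `expr`) with |rho| >= min_abs_rho."""
    return [name for j, name in enumerate(names)
            if abs(spearman(marker, expr[:, j])) >= min_abs_rho]
```

For example, with `marker = [1, 2, 3, 4, 5, 6]` and two toy genes `a = [2, 4, 6, 8, 10, 12]` and `b = [3, 1, 4, 1, 5, 9]`, `correlated_genes(marker, np.column_stack([a, b]), ["a", "b"])` keeps only `"a"`; gene `b` has rho of about 0.67, which `strength` labels "medium".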

Perhaps you should look into WGCNA, which is a popular method for determining co-expression modules from transcriptomic data. This brings us back to the original question: what do you want to achieve in the end? If you could be more specific, I could help you more.
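WGCNA (an R package) picks a soft-thresholding power by checking how closely the resulting network's connectivity distribution follows a power law, i.e. its scale-free topology fit; in R this is done with its own `pickSoftThreshold` function. The Python sketch below only approximates that idea and is not WGCNA's code. Assumptions: an unsigned adjacency |cor|^power, connectivity binned into equal-width bins, and the default candidate powers and R² cutoff are all illustrative choices; the function names are hypothetical.

```python
import numpy as np


def scale_free_fit(abs_cor, power, n_bins=10):
    """Signed R^2 of a straight-line fit of log10 p(k) against log10 k.

    abs_cor: genes x genes matrix of absolute correlations.
    Adjacency is |cor| ** power; k is each gene's connectivity.
    Returns -sign(slope) * R^2, so a good fit with the negative slope
    expected of a scale-free network is close to +1.
    """
    adj = abs_cor ** power                # new array; abs_cor is untouched
    np.fill_diagonal(adj, 0.0)            # no self-connections
    k = adj.sum(axis=0)
    edges = np.histogram_bin_edges(k, bins=n_bins)
    idx = np.digitize(k, edges[1:-1])     # bin index 0..n_bins-1 for each gene
    counts = np.bincount(idx, minlength=n_bins)
    used = np.flatnonzero(counts)         # keep non-empty bins only
    if len(used) < 3:
        return float("nan")
    mean_k = np.array([k[idx == b].mean() for b in used])
    x = np.log10(mean_k)
    y = np.log10(counts[used] / len(k))
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        return float("nan")
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - (resid @ resid) / ss_tot
    return float(-np.sign(slope) * r2)


def pick_power(expr, powers=range(1, 21), r2_cut=0.8, n_bins=10):
    """Smallest power whose signed scale-free fit reaches r2_cut, else None.

    expr: samples x genes expression matrix.
    """
    abs_cor = np.abs(np.corrcoef(expr, rowvar=False))
    for p in powers:
        fit = scale_free_fit(abs_cor, p, n_bins)
        if not np.isnan(fit) and fit >= r2_cut:
            return p
    return None
```

Random data with no structure will often return `None`, since no power gives a convincing scale-free fit; on real data, inspecting the fit for each candidate power is more informative than relying on a single cutoff.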

Answer written 24 months ago by sandeep.amberkar18
Powered by Biostar version 2.3.0