Question: best value of lfc threshold
rthapa wrote, 9 months ago:

What is the best value to assign for the lfc threshold when using the DESeq2 package? With an lfc threshold of 1, I got more than 3,000 up-regulated genes. Any suggestions, please? Thanks

Tags: rna-seq
Kevin Blighe wrote, 9 months ago:

In DESeq2, the 'lfc' values are on the log [base 2] scale (log2fc).

This is an open-ended question. Ask 100 people and you'll get very different answers.

  • Log2fc of 1 is equivalent to linear fold change of 2
  • Log2fc of 2 is equivalent to linear fold change of 4
  • Log2fc of 3 is equivalent to linear fold change of 8
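
As a quick arithmetic check of those conversions (base R, nothing DESeq2-specific): the linear fold change is simply 2 raised to the log2fc, and log2() converts back.

    2^c(1, 2, 3)        # 2 4 8 : linear fold changes
    log2(c(2, 4, 8))    # 1 2 3 : log2 fold changes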

Each person appears to choose a cut-off value that relates to whatever the first trusted person in their career told them. The mistake that these people then make is in rigidly adhering to this cut-off and in thinking that it's the only answer. In some cases, people do not use any fold-change cut-off at all and just use adjusted P-values (Q values), and then rank the statistically significant genes by fold change. As I recall, the first trusted voice in my own career told me: 'FDR Q<0.05 and absolute log2fc>2', but that was during a time when RNA-seq was not even available.
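
For illustration only, a rough sketch in R of both approaches just described, assuming a DESeqDataSet called dds that has already been run through DESeq() (the object name and the cut-off values are placeholders, not a recommendation):

    library(DESeq2)

    res <- results(dds, alpha = 0.05)   # default Wald test; res$padj holds BH-adjusted P values

    # approach 1: hard cut-offs on both FDR and absolute log2 fold change
    sig <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ]

    # approach 2: significance only, then rank the significant genes by absolute fold change
    sig_ranked <- res[which(res$padj < 0.05), ]
    sig_ranked <- sig_ranked[order(abs(sig_ranked$log2FoldChange), decreasing = TRUE), ]

    # results() can also test directly against a log2 fold-change threshold
    res_lfc1 <- results(dds, lfcThreshold = 1, alpha = 0.05)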

There really is no answer, though, and it depends on many factors, including:

  • The normalisation type (with FPKM/RPKM, unrealistically large log2fc values will be observed; with quantile or geometric normalisation, the latter being what DESeq2 uses, log2fc values will be lower than with FPKM counts and will be balanced between negative and positive fold changes; see the sketch after this list)
  • How many genes you want to include in downstream analysis
  • Previous literature on how many differentially expressed transcripts to expect in the type of comparison you are conducting
  • The adjusted P value that you are using as a cut-off. For example, even at FDR Q<0.05 and log2fc=2, many of the transcripts will not look that different when you visualise the normalised counts between your comparison groups (this comment only has validity in certain experimental setups, though)
  • The variance of your data (high variance = unreliable log2fc values in any setting)
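
To make the normalisation point above concrete, a minimal sketch, again assuming a DESeqDataSet called dds; the gene identifier and the 'condition' column are placeholders:

    # DESeq2's geometric (median-of-ratios) normalisation
    dds <- estimateSizeFactors(dds)
    sizeFactors(dds)                                # one scaling factor per sample
    norm_counts <- counts(dds, normalized = TRUE)   # counts divided by the size factors

    # eyeball the normalised counts of a single gene across the comparison groups
    plotCounts(dds, gene = "ENSG00000000003", intgroup = "condition")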

So, the message? There is absolutely no standard cut-off: use what is most appropriate for your data and what works best.

Kevin


Sorry, why does the correlation between two samples become twice as high when I perform geometric normalisation on my raw counts? Is there any explanation, please? I calculated the Pearson correlation for two samples before and after normalisation, and the correlation roughly doubled in the normalised samples.

Reply written 4 months ago by jivarajivaraj

The correlation value may have changed, but does the statistical significance of the correlation change? Use cor.test to check.

A short answer, too: there are different normalisation methods out there and they will produce data on different distributions. It is logical that statistical inferences from different normalisations will also be different. What you must ensure is that you choose the normalisation strategy that is most suitable for your data.
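
A minimal sketch of that check, assuming raw_counts and norm_counts are count matrices with samples in columns (the object names are placeholders):

    # Pearson correlation and its P value, before and after normalisation
    ct_raw  <- cor.test(raw_counts[, 1],  raw_counts[, 2],  method = "pearson")
    ct_norm <- cor.test(norm_counts[, 1], norm_counts[, 2], method = "pearson")

    ct_raw$estimate;  ct_raw$p.value
    ct_norm$estimate; ct_norm$p.value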

Reply written 4 months ago by Kevin Blighe

You are right. I am dealing with a dataset that has too many zeros and many genes with low read counts; on top of that, it is a heterogeneous combination of two datasets with different distributions.

Reply written 4 months ago by jivarajivaraj

In that case, you may consider (prior to normalisation) removing transcripts that have a high rate of zeros across your sample cohort.
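
For example, a minimal sketch of such a pre-normalisation filter on a raw count matrix (the counts_mat object and the 80% cut-off are purely illustrative):

    # proportion of samples in which each transcript has a zero count
    zero_rate <- rowMeans(counts_mat == 0)

    # keep transcripts that are zero in fewer than 80% of samples
    counts_filtered <- counts_mat[zero_rate < 0.8, ]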

Reply written 4 months ago by Kevin Blighe