Question

Differential Expression In RNA-Seq Experiment

1

10.6 years ago

ThePresident ▴ 80

Hello,

I'm dealing with a classical dilemma: I performed an RNA-seq experiment on two biological replicates for condition A and two others for condition B. After alignment and differential expression analysis using the DESeq package, I have a whole list of genes with fold changes of A vs B. Now, my question is: where do I put a cutoff?

From a biological point of view, I'm tempted (as others have done) to set a fold change of 2 as a cutoff. Two times more transcripts is somewhat significant at the biological level for a cell. But is it really? If we assume it is, it brings me to the next point:
What is a cutoff for the p-value? I'm tempted to use padj (hence FDR-corrected), and the hits I'll get are almost surely genuine (in fact, I tested those by qPCR and indeed they are differentially expressed between A and B). However, am I missing potentially interesting hits by being too restrictive? Then, where do I set my cutoff?

FYI: I'm dealing with Illumina, single-end 50 bp, non-strand-specific, bacterial RNA-seq data.

Thank you all for your input on this,

TP

rnaseq deseq rpkm • 15k views

ADD COMMENT • link updated 10.6 years ago by seidel 11k • written 10.6 years ago by ThePresident ▴ 80

score 3 · Answer 1 · 2013-09-28

I'll just echo what dpryan70 said in a comment: where you set your cutoffs depends completely on what you plan to do with the results. If you have an assay to easily screen through lots of genes, then you can be liberal about your cutoff, whereas if follow-up involves heavy investment then you would be much more stringent. You might also use different cutoffs for different purposes. For instance, a cutoff to select genes for qPCR validation may be different from a cutoff you would use for GO enrichment analysis.

In my experience, the magnitude of the numbers (fold change, p- or q-value) does not have any absolute meaning, i.e. there is no x-fold threshold that determines biological significance. Every data set is different, experimental systems are different, and I have to adjust both fold change and p-value restrictions on an experiment-by-experiment basis. It's often tempting to take the interpretations of false discovery rates associated with p-values literally, and easy to forget that the numbers are based on assumptions about distributions. The "true" and "false" used to describe positives and negatives are based on an ideal, and what is actually true and false is difficult to know. There's also the issue of conflating significance with importance (avoid "the cult of statistical significance"). Many people adjust p-values and have nothing "significant" left, yet there is plenty of evident biology in the data staring them in the face. So pick some values that seem reasonable based on what you'd like to do with the results, and prepare to iteratively adjust your choices based on your needs.
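
One way to make the "iteratively adjust" advice concrete is to tabulate how many genes survive under several combinations of cutoffs before committing to one. The sketch below is only an illustration: the gene values are made up, and the function name `count_passing` and the chosen grid are assumptions, not anything from DESeq. A log2 fold change of 1 corresponds to a 2-fold change.

```python
# Hypothetical (log2 fold change, padj) pairs; made-up values, not real data.
genes = [
    (2.3, 0.0004),
    (-1.1, 0.03),
    (0.7, 0.008),
    (-2.8, 0.09),
    (1.6, 0.2),
    (0.3, 0.6),
]


def count_passing(rows, min_abs_log2fc, max_padj):
    """Count genes with |log2 fold change| >= min_abs_log2fc and padj < max_padj."""
    return sum(
        1 for lfc, padj in rows
        if abs(lfc) >= min_abs_log2fc and padj < max_padj
    )


# Try every combination of fold-change and padj cutoffs.
grid = {}
for min_lfc in (0.0, 1.0, 2.0):
    for max_padj in (0.01, 0.05, 0.10):
        grid[(min_lfc, max_padj)] = count_passing(genes, min_lfc, max_padj)
        print(f"|log2FC| >= {min_lfc}, padj < {max_padj}: {grid[(min_lfc, max_padj)]} genes")
```

Seeing the whole grid at once shows how sensitive the hit list is to each cutoff, which helps match the list size to what the follow-up can realistically handle.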

score 1 · Answer 2 · 2013-09-27

1

10.6 years ago

Devon Ryan 104k

I generally filter by adjusted p-value (0.10 is a common threshold for adjusted p-values) and then rank by fold-change. You'll lose real and meaningful changes regardless of what you do, so don't fixate too much on that.
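
As a minimal sketch of that filter-then-rank approach: the rows below are invented, and the field names (`gene`, `log2FoldChange`, `padj`) are assumptions chosen to resemble a DESeq-style results table rather than an exact export format. Genes with a missing adjusted p-value are dropped, and ranking uses the absolute log2 fold change so that strong down-regulation ranks alongside strong up-regulation.

```python
# Hypothetical results rows; values and field names are illustrative assumptions.
results = [
    {"gene": "geneA", "log2FoldChange": 2.5, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -3.1, "padj": 0.04},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.08},
    {"gene": "geneD", "log2FoldChange": 4.0, "padj": 0.35},
    {"gene": "geneE", "log2FoldChange": 1.2, "padj": None},  # padj missing (NA)
]


def filter_and_rank(rows, padj_cutoff=0.10):
    """Keep rows with padj below the cutoff, then sort by |log2 fold change|, largest first."""
    kept = [r for r in rows if r["padj"] is not None and r["padj"] < padj_cutoff]
    return sorted(kept, key=lambda r: abs(r["log2FoldChange"]), reverse=True)


ranked = filter_and_rank(results)
for row in ranked:
    print(row["gene"], row["log2FoldChange"], row["padj"])
```

Note that geneD has the largest fold change but is excluded because its adjusted p-value is above the cutoff, which is exactly the kind of loss the answer says not to fixate on.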

ADD COMMENT • link 10.6 years ago by Devon Ryan 104k

0

Thank you for your answer. I agree with you, we need to cut somewhere and we'll lose meaningful data regardless of cutoff... that's why we set limits like pval < 0.05 or padj < 0.1. It's just that I'm not a statistician, so I have no clue how much it really means to set a cutoff for padj at 0.1. Is that threshold low, medium, high? And I hate to use something just because it's common practice... however I don't have enough statistical knowledge to accurately judge by myself! ;)

ADD REPLY • link 10.6 years ago by ThePresident ▴ 80

1

Well, high, medium and low are subjective terms, so you'll never get an answer to that. In general, that's probably a medium threshold for general use. The most appropriate threshold will depend on what you want to do with the results. If you're going to do something expensive and time-consuming, like making a bunch of transgenic mice or designing a drug trial, then you'll want a more stringent threshold (i.e. a lower padj cutoff). Generally, people will do various validations, so that'll give you a better idea of whether you might benefit from changing the threshold.
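
To give a rough feel for what a padj cutoff means, here is a small sketch of the Benjamini-Hochberg adjustment, one common way of producing FDR-adjusted p-values. The raw p-values are made up, and the function is a plain-Python illustration, not DESeq's own code. Under the method's assumptions, calling everything with padj below a cutoff of 0.1 means that about 10% of the called genes, at most, are expected to be false positives.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values never increase
    # as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values, for illustration only.
raw = [0.0005, 0.008, 0.039, 0.041, 0.042, 0.05, 0.074, 0.205, 0.212, 0.216]
padj = benjamini_hochberg(raw)

for cutoff in (0.01, 0.05, 0.10):
    hits = sum(p < cutoff for p in padj)
    print(f"padj < {cutoff}: {hits} hits, up to about {cutoff * hits:.2f} expected false positives")
```

Loosening the cutoff adds hits but also raises the number of false positives you should expect among them, which is the trade-off to weigh against the cost of following up each gene.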

ADD REPLY • link 10.6 years ago by Devon Ryan 104k