Question: RNAseq: What's the accurate way to test whether a set of N genes is differentially expressed?
gravatar for francois
9 months ago by
francois10 wrote:

In RNAseq data, I want to test statistically whether a set of 17 genes are differentially expressed in a condition. I have two conditions: control treated/drug treated. My quick & dirty way of looking at it was: I computed up/downregulation vs control by simply dividing the counts in drug treated by the counts in control treated. I thus have 17 fold-change for my 17 genes (eg. 0.8, 1.2, etc.). I averaged these fold-changes (average is 0.85, so these genes seem to be downregulated as a set in average) To have an idea whether this result had anything surprising, I computed the same average for 10,000 random sets of 17 genes. 90% of the random sets are over 0.85, 10% are below.

What's the formal way of doing this?

rna-seq • 326 views
ADD COMMENTlink modified 9 months ago by Devon Ryan93k • written 9 months ago by francois10

The formal way is to test all genes at the same time, there are tools like edegR, DESeq2 or limma voom which are designed for this kind of analysis. Read the manual and it will guide you thru the analysis.

ADD REPLYlink written 9 months ago by Benn7.9k

Sure, but I believe that is a different question. Why would I not be able to specifically look a set of genes and ask whether they are down/upregulated as a set?

ADD REPLYlink modified 9 months ago • written 9 months ago by francois10

Because RNA-seq data gives info on all genes. If you are interested in a panel you should use qPCR for example.

ADD REPLYlink written 9 months ago by Benn7.9k

The best strategy, in my opinion, is to do DE analysis first with e.g. edgeR or limma voom, followed by roast analysis which tests whether your 17 genes are enriched in the set of downregulated genes.

ADD REPLYlink written 9 months ago by Benn7.9k

Because genes in an RNA-seq experiment are not independent measurements. You should use one of the established tools Benn mentioned for both count normalization and differential analysis. Did you do any normalization of the counts in your approach? Using the above frameworks also allows you to claulcate significances based on the expression levels and inter/intra-sample variance which is preferred and recommended. As a review I would reject such a hand-made approach you propose as it is not standardized and not really reliable. Lowly-expressed genes tend to have high (and artificially high) fold changes.

ADD REPLYlink written 9 months ago by ATpoint28k

You need data from all the genes to normalize properly. Manipulating raw counts without normalizing is a really bad idea.

ADD REPLYlink written 9 months ago by swbarnes27.2k

The sequencing depth may vary between groups, so plain read-counts are misleading. If your control had better RNA handling, because they were processed before lunch, it may have more reads overall, and then all genes appear downregulated. Tools made for differential expression know this and account for total reads by 'normalizing' the readcounts into something else. Try DESeq2 or edgeR to get per-gene information. For evaluating if the set is wholly shifted, I like your random sets method, but be sure to account for both up and downregulation, maybe look at abs(x-1). And 10K is not a large number of 17-sets (there's >10^56 ways to choose 17 from 15K genes, so try to run a few million).

ADD REPLYlink modified 9 months ago • written 9 months ago by karl.stamm3.6k
gravatar for Devon Ryan
9 months ago by
Devon Ryan93k
Freiburg, Germany
Devon Ryan93k wrote:

You're looking for GSEA (gene set enrichment) types of methods. Note that these may not have fold-changes, since that often has no meaning.

ADD COMMENTlink written 9 months ago by Devon Ryan93k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1105 users visited in the last hour