Question

Targeted RNAseq Analysis

7.6 years ago

Kelsey Tyssowski ▴ 10

I am using edgeR to find DE genes in my data. However, my data is a bit unusual: to reduce the necessary sequencing depth, I have been enriching my sequencing libraries for 300 genes of interest before sequencing and then running the edgeR DE analysis on only those genes. Of the 300 genes, 270 are “test” genes that could be DE between the two experimental conditions and 30 are “control” genes that I am reasonably sure should not change. I am concerned that using edgeR with the default settings is not appropriate for this analysis because it assumes that most of the genes are not DE. I am also concerned that analyzing only a small number of genes is not compatible with edgeR. Are these valid concerns? If so, is there any way to get around them using edgeR or another program? Would I have to worry about the same things if I used DESeq2? Any advice would be appreciated! Thanks!

RNA-Seq R Differential Expression EdgeR DESeq2 • 1.7k views

ADD COMMENT • link updated 7.6 years ago by Devon Ryan 104k • written 7.6 years ago by Kelsey Tyssowski ▴ 10

Hello tyssowski!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/87121/

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

Sorry about that! I posted here first and then someone told me about the bioconductor forum and recommended I post there with more info. I'll try not to do that in the future!

ADD REPLY • link 7.6 years ago by Kelsey Tyssowski ▴ 10

We probably (p=0.049) won't crucify you for this offense ;-) More tips and guidelines can be found here: How To Ask Good Questions On Technical And Scientific Forums and http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

score 3 · Accepted Answer · 2016-09-15

Regardless of whether you use edgeR or DESeq2 (or something else entirely), your primary difficulty will be getting the library size normalization done correctly. I assume you hope that many, most, or all of the "test" genes are DE; if these happen to have an asymmetric distribution of fold-changes, then you can't use them for this step. The general strategy is as follows:

1. Create a subset of the data containing only the 30 "control" genes.
2. Determine the normalization factors with this subset (calcNormFactors() in edgeR, estimateSizeFactors() in DESeq2).
3. Use the factors determined in #2 with the full dataset.
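
To see why the choice of genes matters for the normalization step, here is a minimal base-R sketch. The toy count matrix, the gene names, and the helper `size_factors_from()` are all made up for illustration; the helper is a simplified median-of-ratios calculation in the spirit of DESeq2's size factors, not the package's actual code.

```r
# Toy counts (hypothetical): 3 control genes and 5 test genes, 4 samples.
# B samples are sequenced twice as deeply as A samples, and every test
# gene is additionally 10-fold up in B (an asymmetric set of fold-changes).
counts <- matrix(
  c(100, 200,  50,  10,  20,  30,  40,   50,   # sample A1
    100, 200,  50,  10,  20,  30,  40,   50,   # sample A2
    200, 400, 100, 200, 400, 600, 800, 1000,   # sample B1
    200, 400, 100, 200, 400, 600, 800, 1000),  # sample B2
  nrow = 8,
  dimnames = list(c(paste0("ctrl", 1:3), paste0("test", 1:5)),
                  c("A1", "A2", "B1", "B2"))
)

# Simplified median-of-ratios size factors (illustrative helper, assumed name).
size_factors_from <- function(m) {
  log_geo_means <- rowMeans(log(m))      # per-gene log geometric mean
  usable <- is.finite(log_geo_means)     # skip genes with a zero in any sample
  apply(m, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

is_control <- grepl("^ctrl", rownames(counts))

# Step 1 + 2: factors from the control genes only.
sf_ctrl <- size_factors_from(counts[is_control, , drop = FALSE])
# For comparison: factors from all genes, dominated by the DE test genes.
sf_all <- size_factors_from(counts)

# Step 3: apply the control-derived factors to the full matrix.
norm_ctrl <- sweep(counts, 2, sf_ctrl, "/")
```

In this toy case the control-derived factors differ two-fold between B and A (the true depth difference), so the test genes come out 10-fold up after normalization. The all-gene factors differ 20-fold instead, which would make the test genes look unchanged and the controls look 10-fold down.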

This ends up being very similar to using ERCC spike-ins. So if it's not clear exactly how to do step 3, you can search this forum and/or the Bioconductor support forum for "edgeR ERCC" or "DESeq2 ERCC" and likely find an example with code (for DESeq2 it's just sizeFactors(full_dataset) <- sizeFactors(control_genes), but I'd have to look up the edgeR syntax).
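
As a rough sketch of the three steps in both packages, something along these lines should work. The simulated counts, the object names, and the position of the control genes (last 30 rows) are assumptions for illustration. The edgeR part is one possible way to transfer the factors, not necessarily the syntax the answer had in mind: it runs calcNormFactors() on the control rows while keeping the full library sizes, so the resulting factors can be copied onto the full object.

```r
library(DESeq2)
library(edgeR)

set.seed(1)
# Hypothetical data: 300 genes x 6 samples; the last 30 rows are controls.
n_genes <- 300
n_samples <- 6
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 200, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("s", seq_len(n_samples)))
)
condition <- factor(rep(c("A", "B"), each = 3))
is_control <- seq_len(n_genes) > 270

## DESeq2: size factors from the control genes, assigned to the full dataset.
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData = data.frame(condition = condition, row.names = colnames(counts)),
  design = ~ condition
)
dds_ctrl <- estimateSizeFactors(dds[is_control, ])
sizeFactors(dds) <- sizeFactors(dds_ctrl)
# estimateSizeFactors() also has a controlGenes argument that could be
# used directly on the full object instead of subsetting.

## edgeR: TMM factors from the control genes, copied to the full DGEList.
y <- DGEList(counts = counts, group = condition)
# keep.lib.sizes = TRUE keeps the full-library totals in the subset, so the
# factors are expressed relative to the same library sizes as in y.
y_ctrl <- calcNormFactors(y[is_control, , keep.lib.sizes = TRUE])
y$samples$norm.factors <- y_ctrl$samples$norm.factors
```

From here the usual workflows apply: DESeq(dds) reuses pre-existing size factors rather than re-estimating them, and the edgeR dispersion and testing functions use lib.size times norm.factors as the effective library size.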