Targeted RNAseq Analysis
7.6 years ago

I am using edgeR to find DE genes in my data. However, my data is a bit unusual: to reduce the necessary sequencing depth, I have been enriching my sequencing libraries for 300 genes of interest before sequencing and then running the edgeR DE analysis on only those genes. Of the 300 genes, 270 are “test” genes that could be DE between the two experimental conditions and 30 are “control” genes that I am reasonably sure should not change. I am concerned that using edgeR with the default settings is not appropriate for this analysis because it assumes that most of the genes are not DE. I am also concerned that such a small number of genes is not compatible with edgeR. Are these valid concerns? If so, is there any way to get around them using edgeR or another program? Would I have to worry about the same things if I used DESeq2? Any advice would be appreciated! Thanks!

RNA-Seq R Differential Expression EdgeR DESeq2 • 1.7k views

Hello tyssowski!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/87121/

This is typically not recommended as it runs the risk of annoying people in both communities.


Sorry about that! I posted here first and then someone told me about the bioconductor forum and recommended I post there with more info. I'll try not to do that in the future!


We probably (p=0.049) won't crucify you for this offense ;-) More tips and guidelines can be found here: How To Ask Good Questions On Technical And Scientific Forums and http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

7.6 years ago

Regardless of whether you use edgeR or DESeq2 (or something else entirely), your primary difficulty will be getting the library size normalization right. I assume you hope that many, most, or all of the "test" genes are DE; if their fold-changes happen to be asymmetrically distributed (for example, mostly up in one condition), then you can't use them for this step. The general strategy is as follows:

  1. Create a subset of the data containing only the 30 "control" genes.
  2. Determine the normalization factor with this (calcNormFactors() in edgeR, estimateSizeFactors() in DESeq2).
  3. Use the factors determined in #2 with the full dataset.

This ends up being very similar to normalizing with ERCC spike-ins. So if it's not clear exactly how to do step 3, you can search this forum and/or the Bioconductor support forum for "edgeR ERCC" or "DESeq2 ERCC" and will likely find an example with code. For DESeq2 it's just sizeFactors(full_dataset) <- sizeFactors(control_genes), but I'd have to look up the edgeR syntax.
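To make the arithmetic behind the three steps concrete, here is a minimal sketch in plain Python. It is not DESeq2 or edgeR code: it only re-implements the median-of-ratios idea that DESeq2-style size factors are based on, restricted to a chosen gene set. The gene names, the counts, and the helper names size_factors and normalized_fold are all made up for illustration. In the toy data, sample 2 is sequenced twice as deep and every "test" gene is truly up 2-fold.

```python
import math
from statistics import median


def size_factors(counts, genes):
    """Median-of-ratios size factors, estimated from `genes` only.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_samples = len(counts[genes[0]])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in genes:
        row = counts[gene]
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no usable genes for size factor estimation")
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical toy counts: [sample 1, sample 2].
counts = {
    "ctrl1": [100, 200], "ctrl2": [50, 100], "ctrl3": [80, 160],
    "test1": [10, 40], "test2": [20, 80],
    "test3": [30, 120], "test4": [25, 100],
}

# Step 1: subset to the control genes.
control_genes = [g for g in counts if g.startswith("ctrl")]
all_genes = list(counts)

# Step 2: estimate the factors from the controls only
# (and, for comparison, from all genes).
sf_control = size_factors(counts, control_genes)
sf_all = size_factors(counts, all_genes)


def normalized_fold(gene, sf):
    """Fold change (sample 2 / sample 1) after dividing by size factors."""
    a, b = (c / s for c, s in zip(counts[gene], sf))
    return b / a


# Step 3: apply the factors to the full dataset.
print("test1 fold, control-gene factors:", normalized_fold("test1", sf_control))
print("test1 fold, all-gene factors:    ", normalized_fold("test1", sf_all))
```

With the control-gene factors the 2-fold change in test1 survives normalization; with factors estimated from all genes, the one-sided shift in the test genes is absorbed into the size factors and the fold change collapses to about 1. That is the failure mode the control-gene subset is meant to avoid. In DESeq2 itself, estimateSizeFactors() also has a controlGenes argument that should let you skip the manual subsetting; check the documentation for your version before relying on it.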
