Question: how do I properly use edgeR::scaleOffset to combine two targeted RNA-seq panels?
0
gravatar for glocke01
6 months ago by
glocke01160
United States
glocke01160 wrote:

I am trying to analyze some targeted RNA-seq data, where for each sample/mouse I have data from two panels. Because there are two panels, the data for each sample effectively have two library sizes. This means I can't just rbind the two panels together and use a normal RNA seq workflow. In principle, I could analyze each panel separately (and in fact I have done this), but because I want to do gene set testing (CAMERA and ROAST), I need to combine the panels together into one DGEList object that accounts for the different library sizes. [edit: evidently I misunderstood the advice and scaleOffset is not necessary; see Smyth's answer below] I am kindly informed that scaleOffset in the edgeR package makes this possible (thanks Gordon Smyth! C: differential gene set testing combining genes from multiple panels ). However, I'm having trouble figuring out how this is done, and that is the purpose of this question.

I've had trouble finding documentation/example code for scaleOffset; the best example I've found to work from is this question at support.bioconductor.org: https://support.bioconductor.org/p/109361/ Here, Martin Weirauch has summed the output of calcNormFactors with the colSums of his count matrix. I'm not really following all of the t() calls to transpose the matrices, so maybe that's where I'm going wrong. Here's the code I'm working with:

    cntTab <- rbind(panel1, panel2)
    p1 <- rownames(panel1)
    p2 <- rownames(panel2)
    nf1 <- calcNormFactors(cntTab[p1,])
    nf2 <- calcNormFactors(cntTab[p2,])
    ls1 <- colSums(cntTab[p1,])
    ls2 <- colSums(cntTab[p2,])
    offMat <- matrix(c(rep(log(nf1)+log(ls1), length(p1)),
                       rep(log(nf2)+log(ls2), length(p2))),
                     nrow=length(p1)+length(p2))
    cntTab <- DGEList(cntTab)
    cntTab <- scaleOffset(cntTab, offMat)
    cntTab <- estimateDisp(cntTab, dzn)
    fit <- glmQLFit(cntTab, design=dzn)

I wasn't sure if this was correct, so I compared the log-fold changes obtained using this code to lfc's previously obtained from running limma+voom separately on each panel, and the correlation is a little fuzzy (r=0.87, rho=0.77). Notably, if I remove the scaleOffset line, the correlation of fold-changes is much tighter (r=0.95, rho=0.93) - i believe that this is like assuming the two panels are one and each sample gets one library size. If I keep scaleOffset but change the offset matrix so that I no longer include +log(ls1), then I improve the correlation by a negligible amount (r=0.96).

The fact that doing it in a way I know is "wrong" (forgetting there are two panels) reproduces my previous results more closely than doing it the way I thought was right (using colSums(panel1)), I suspect I've gone wrong somewhere!

Any help would be most appreciated.

edger rna-seq panel • 343 views
ADD COMMENTlink modified 6 months ago • written 6 months ago by glocke01160
3
gravatar for Gordon Smyth
6 months ago by
Gordon Smyth990
Australia
Gordon Smyth990 wrote:

The library sizes and norm factors correspond to columns rather than to rows, so your formation of the offset matrix isn't correct.

Also, I didn't suggest that you use the scaleOffset function and it's not needed here. Just setting the offset matrix to be the log (effective) library size is already enough in itself.

Assuming that panel1 and panel2 are count matrices, you need something like:

nf1 <- calcNormFactors(panel1)
nf2 <- calcNormFactors(panel2)
ls1 <- colSums(panel1)
ls2 <- colSums(panel2)
effective.lib.size1 <- ls1 * nf1
effective.lib.size2 <- ls2 * nf2
o1 <- matrix(log(effective.lib.size1), byrow=TRUE, nrow=nrow(panel1), ncol=ncol(panel1))
o2 <- matrix(log(effective.lib.size2), byrow=TRUE, nrow=nrow(panel2), ncol=ncol(panel2))
dge <- DGEList( rbind(panel1, panel2) )
dge$offset <- rbind(o1, o2)

Then you can proceed to DE analysis.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Gordon Smyth990

thanks once again, Prof. Smyth!!

ADD REPLYlink written 6 months ago by glocke01160
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1605 users visited in the last hour