How should I deduce the variance and expectation of the logarithm of a variable in the `voom` paper?
14 months ago
Dan ▴ 180

I am reading the paper "voom: precision weights unlock linear model analysis tools for RNA-seq read counts". In the Methods, the section "Delta rule for log-cpm" says:

The RNA-seq data consist of a matrix of read counts $$r_{gi}$$, for RNA samples i=1 to n, and genes g=1 to G. Write $R_i$ for the total number of mapped reads for sample i: $$R_i=\sum_{g=1}^{G}r_{gi}$$ They define the log-counts per million (log-cpm) value for each count as: $$y_{gi}=\log_2\left(\frac{r_{gi}+0.5}{R_i+1}\times 10^6\right)$$
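For concreteness, here is a minimal plain-Python sketch of the log-cpm formula above. It is an illustration only, not limma's implementation; the function name `log_cpm` and the list-of-rows (genes by samples) layout are assumptions of this sketch.

```python
import math


def log_cpm(counts):
    """Log-counts per million for a genes-by-samples count matrix.

    counts: list of rows, one row per gene g, one column per sample i.
    Implements y_gi = log2((r_gi + 0.5) / (R_i + 1) * 1e6), where R_i is
    the column (library) total, as in the formula quoted above.
    """
    n_samples = len(counts[0])
    # Library size R_i: total mapped reads in sample i (column sums).
    lib_sizes = [sum(row[i] for row in counts) for i in range(n_samples)]
    # The 0.5 and 1 offsets keep zero counts finite.
    return [
        [math.log2((r + 0.5) / (lib_sizes[i] + 1) * 1e6) for i, r in enumerate(row)]
        for row in counts
    ]


counts = [[10, 0], [90, 50], [900, 950]]  # toy data: 3 genes, 2 samples
for row in log_cpm(counts):
    print([round(v, 2) for v in row])
```

For large counts the offsets barely matter, which is why the approximation below can drop them; the factor 10^6 then contributes log2(10^6) = 6 log2(10) ≈ 19.93.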

Write $\lambda=E(r)$ for the expected value of a read count given the experimental conditions, and suppose that: $$var(r)=\lambda+\phi\lambda^2$$

If $r$ is large, then the log-cpm value of the observation is: $$y\approx\log_2(r)-\log_2(R)+6\log_2(10)$$ where $R$ is the library size. The analysis is conditional on $R$, so $R$ is treated as a constant. It follows that $$var(y)\approx var(\log_2 r).$$

If λ is also large, then: $$(\log_2 r)(\ln 2)\approx \ln r \approx \ln\lambda+\frac{r-\lambda}{\lambda}$$ so $$var(y)(\ln 2)^2\approx\frac{var(r)}{\lambda^2}=\frac{1}{\lambda}+\phi$$

How should I deduce the last two equations?
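A quick Monte Carlo check can show that the final approximation is reasonable before any algebra. This is a standard-library Python sketch under stated assumptions: counts are drawn as a gamma-Poisson (negative binomial) mixture, which gives exactly var(r) = λ + φλ²; the values λ = 200 and φ = 0.05 are arbitrary; and the helper names (`poisson`, `nb_count`, `check_delta`) are hypothetical, not part of limma.

```python
import math
import random
import statistics


def poisson(mu, rng):
    """Exact Poisson(mu) draw: count arrivals of a rate-1 Poisson process in [0, mu]."""
    k, t = 0, rng.expovariate(1.0)
    while t <= mu:
        k += 1
        t += rng.expovariate(1.0)
    return k


def nb_count(lam, phi, rng):
    """Gamma-Poisson draw with mean lam and variance lam + phi * lam**2."""
    # Gamma with shape 1/phi and scale phi has mean 1 and variance phi.
    g = rng.gammavariate(1.0 / phi, phi)
    return poisson(lam * g, rng)


def check_delta(lam=200.0, phi=0.05, n=10000, seed=1):
    """Return (simulated var(y) * (ln 2)^2, delta-method value 1/lam + phi)."""
    rng = random.Random(seed)
    # Library size R is held fixed, so var(y) equals var(log2(r + 0.5)).
    y = [math.log2(nb_count(lam, phi, rng) + 0.5) for _ in range(n)]
    simulated = statistics.variance(y) * math.log(2) ** 2
    predicted = 1.0 / lam + phi
    return simulated, predicted


sim, pred = check_delta()
print(f"simulated var(y)*(ln 2)^2 = {sim:.4f}, delta-method value = {pred:.4f}")
```

With these settings the simulated value should land close to 1/λ + φ = 0.055, typically a little above it because the first-order approximation drops higher-order terms; the gap tends to grow as λ shrinks or φ grows.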

Biostatistics voom limma

I added limma and voom tags. The (senior) author of limma and voom (Gordon Smyth) is active here, probably you will get his response.


Thanks. Can you please tell me why some mathematical expressions, e.g. $r_{gi}$, are not rendered correctly?


A quick glance suggests that the engine here renders display math, but not inline math.


Looks like you need to add an additional $ before and after expressions for them to render? You will need to fix the math above. My apologies: in testing this, I have messed up your equation formatting.

Edit: This is one case where it may be appropriate to post the content as images, rendered properly outside Biostars.


The double $ causes it to be rendered by MathJax, but as a display object rather than inline. You might need to decide whether you would rather have it on a separate line, or not rendered as math.


Or one could use the symbol keyboard (or copy paste from online pages) for simple things like λ, ≈ and use HTML for Ri (R<sub>i</sub>).

14 months ago
Gordon Smyth ★ 7.0k

The delta method (aka "delta rule") is a standard mathematical method for approximating the mean and variance of a non-linear function of a random variable using a first-order Taylor series expansion, see for example:

In the paper, we used a first-order Taylor series for log(r) about lambda. The paper explained that Taylor's theorem was being used and referred readers to the expository paper by Oehlert (1992) where the method is fully explained. It is a little odd that you have copied everything else from the paper into your question above, but not the note and reference that explained the two lines that you are asking about.
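Spelled out, both lines follow from the generic first-order delta-method approximation applied with g = ln. The block below is a standard calculus sketch written in the question's notation, not a quotation from the paper; it also gives the first-order expectation, since E(r − λ) = 0.

```latex
\begin{align*}
g(r) &\approx g(\lambda) + g'(\lambda)\,(r-\lambda)
  && \text{first-order Taylor expansion about } \lambda = E(r)\\
E\{g(r)\} &\approx g(\lambda)
  && \text{since } E(r-\lambda) = 0\\
\operatorname{var}\{g(r)\} &\approx g'(\lambda)^2\,\operatorname{var}(r)
  && \text{since } g(\lambda),\ g'(\lambda) \text{ are constants}\\
\ln r &\approx \ln\lambda + \frac{r-\lambda}{\lambda}
  && g = \ln,\quad g'(\lambda) = 1/\lambda\\
\operatorname{var}(\ln r) &\approx \frac{\operatorname{var}(r)}{\lambda^2}
  = \frac{\lambda+\phi\lambda^2}{\lambda^2} = \frac{1}{\lambda} + \phi\\
\operatorname{var}(y)\,(\ln 2)^2 &\approx \operatorname{var}(\log_2 r)\,(\ln 2)^2
  = \operatorname{var}(\ln r) \approx \frac{1}{\lambda} + \phi
  && \text{since } \log_2 r = \ln r / \ln 2
\end{align*}
```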

Beware that the formula in the published paper is not quite right because it expands log2(r) as if it was ln(r). The corrected formula is given here: http://www.statsci.org/smyth/pubs/VoomPreprint.pdf
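The base of the logarithm changes the slope of the first-order term, and therefore the variance. The comparison below is only the calculus, not a reproduction of either version of the paper: using the ln slope 1/λ when expanding log2 r drops a factor of 1/(ln 2)² ≈ 2.08 from var(y).

```latex
\begin{align*}
\text{ln slope used for } \log_2:\quad
\log_2 r &\approx \log_2\lambda + \frac{r-\lambda}{\lambda}
  &&\Rightarrow\ \operatorname{var}(\log_2 r) \approx \frac{1}{\lambda} + \phi\\
\text{correct slope } \tfrac{d}{dr}\log_2 r = \tfrac{1}{r\ln 2}:\quad
\log_2 r &\approx \log_2\lambda + \frac{r-\lambda}{\lambda\ln 2}
  &&\Rightarrow\ \operatorname{var}(\log_2 r) \approx \frac{1/\lambda + \phi}{(\ln 2)^2}
\end{align*}
```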

Reference

Oehlert GW (1992). A note on the delta method. The American Statistician, 46:27–29.
