How to calculate RPKM from RUVr?
12 months ago
star ▴ 350

I have an RNA-seq count table that was generated by integrating several studies. I want to calculate RPKM, but first I ran RUVr to remove unwanted variation. If I use normCounts(RUVr) to calculate RPKM, would that be correct?

RUV R • 903 views

As far as I am aware, RUVr should be run on raw counts. The "corrected" counts it produces should not be used for downstream analyses (they are for visualization purposes only). What you need to do is take the weights for the unknown sources of variation estimated by RUVr and supply these alongside your design matrix to downstream tools such as DESeq2 (again, with the raw counts as the primary input).


Thank you @Dunois, but how can we compare genes across samples that come from different studies when there is a batch effect?


If you are using DESeq2 for example, you should include the batch as a variable in the model. This is covered fairly extensively in the vignette:

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html


That requires that the tested covariate(s) are present in all batches, though. For example, if you have five studies and want to test control vs treatment, then each study needs both controls and treatments. You cannot take treatments from studies 1-3 and controls from studies 4-5. That design would be so-called "confounded" or "nested", because the treatment effect cannot be separated from the study effect. Is that the case, @star?
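
To make the confounding point concrete, here is a minimal sketch (plain Python, used only for illustration) that checks the rule as stated in this reply: every batch should contain every level of the tested condition. The function name and the sample layouts are hypothetical. The formal requirement in tools such as DESeq2 is a full-rank design matrix, so treat this as a quick sanity check rather than a complete test.

```python
from collections import defaultdict

def find_incomplete_batches(samples):
    """Map each batch to the condition levels it is missing.

    `samples` is a list of (batch, condition) pairs, one per sample.
    An empty result means every batch contains every condition level.
    """
    all_conditions = {condition for _, condition in samples}
    seen = defaultdict(set)
    for batch, condition in samples:
        seen[batch].add(condition)
    return {
        batch: sorted(all_conditions - conditions)
        for batch, conditions in seen.items()
        if conditions != all_conditions
    }

# Hypothetical layout from the post: treatment only in studies 1-3,
# control only in studies 4-5 (condition is confounded with study).
confounded = [
    ("study1", "treatment"), ("study2", "treatment"), ("study3", "treatment"),
    ("study4", "control"), ("study5", "control"),
]

# Hypothetical layout in which each study has both groups.
balanced = [
    ("study1", "treatment"), ("study1", "control"),
    ("study2", "treatment"), ("study2", "control"),
]

print(find_incomplete_batches(confounded))
print(find_incomplete_batches(balanced))
```

If the first call reports missing levels for any study, adding the study as a batch term in the model cannot separate the study effect from the condition effect.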


Thank you @ATpoint.

I have 3 samples from Study 1 and 5 samples from Study 2, two of which are the same samples as in Study 1 (Study 1 = samples A, B, C; Study 2 = samples A, B, D, E, F). I don't want to run a differential analysis, only to calculate TPM/RPKM from the count table (because I would like to check the expression of a subset of genes across those samples as a heatmap).

Since I expect, e.g., samples A and B to cluster together in a PCA plot, I ran RUVr. When I plot a PCA of normCounts(method3_RUVr), it looks nice, but I am not sure whether I can use normCounts(method3_RUVr) for calculating TPM.
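
For reference, the RPKM and TPM calculations being asked about can be sketched as below for a single sample. This is only the standard arithmetic, written in plain Python for illustration. The function names, counts, and gene lengths are made up, and the sketch does not settle whether RUVr-adjusted counts are a suitable input (the reply above suggests they are meant for visualization only).

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million reads, for one sample."""
    total_reads = sum(counts)
    # count / (length in kb) / (total reads in millions)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million, for one sample."""
    # Length-normalize first, then rescale so the values sum to one million.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

# Made-up example: three genes in one sample.
counts = [100, 200, 700]         # read counts per gene
lengths_bp = [1000, 2000, 1000]  # gene lengths in base pairs

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print(rpkm_values)
print(tpm_values)
```

Both measures need per-gene lengths in addition to the counts. TPM values sum to one million within each sample, which RPKM values do not, so TPM is usually the easier one to read when plotting several samples side by side in a heatmap.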
