RNA-seq batch effect removal before DESeq2
4.8 years ago

Hi, I'm trying to analyze RNA-seq data from multiple ICGC projects. I want to remove between-project batch effects with the SVA package before using DESeq2 to find differentially expressed genes, followed by network analysis. I have these questions:

1) DESeq2 takes only raw read counts, which shouldn't be normalized in any way. How can I use SVA output as DESeq2 input? I know that DESeq2 can take a confounding variable (project codes) in its design formula, but I don't want to use that.

2) Should I log-transform my raw read counts before using SVA?

3) Can I combine microarray and RNA-seq data in a single analysis after removing batch effects with SVA and normalizing? Is that a valid analysis? If not, what is the best way to analyze the two datasets as a single dataset? About half of my samples were profiled only by microarray, and no sample appears in both datasets.

thanks

RNA-Seq R • 2.4k views
4.8 years ago

As you know, the recommended approach is to include batch as a variable in your design formula. If you have identified other surrogate variables via SVA, then include those in your design formula. These will then be accounted for when the statistical model is fit to the data.
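To illustrate why putting batch in the model works, here is a minimal sketch in plain Python. It is not DESeq2, which fits a negative binomial GLM to raw counts; it uses ordinary least squares on made-up, noise-free log-scale values for a single gene. The data and the helper names `solve` and `ols` are hypothetical. Treated samples are over-represented in batch B, so the naive treated-vs-control difference absorbs the batch shift, while the condition coefficient from the model that includes batch recovers the simulated effect of 1.

```python
from statistics import mean

def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up data for one gene (assumption): baseline 5, treatment +1, batch B +2.
condition = [0, 0, 0, 1, 0, 1, 1, 1]   # 0 = control, 1 = treated
batch     = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = batch A, 1 = batch B
y         = [5.0, 5.0, 5.0, 6.0, 7.0, 8.0, 8.0, 8.0]

# Naive comparison that ignores batch.
naive = (mean(v for v, c in zip(y, condition) if c == 1)
         - mean(v for v, c in zip(y, condition) if c == 0))

# Model with intercept, condition and batch as covariates.
X = [[1.0, float(c), float(b)] for c, b in zip(condition, batch)]
intercept, adjusted, batch_effect = ols(X, y)

print("naive difference:   ", round(naive, 3))
print("batch-adjusted diff:", round(adjusted, 3))
```

In DESeq2 the same idea is expressed through the design formula, for example a design of the form `~ batch + condition`, with any surrogate variables added as further terms.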

If you want to remove the batch effect from your normalised + transformed data for downstream analyses, then you can do this via, for example, removeBatchEffect() (from limma). It is performed on the transformed data itself, but using the information from your design formula.
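As a rough picture of what removing a batch effect from already-transformed data means, here is a simplified sketch in plain Python. It is not limma's implementation, and the values, gene names and the `center_batches` helper are made up for illustration. It assumes a balanced design (the same number of control and treated samples in each batch); only under that assumption does per-batch centering coincide with a regression-based adjustment. With an unbalanced design the condition has to be part of the fit, which is what the `design` argument of removeBatchEffect() is for.

```python
from statistics import mean

def center_batches(values, batches):
    """Shift each batch so its mean equals the grand mean (one gene)."""
    grand = mean(values)
    batch_mean = {
        b: mean(v for v, bb in zip(values, batches) if bb == b)
        for b in set(batches)
    }
    return [v - (batch_mean[b] - grand) for v, b in zip(values, batches)]

# Made-up, already log-transformed expression values (assumption).
batches   = ["A", "A", "A", "A", "B", "B", "B", "B"]
condition = ["ctrl", "ctrl", "trt", "trt", "ctrl", "ctrl", "trt", "trt"]
expr = {
    "gene1": [5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 8.0, 8.0],  # batch shift and treatment effect
    "gene2": [3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0],  # batch shift only
}

corrected = {gene: center_batches(v, batches) for gene, v in expr.items()}

for gene, values in corrected.items():
    print(gene, values)
```

The corrected matrix is meant for downstream steps such as clustering, PCA or network analysis. For the differential expression test itself, the raw counts with batch in the design formula remain the input.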

Kevin


Thanks Kevin, that makes it a bit clearer. My third question is a critical problem for me. Can you help me with that? Do you know of any method or pipeline like this one: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3617154/


Regarding the third question, they seem to have used SVA to integrate the microarray and RNA-seq data. I wish they had provided the exact code that they used, though. The methods section 'Multivariate analysis on combined data' is not clear.
