Hi. How can I remove batch effects between several RNA-seq studies that have count data (CPM) available using ComBat? The number of control and treatment samples varies between studies, and one study has no replication: it has only two samples (one control and one treatment). This study is very important and I cannot remove it. How can I combine this unreplicated study with the other studies? Please help me if anyone knows a solution. Thank you very much.
I do not understand why people always assume that random (and even unreplicated) studies can meaningfully be merged. RNA-seq is strongly confounded by study: beyond the biological variation, the choice of kits for RNA extraction, reverse transcription and library preparation has a notable influence on the inferred transcript/gene abundances (the counts).
I would rather analyze every study separately and then perform a meta-analysis, e.g. using ranks as in this paper.
The idea of ranks is that one calculates a signed significance score per gene, e.g. sign(log fold change) * -log10(p-value), ranks the genes by that score within each study, and then compares these ranks across studies for all genes. Significantly upregulated genes get high ranks, significantly downregulated genes get low ranks, and non-significant genes get intermediate ranks. If a gene is reproducibly up- or downregulated across studies, it should consistently be assigned a high or low rank and therefore get a low p-value in the meta-analysis.
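As a rough illustration of the per-study ranking step, here is a minimal sketch in plain Python. It assumes each study has already been analyzed on its own and yields a log2 fold change and a p-value per gene; the function names (`signed_scores`, `rank_genes`), the gene names and the numbers are made up for the example and do not come from the linked paper or its tool.

```python
import math

def signed_scores(results):
    """Map gene -> sign(log2FC) * -log10(p).

    `results` is a dict gene -> (log2_fold_change, p_value) taken from the
    differential expression analysis of ONE study (hypothetical input format).
    """
    scores = {}
    for gene, (lfc, p) in results.items():
        p = max(p, 1e-300)                # guard against log10(0)
        sign = (lfc > 0) - (lfc < 0)      # +1, 0 or -1
        scores[gene] = sign * -math.log10(p)
    return scores

def rank_genes(scores):
    """Rank genes by score: rank 1 = most strongly down, highest rank = most
    strongly up. Tied scores share their average rank."""
    ordered = sorted(scores, key=scores.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks

# Toy study with made-up values.
study = {
    "geneA": (2.0, 0.001),   # significantly up
    "geneB": (-1.5, 0.01),   # significantly down
    "geneC": (0.1, 0.8),     # not significant
}
scores = signed_scores(study)
ranks = rank_genes(scores)
print(ranks)
```

In this toy study the downregulated gene ends up with the lowest rank, the non-significant gene in the middle and the upregulated gene with the highest rank, which is the ordering described above.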
Based on my understanding this has several advantages:
- You don't have to bother with batch correction and its choice of parameters, which may or may not distort the true biological effects.
- You can even use an underpowered / unreplicated study if it is really as important as you say, e.g. analyzed with NOISeq, as it eventually only matters which ranks the genes have per study. A single study with n=1 of course does not give any robustness, but if you combine several studies (including studies with sufficient replicates) and still find genes with consistently high ranks (or low ranks in the case of downregulation), then the results can still be powerful and reliable. Moreover, robust meta-analysis tools will limit the influence of a single study, so even if that one study is flawed in terms of replication and/or data quality, you could still obtain significant results from the meta-analysis.
- You can easily extend the analysis if at some point a new study is to be included: the rank calculation itself, e.g. with the tool that the linked paper developed, is fast, and it does not change the analysis result of each individual study, so the additional effort is limited.
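To make the last two points a bit more concrete, here is a small sketch in plain Python. It is not the method of the linked paper: as a stand-in, it simply takes the median of the normalized ranks across studies, which is one simple way to limit the influence of a single study. The function names (`normalized_ranks`, `combine`) and all scores are invented for illustration, and a real meta-analysis tool would also report a p-value per gene.

```python
from statistics import median

def normalized_ranks(scores):
    """Convert signed significance scores of ONE study into ranks in (0, 1].

    The highest score gets 1.0. Ties are broken by gene name for simplicity.
    """
    ordered = sorted(scores, key=lambda g: (scores[g], g))
    n = len(ordered)
    return {g: (i + 1) / n for i, g in enumerate(ordered)}

def combine(per_study_ranks):
    """Median normalized rank per gene over all studies that report the gene."""
    genes = set().union(*per_study_ranks)
    return {g: median(r[g] for r in per_study_ranks if g in r) for g in genes}

# Made-up signed scores (e.g. sign(logFC) * -log10(p)) for three studies.
study1 = {"geneA": 3.0, "geneB": -2.0, "geneC": 0.1}
study2 = {"geneA": 2.5, "geneB": -1.0, "geneC": -0.2}
unreplicated = {"geneA": -0.5, "geneB": -3.0, "geneC": 1.0}  # noisy n=1 study

all_ranks = [normalized_ranks(s) for s in (study1, study2, unreplicated)]
combined = combine(all_ranks)

# Adding a study later only needs the ranks of the new study;
# the ranks already computed for the other studies are reused unchanged.
new_study = {"geneA": 4.0, "geneB": -2.5, "geneC": 0.3}
extended = combine(all_ranks + [normalized_ranks(new_study)])
```

In this toy example `geneA` keeps the highest combined rank even though the unreplicated study disagrees with the other two, and including `new_study` does not require re-analyzing any of the earlier studies.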
The above points are based on my (limited) experience with meta-analysis, so feel free to comment if you (dis)agree.