Regarding the batch effect in GEO datasets, I attempted to explore the batches using the sva package in the absence of known factors contributing to the batch effect.
However, after removing the batch effect using limma::removeBatchEffect, the results were significantly different from what I expected. My objective is to quickly obtain the exprsData after removing the batch effect, and to perform further analysis beyond just differential expression analysis.
As my professional knowledge is limited, I would like to seek advice from the forum teachers on the following:
- What method should I use to quickly obtain the exprsData after removing the batch effect, while taking into consideration pairing and grouping factors during the removal process?
- Can I use the RUVg package to explore the batch effect in chip data?
Thank you in advance for your help.
As a beginner, I hope to receive professional answers. If necessary, I can provide rewards and R code.
The right-hand plot might be totally fine once you remove that one outlier on the left, so try that first. The batch effect along PC1 is quite obvious. Simply defining the points on the left (below x=-0.025, for example) as batch 1 and the rest as batch 2 might already do the trick. What is the data, and what do you want to analyze?
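To make the PC1 suggestion concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the expression matrix is simulated, the `group` factor and the cutoff of 0 are hypothetical, and the real cutoff has to be read off your own PCA plot. The correction step fits group and batch together and subtracts only the batch term, which is the same idea as calling limma::removeBatchEffect with a `design` that protects the biological groups.

```r
set.seed(1)

# Simulated stand-in for a probes x samples matrix (hypothetical data)
n_genes <- 200
n_samples <- 20
group <- factor(rep(c("normal", "tumor"), times = n_samples / 2))
true_batch <- rep(c(1, 2), each = n_samples / 2)  # unknown in a real dataset
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
# Add a gene-specific shift to every sample of the second batch
expr <- expr + outer(rnorm(n_genes, sd = 3), as.numeric(true_batch == 2))

# PCA on samples (samples must be rows for prcomp)
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Define batches from PC1; the cutoff is an assumption, pick it from your plot
cutoff <- 0
batch <- factor(ifelse(pc1 < cutoff, "batch1", "batch2"))

# Design that protects the biological grouping during correction
design <- model.matrix(~ group)

# Batch coded with sum-to-zero contrasts, so the correction centers the batches
contrasts(batch) <- contr.sum(nlevels(batch))
batch_mm <- model.matrix(~ batch)[, -1, drop = FALSE]

# Fit group and batch jointly per gene, then subtract only the batch component
fit <- lm.fit(cbind(design, batch_mm), t(expr))
beta_batch <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
corrected <- expr - t(batch_mm %*% beta_batch)

# With limma installed, the equivalent call would be:
# corrected <- limma::removeBatchEffect(expr, batch = batch, design = design)
```

A corrected matrix like this is usually meant for clustering, visualization, or network-style analyses. For differential expression it is generally safer to keep the original values and include `batch` in the model instead. A pairing factor could be added to `design` in the same way, provided it is not completely confounded with the batch you defined.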
It is a great honor to receive your response.
The data in question is sourced from GSE53625. While using arrayQualityMetrics to assess the quality of the data, I noticed batch effects. However, I was unable to identify the known batch effect factors for this dataset.
I would like to remove the batch effects from the data to be used in survival analysis, WGCNA, immune infiltration analysis, and other applications. Therefore, I hope to eliminate the batch effects and bring the data closer to its true distribution.
R code: