Hi, I’m pretty new to RNA-Seq analysis, and a couple of questions came up while working through the individual analysis steps. First, some background on my experiment:
- mouse experiment with two conditions
- 6 biological replicates in both groups
- I’m looking for differentially expressed (DE) genes and, later on, affected pathways
- sample and library preparation introduced 4 potential batch effects
- one of these is collinear (completely confounded) with my condition, so I excluded it
- that leaves three batch effects to account for
- my starting point is a count matrix based on exon counts
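The collinearity mentioned in the list above can be checked mechanically. Below is a minimal Python sketch, not taken from the analysis described here: the sample labels (`condition`, `batch_a`, `batch_b`) and the helper names are hypothetical. It builds a treatment-coded design matrix and tests whether adding a batch factor makes it rank deficient.

```python
import numpy as np


def one_hot(labels):
    """Return a samples x levels indicator matrix for a list of labels."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels]
                     for lab in labels])


def is_confounded(condition, batch):
    """True if the design (intercept + condition + batch) is rank deficient,
    i.e. the batch factor cannot be separated from the condition."""
    intercept = np.ones((len(condition), 1))
    cond_cols = one_hot(condition)[:, 1:]   # drop the reference level
    batch_cols = one_hot(batch)[:, 1:]      # drop the reference level
    design = np.hstack([intercept, cond_cols, batch_cols])
    return bool(np.linalg.matrix_rank(design) < design.shape[1])


# Hypothetical layout: 6 replicates per condition.
condition = ["ctrl"] * 6 + ["treat"] * 6
batch_a = ["b1"] * 6 + ["b2"] * 6      # identical split to the condition
batch_b = ["x", "y", "z"] * 4          # spread evenly across both conditions

print(is_confounded(condition, batch_a))
print(is_confounded(condition, batch_b))
```

A rank-deficient design means the batch coefficient and the condition coefficient cannot be estimated separately, which is why such a factor has to be left out of the model.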
I have a question regarding my assumed batch effects. When I run a PCA and color the samples by each of the batch variables, I see no real clustering by batch. With the raw data I also see no clustering by condition. Interestingly, the DE results differ substantially depending on whether or not I correct for batch effects.
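Visual inspection of a PCA plot can be complemented by a number. The following is a rough Python sketch under stated assumptions, not the analysis actually run here: the counts are simulated, the log-CPM transform is a simple pseudocount version, and all function names are hypothetical. It computes, for each principal component, the fraction of its variance explained by a batch label (a one-way ANOVA R^2).

```python
import numpy as np


def log_cpm(counts, prior=0.5):
    """Simple log2 counts-per-million for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0, keepdims=True)
    return np.log2((counts + prior) / (lib_size + 1.0) * 1e6)


def pca_scores(expr, n_components=3):
    """PCA via SVD on gene-centered data.
    Returns samples x components scores and the variance fraction per PC."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = vt[:n_components].T * s[:n_components]
    var_frac = s ** 2 / np.sum(s ** 2)
    return scores, var_frac[:n_components]


def r_squared(values, labels):
    """Fraction of variance in `values` explained by the group means."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    total = np.sum((values - values.mean()) ** 2)
    within = sum(
        np.sum((values[labels == g] - values[labels == g].mean()) ** 2)
        for g in np.unique(labels)
    )
    return 1.0 - within / total


def simulate_counts(batch, n_genes=500, n_affected=200, fold=4.0, seed=0):
    """Simulated Poisson counts in which batch 'b' inflates some genes."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch)
    base = rng.uniform(50, 500, size=(n_genes, 1))
    mult = np.ones((n_genes, batch.size))
    mult[:n_affected, batch == "b"] = fold
    return rng.poisson(base * mult)


# Hypothetical example: 12 samples, alternating between two batches.
batch = ["a", "b"] * 6
counts = simulate_counts(batch)
scores, var_frac = pca_scores(log_cpm(counts))
for k in range(scores.shape[1]):
    r2 = r_squared(scores[:, k], batch)
    print(f"PC{k + 1}: {var_frac[k]:.1%} of variance, R^2 with batch = {r2:.2f}")
```

A batch variable with low R^2 on the leading components can still matter for individual genes, so a table like this only shows whether batch dominates the global structure.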
I also ran batch correction with limma and with the ComBat function from sva, starting from the raw data. The goal was to see which batch effect, or which combination of them, gives the best separation between the two conditions. As with the DE analysis, correcting for all three batches gave the best result.
As mentioned above, I see no clustering by batch in my raw data, so I would like to ask whether it is valid to correct for batch effects that I cannot see in a PCA plot. Could I be distorting my data by correcting for them? Is it valid to create artificial batch effects and then check how the analysis handles them? Are there other methods to check whether my batch effects are real?
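One related check, sketched here in Python as an assumption rather than an established recipe, is a label permutation test. This is a different technique from simulating artificial batch effects: it shuffles the batch labels many times and asks how often a random labeling explains as much variance in a per-sample score (for example a principal component) as the real labeling does. The example values and function names are hypothetical.

```python
import numpy as np


def group_r2(values, labels):
    """Fraction of variance in `values` explained by the group means."""
    total = np.sum((values - values.mean()) ** 2)
    within = sum(
        np.sum((values[labels == g] - values[labels == g].mean()) ** 2)
        for g in np.unique(labels)
    )
    return 1.0 - within / total


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Compare the observed R^2 with R^2 values from shuffled labels.
    Returns (observed R^2, permutation p-value)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    observed = group_r2(values, labels)
    null = np.array([group_r2(values, rng.permutation(labels))
                     for _ in range(n_perm)])
    # Small tolerance so ties are not lost to floating-point rounding.
    hits = np.sum(null >= observed - 1e-12)
    return observed, (1 + hits) / (1 + n_perm)


# Hypothetical per-sample scores for 12 samples in two batches.
batch = ["a"] * 6 + ["b"] * 6
separated = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 5.0, 5.1, 5.2, 5.1, 5.0, 5.2]
unrelated = [1.0, 5.0] * 6

print(permutation_pvalue(separated, batch))
print(permutation_pvalue(unrelated, batch))
```

With only 12 samples the number of distinct labelings is limited, so the smallest achievable p-value is coarse, and a non-significant result does not prove that a batch effect is absent.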
Furthermore, I would like to try the svaseq function from sva, but I have not done so yet.
Please let me know if you need more information.
Thanks in advance for your help.