I have RNA-seq data (read counts) from 96 mouse primary tumors spanning 15 different genotypes. The 96 samples were sequenced on 10 different days; however, most samples with the same genotype were sequenced on the same day. I am afraid that if I correct for sequencing day as a batch effect, I will also lose the biological differences between genotypes. Any suggestions?
This is my script. After batch correction, the PCA plot changes a lot:
library(DESeq2)

dds <- DESeqDataSetFromMatrix(as.matrix(all), colData, design = ~ Batch)
vsd <- vst(dds, blind = FALSE)
plotPCA(vsd, "Batch")  # PCA before batch correction
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$Batch)
plotPCA(vsd, "Batch")  # PCA after batch correction
Part of colData:
    Genotype  condition  Batch
1   A         primary    2017-06-29
2   A         primary    2017-06-29
3   A         primary    2017-06-29
4   A         primary    2017-06-29
5   A         primary    2017-06-29
6   AK        primary    2017-11-09
7   AK        primary    2017-11-09
8   AK        primary    2017-11-09
9   AP        primary    2018-04-18
10  AP        primary    2018-04-18
11  AP        primary    2018-04-18
12  AKP       primary    2019-09-12
13  AKP       primary    2019-09-12
14  AKP       primary    2019-09-12
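To make the overlap between genotype and sequencing day concrete, here is a minimal sketch that cross-tabulates the two columns using base R only. It rebuilds just the 14 rows shown above, so the object names (`coldata_excerpt`, `xtab`, `batches_per_genotype`, `fully_confounded`) are illustrative, and the result says nothing about the remaining samples, where some genotypes may span more than one day.

```r
# Rebuild the 14-row excerpt of colData shown above (not the full 96 samples).
coldata_excerpt <- data.frame(
  Genotype  = rep(c("A", "AK", "AP", "AKP"), times = c(5, 3, 3, 3)),
  condition = "primary",
  Batch     = rep(c("2017-06-29", "2017-11-09", "2018-04-18", "2019-09-12"),
                  times = c(5, 3, 3, 3))
)

# Cross-tabulate genotype (rows) against sequencing day (columns).
xtab <- table(coldata_excerpt$Genotype, coldata_excerpt$Batch)
print(xtab)

# Count the number of distinct sequencing days on which each genotype appears.
batches_per_genotype <- rowSums(xtab > 0)
print(batches_per_genotype)

# TRUE when every genotype in the excerpt appears on exactly one sequencing day,
# i.e. genotype and batch cannot be separated within these rows.
fully_confounded <- all(batches_per_genotype == 1)
print(fully_confounded)
```

Running the same `table()` call on the full colData would show which genotypes, if any, were sequenced on more than one day.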
I also looked at related questions, but I am still not sure what I should do. I would really appreciate any help!