I used DESeq2 to process RNA-seq data from different sources, and I found a strong batch effect when I plotted the PCA (point shapes represent the 3 different batches; for example, ctr and PH.7d samples from different batches cluster apart):
I tried to remove it using the limma package, as described here:
    > colData
       sample      condition    batch
    1  100         PH.7d        1
    ..........
    7  75          ctr          1
    8  SRR5035380  hblast.10.5  2
    ..........
    25 SRR5035397  hblast.18.5  2
    26 SRR8437299  ctr          3
    ..........
    37 SRR8437324  PH.7d        3

    vsd <- vst(dds)
    assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$data1)
    data2 <- plotPCA(vsd, intgroup=c('condition','batch'), returnData=T)
    data2 <- as.data.frame(data2)
    percentVar <- round(100*attr(data2,'percentVar'))
    plot2 <- qplot(PC1, PC2, color=condition, shape=batch, data=data2)
However, nothing changes when I plot the results:
What am I doing wrong?
I also tried to account for the batch effect by including batch in the DESeq2 design:
ddsB=DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~batch+condition)
I'm getting this error:
Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.
Can somebody help me solve it? Thanks in advance!
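For context on the full-rank error above, here is a minimal base-R sketch of how one might check whether batch and condition are confounded. The `toy` table is hypothetical: its values are made up to mirror the pattern visible in the colData excerpt (the hblast conditions appear only in batch 2), and the elided rows of the real table may look different.

```r
# Toy sample table (hypothetical values, not the real colData):
# batches 1 and 3 hold ctr and PH.7d, batch 2 holds only the hblast conditions.
toy <- data.frame(
  condition = factor(c("ctr", "PH.7d", "hblast.10.5", "hblast.18.5", "ctr", "PH.7d")),
  batch     = factor(c(1, 1, 2, 2, 3, 3))
)

# Cross-tabulate to see which conditions occur in which batch.
print(table(toy$condition, toy$batch))

# Build the model matrix for the same formula and compare its rank
# with its number of columns.
mm <- model.matrix(~ batch + condition, data = toy)
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), "rank:", rank_mm, "\n")

# FALSE here means at least one column is a linear combination of the others.
full_rank <- rank_mm == ncol(mm)
print(full_rank)
```

If the rank is smaller than the number of columns, at least one batch term is a linear combination of the condition terms (or vice versa), which is the situation checkFullRank reports. Running the same two checks on the real colData would show whether that applies here.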