Question: Batch effect correction - miRNA sequencing - DESeq2
gravatar for prekrish
5 weeks ago by
prekrish0 wrote:


First of all, I apologize for a long post . I am analyzing miRNA sequencing data generated from human samples. I have 10 paired samples - control and treated. These samples were sequenced in two batches - 4 pairs in one batch and 6 pairs in another batch. The objective is to identify differentially expressed miRNAs between control and treated groups. The experimental design is as follows:

Sample_ID Condition Patient Batch IonXpress_RNA_001 Healthy A 1 IonXpress_RNA_002 Treated A 1 IonXpress_RNA_003 Healthy B 1 IonXpress_RNA_004 Treated B 1 IonXpress_RNA_005 Healthy C 1 IonXpress_RNA_006 Treated C 1 IonXpress_RNA_007 Healthy D 1 IonXpress_RNA_008 Treated D 1 IonXpress_RNA_011 Healthy E 2 IonXpress_RNA_012 Treated E 2 IonXpress_RNA_017 Healthy F 2 IonXpress_RNA_018 Treated F 2 IonXpress_RNA_021 Healthy G 2 IonXpress_RNA_022 Treated G 2 IonXpress_RNA_023 Healthy H 2 IonXpress_RNA_024 Treated H 2 IonXpress_RNA_025 Healthy I 2 IonXpress_RNA_026 Treated I 2 IonXpress_RNA_027 Healthy J 2 IonXpress_RNA_028 Treated J 2

I tried to analyze the data using DESeq2. I checked to see if the samples cluster based on batch and yes, the PCA plot clearly showed two distinct clusters - batch 1 and batch 2. I used variance stabilized counts for this purpose. Then, I removed batch effects using assay(vsd)<-limma::removeBatchEffect(assay(vsd), vsd$Batch) . PCA plot now showed as a single cluster and not as two distinct clusters.

To identify differentially expressed miRNAs, I used <-DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~Patient + Condition). Since this is a paired sample analysis, I used the variable 'patient' in my design. I did get a list of differentially exprssed miRNAs. However, when I tried to include "batch" in the design, I am getting the following error message -

invalid class “DESeqDataSet” object: the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others

I am trying to figure out the reason for this error - is this because the conditions (healthy and treated) are overlapping between both the batches? Is there a way that I can rectify this ? Should I change my design ? If I do not correct for batch effects or remove the batch effects, will it drastically change my output? I understand that batch effects can lead to spurious results. However, I am asking this because ~ 75% of the differentially expressed miRNAs obtained from my analysis have got validated in RT-PCR. I understand that the input data has to be raw counts for DESeq2 but is there a way that I can do a differential expression analysis for batch corrected dataset and compare the results with the uncorrected dataset? This is for learning purpose.

Thank you very much Best regards, Preethi

ADD COMMENTlink modified 5 weeks ago by swbarnes24.6k • written 5 weeks ago by prekrish0
gravatar for swbarnes2
5 weeks ago by
United States
swbarnes24.6k wrote:

If each patient is in only one batch, you cannot correct for both patient and batch. You cannot include both in your design. You might as well look at your samples with PCA, see how glaring the batch effect is.

ADD COMMENTlink written 5 weeks ago by swbarnes24.6k

Thank you very much for your reply. I did do a PCA and the difference looked glaring. Both the batches separated as two clusters. I am not sure how to share the image here. But yes, they separated as two clusters and when I tried PCA plot using batch effects removed variance stabilized counts, the batches merged as one cluster.

ADD REPLYlink written 5 weeks ago by prekrish0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1144 users visited in the last hour