Question

Batch effects vs biological variables

1

3.7 years ago

l.uva ▴ 20

Hi all,

I am working on a DE analysis of primary tumors vs metastases using a small set of paired samples (8 primary tumors and 8 metastases).

After variance-stabilizing transformation with DESeq2, my PCA plot shows that the samples group by patient, and I cannot really separate the primary group from the metastasis group. As a consequence, I cannot find any differentially expressed genes between the tested conditions. In DESeq2 I tried adding both the patient and the condition to my design formula (design = ~ pat + cond), but it does not change anything.

I decided to test the new batch-effect adjustment tool (ComBat-Seq) on my count matrix, passing the patients as the batch and the condition as the biological covariate. It improves my PCA plot, and I do find relevant genes associated with metastasis when I perform the DE analysis on the adjusted data.

My question is: is it wrong to use the patient label as the batch and apply such an adjustment to my matrix? Is there any other approach I could try to reduce the effect of the patients in my analysis?

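#DE
library(DESeq2)
library(ggplot2)  # needed for geom_text() and aes()
library(sva)      # provides ComBat_seq()

# `cond` is the sample table passed as colData; it must contain the columns `pat` and `cond`
dds <- DESeqDataSetFromMatrix(countData = matrix_prim_vs_pm,
                          colData = cond,
                          design = ~ pat + cond)

# Drop genes with fewer than 10 reads in total
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
# Make "prim" the reference level
dds$cond <- factor(dds$cond, levels = c("prim","pm"))
vsd <- vst(dds)  # renamed so the result does not mask the vst() function

#PCA plot
plotPCA(vsd, intgroup=c("cond")) + geom_text(aes(label=name),vjust=2)

#batch adjustment
# batch and group are expected to hold one label per sample (column)
adjusted_counts <- ComBat_seq(matrix_prim_vs_pm, batch=pat, group=cond)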

PCA without batch correction

image: PCA without Batch correction

PCA after ComBat-Seq

image: PCA after Combatseq

Thanks in advance

Batch-effect DESeq2 combat_seq • 2.4k views

ADD COMMENT • link updated 16 days ago by Ram 43k • written 3.7 years ago by l.uva ▴ 20

3

I'd be wary when doing batch correction while informing the batch correction tool about your biological conditions. Try permuting your condition (metastasis/control) labels and see how the clustering looks following ComBat.
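The permutation check suggested above could be sketched roughly as follows. This is only an illustration in base R: `permute_within_patient` is a hypothetical helper, the patient and condition vectors are made up to mirror the 8 paired samples, and the final ComBat-Seq call is left as a comment because it assumes `matrix_prim_vs_pm` and per-sample `pat` labels from the original post are available.

```r
# Hypothetical helper: shuffle the condition labels within each patient,
# so every patient keeps one "prim" and one "pm" label but the assignment
# to the actual samples is randomized.
permute_within_patient <- function(cond, pat) {
  cond <- as.character(cond)
  for (p in unique(pat)) {
    idx <- which(pat == p)
    # sample.int() on the index count avoids the sample(x) pitfall for length-1 x
    cond[idx] <- cond[idx][sample.int(length(idx))]
  }
  factor(cond, levels = c("prim", "pm"))
}

# Made-up labels mirroring the design in the question (8 patients, 2 samples each)
set.seed(1)
pat  <- factor(rep(paste0("P", 1:8), each = 2))
cond <- factor(rep(c("prim", "pm"), times = 8), levels = c("prim", "pm"))

perm_cond <- permute_within_patient(cond, pat)

# With the real data, the permuted labels would replace the true ones, e.g.:
# adjusted_perm <- sva::ComBat_seq(matrix_prim_vs_pm, batch = pat, group = perm_cond)
# If the PCA of adjusted_perm separates the permuted groups about as well as
# the true groups, the separation is probably driven by the adjustment itself.
```

Repeating this for several seeds and counting the DEGs obtained each time would give a rough null distribution to compare against the result with the true labels.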

I personally only do ComBat-type stuff when I don't deal with biological conditions (e.g. if I want to see which genes are correlated among 50 tumor samples which were sequenced in different batches) -- not for any primary DE analysis.

Some important thoughts (from others) here: A: Are we tricking ourselves with batch effect correction?

The author of that linked post writes: "Our primary advice for an investigator facing an unbalanced data set with batch effects, would be to account for batch in the statistical analysis. If this is not possible, batch adjustment using outcome as a covariate should only be performed with great caution."

I tend to agree.

ADD REPLY • link 3.7 years ago by dsull ★ 5.8k

0

Thanks for the link. Interesting reading! I will permute the conditions and see what happens.

ADD REPLY • link 3.7 years ago by l.uva ▴ 20

0

It seems that permuting my condition labels slightly changes the way the samples cluster. I noticed that I get better results when I do not pass the covariate (biological condition) and specify only the batch in ComBat_seq.
The results are still very inconclusive, though. I will run GSEA on the DE genes to check whether they relate to metastasis in some way.

ADD REPLY • link 3.7 years ago by l.uva ▴ 20

0

Can you please add all code and plots to the post? It is difficult to discuss this based on words alone. In principle, adding pat as a blocking factor to the DESeq2 design, as you did, and treating it as the batch in ComBat-Seq should give similar results as far as I understand, since both try to eliminate the baseline differences that the individual patients introduce and focus on the tumor vs metastasis comparison.
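To illustrate what a blocking factor does, here is a minimal base-R sketch on a single simulated "gene" on a log-like scale. All numbers are invented for illustration and are not taken from the data in this thread; it uses an ordinary linear model rather than the negative binomial model DESeq2 fits, so it only shows the principle.

```r
set.seed(42)
n_pat <- 8
pat  <- factor(rep(paste0("P", seq_len(n_pat)), each = 2))
cond <- factor(rep(c("prim", "pm"), times = n_pat), levels = c("prim", "pm"))

# Large patient-specific baselines (shared by both samples of a patient),
# a small condition effect of 0.5, and little residual noise
patient_baseline <- rep(rnorm(n_pat, mean = 8, sd = 2), each = 2)
y <- patient_baseline + 0.5 * (cond == "pm") + rnorm(2 * n_pat, sd = 0.2)

# Without blocking, the patient-to-patient variance ends up in the residual
fit_unblocked <- lm(y ~ cond)
# With pat as a blocking factor, each patient gets its own baseline
fit_blocked   <- lm(y ~ pat + cond)

p_unblocked <- summary(fit_unblocked)$coefficients["condpm", "Pr(>|t|)"]
p_blocked   <- summary(fit_blocked)$coefficients["condpm", "Pr(>|t|)"]

print(c(unblocked = p_unblocked, blocked = p_blocked))
```

In this balanced paired layout both models estimate the same condition effect; only the residual variance, and therefore the p-value, differs. A PCA on the values themselves would still be dominated by the patient baselines, which is why samples clustering by patient does not by itself mean the paired design is failing.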

ADD REPLY • link 3.7 years ago by ATpoint 82k

0

True! The code and PCA plots have been added to the post; the PM group corresponds to metastasis.

ADD REPLY • link 3.7 years ago by l.uva ▴ 20

0

Thanks! How many DEGs do you get with each of the two strategies? In fact, I do not really see an "improvement" in the second PCA; you still have notable dispersion.

ADD REPLY • link 3.7 years ago by ATpoint 82k

0

Yes, I agree that the improvement is modest. Without ComBat_seq I get 0 DEGs; after batch correction I get 93.

ADD REPLY • link 3.7 years ago by l.uva ▴ 20