Treating "patient" as a batch effect
10 days ago
Mia • 0


I'm investigating differential gene expression between tumour and matched-normal samples from TCGA (breast cancer). After running differential expression with DESeq2 (design: ~ patient + sample_type), I visualised the differences between sample types (tumour / matched-normal) with a heatmap of my genes of interest.

Here is the heatmap showing expression of my genes of interest in the two groups (TP = tumour, NT = matched-normal): [image: heatmap]

The expression of my genes of interest (which, according to the literature, are cancer-related) appears to be very dependent on patient. The order of patients is the same in the two clusters (i.e. patient 1 is column 1 in the TP group and column 1 in the NT group), and the two groups (NT and TP) share a very similar pattern ...

I wondered whether the effect of patient was perhaps not being accounted for effectively by the design formula I used for DESeq2, so I tried removing patient as a batch effect first with ComBat_seq and then using the adjusted count matrix as input for differential expression, with just sample_type in the design formula.
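One point that may help here: a term in the design formula changes the model that is fitted, not the values that get plotted, so a heatmap of normalised or transformed counts can still show the patient pattern even when the model handles it. The sketch below illustrates this with simulated log-expression values in Python with NumPy. Everything in it is an assumption made for illustration (the numbers, the variable names, the noise level); it is not TCGA data and not what DESeq2 does internally. It centres each gene within each patient pair, which is one simple way to view paired data, and checks that the within-pair tumour-minus-normal difference is unchanged by the centring.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_genes = 8, 5

# Simulated log-expression (hypothetical values, not TCGA data).
# Each patient has a large gene-specific baseline shared by both samples.
patient_effect = rng.normal(0.0, 3.0, size=(n_genes, n_patients))
# Assumed per-gene tumour-vs-normal shift.
tumour_shift = np.array([1.0, -1.0, 0.5, 0.0, 2.0])[:, None]


def noise():
    """Small measurement noise, same shape as the expression matrix."""
    return rng.normal(0.0, 0.1, size=(n_genes, n_patients))


normal = patient_effect + noise()
tumour = patient_effect + tumour_shift + noise()

# On the raw values, the NT and TP blocks look alike patient by patient.
raw_corr = np.corrcoef(normal[0], tumour[0])[0, 1]

# Centre each gene within each patient pair (for visualisation only).
pair_mean = (normal + tumour) / 2.0
normal_c = normal - pair_mean
tumour_c = tumour - pair_mean

# The paired difference is what a paired model estimates; centring keeps it.
paired_diff = (tumour - normal).mean(axis=1)

print("correlation of gene 0 across patients, NT vs TP:", round(raw_corr, 3))
print("mean paired difference per gene:", np.round(paired_diff, 2))
```

The centred matrices `normal_c` and `tumour_c` would be the ones to plot in this toy setting; the raw counts would still go into the differential expression step with patient in the design.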

Now the heatmap loses the pattern of similarity between patients and looks more like what I expected, with some gene expression differences showing up between tumour and normal: [image: heatmap2]

I don't think removing the effect of patient as if it were a batch effect is a valid approach, but I'm not entirely sure what else to do, since I've already tried to account for patient in my DESeq2 design formula, which didn't seem to remove the effect entirely. Does anyone have any suggestions?

Thank you for your time!

batchEffects ComBat_seq DESeq2 • 338 views
10 days ago

I think it is fair to treat the different patients as different batches... in reality, it is possible they were all sequenced in different experiments.

Thank you for your input! Just an update: I used svaseq instead of ComBat_seq to account for unknown, unwanted variation, and then added the first surrogate variable to my design (i.e. ~ SV1 + patient + sample_type), as per this workflow:
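To show what a design such as ~ SV1 + patient + sample_type means as a model, here is a minimal sketch in Python with NumPy. It is an assumption-laden analogue: it uses ordinary least squares on simulated log-expression for a single gene, whereas DESeq2 fits a negative binomial GLM on counts, and the `sv1` values are random stand-ins rather than output from svaseq. The patient indicators are coded one column per patient without an intercept, which spans the same column space as an intercept plus treatment-coded patient terms. The rank check matters because a covariate that is constant within every patient pair would be collinear with the patient columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients = 6

# Two samples per patient: matched normal (0) then tumour (1).
patient = np.repeat(np.arange(n_patients), 2)
is_tumour = np.tile([0.0, 1.0], n_patients)
# Hypothetical stand-in for the first surrogate variable.
sv1 = rng.normal(size=2 * n_patients)

# Design matrix columns: one indicator per patient, SV1, sample_type.
patient_dummies = (patient[:, None] == np.arange(n_patients)[None, :]).astype(float)
X = np.column_stack([patient_dummies, sv1, is_tumour])

# Simulated log-expression for one gene (all effect sizes are assumptions).
true_patient = rng.normal(0.0, 3.0, n_patients)
true_sv, true_tumour = 0.8, 1.5
y = (
    patient_dummies @ true_patient
    + true_sv * sv1
    + true_tumour * is_tumour
    + rng.normal(0.0, 0.05, 2 * n_patients)
)

# Least-squares fit; rank tells us whether every coefficient is estimable.
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
tumour_coef = coef[-1]

print("design matrix rank:", rank, "of", X.shape[1], "columns")
print("estimated tumour effect:", round(float(tumour_coef), 2))
```

In this toy setting the tumour coefficient is estimated from within-patient contrasts, which is why large between-patient differences do not prevent it from being recovered.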

9 days ago

I have never heard of treating biological variation as a batch effect.

I have always considered a batch effect to be a systematic error that affects a subset of samples (i.e. the batch) in the same way.

The Wikipedia entry seems to specifically exclude biological variation from being a batch effect:

[T]he batch effect represents the systematic technical differences when samples are processed and measured in different batches and which are unrelated to any biological variation recorded during the MAGE [microarray gene expression] experiment

