Question: DESeq2 positive results for genes highly variable between replicates
Asked 8 months ago by guillaume.rbt (France):

Hi all,

I'm using DESeq2 to find differentially expressed genes between two conditions from RNA-seq data, with many replicates (46 in condition "1", 20 in condition "2").

I get results with significant adjusted p-values, but for most of these genes the expression values are highly variable between replicates.

For example, for the gene with the lowest adjusted p-value, all samples from both conditions have low normalized counts (around 10), except for a single sample in one condition with >200000 normalized counts, which drives the differential expression toward that condition.

See the log2(normalized counts + 1) boxplot below (for this gene the adjusted p-value is 8.05e-12 and the log2FC between conditions "1" and "2" is -5.87).

[Image: boxplot of log2(normalized counts + 1) per condition]

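Here is the code I used:

library(DESeq2)
# Build the dataset from the tximport output, with condition as the only design factor
dds <- DESeqDataSetFromTximport(tx_import_data, coldata, ~condition)
# Drop genes with fewer than 10 reads summed over all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
# Set the reference level of the condition factor
dds$condition <- relevel(dds$condition, ref = "R")
dds <- DESeq(dds)
# Extract results, using 0.05 as the adjusted p-value cutoff
res05 <- results(dds, alpha=0.05)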

I'm wondering whether it is "normal" for DESeq2 to keep this kind of result, in which case I should filter it out myself if I find it irrelevant, or whether I made a mistake somewhere in the process and DESeq2 should only keep genes without such dispersion in expression between replicates.

Thanks for your help


With only words but no plots illustrating your question it is difficult to make any statements. Please provide e.g. some boxplots of normalized counts or tables.

written 8 months ago by ATpoint

Ok I've just put a link with a boxplot illustrating my example.

written 8 months ago by guillaume.rbt

log2 scale please ;-) Also see "How to add images to a Biostars post": you have to paste the link with its full suffix (like https...foo.png) into the image box.

written 8 months ago by ATpoint

done ;) sorry I never uploaded a plot before

written 8 months ago by guillaume.rbt

I would check whether these outlier samples also show outlier-like behaviour in a PCA, which might indicate a batch effect, and if so, think about removing them.

modified 8 months ago • written 8 months ago by ATpoint
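The PCA check suggested above could be sketched roughly as follows. This is a base-R illustration on simulated data, not code from the thread: the matrix `log_counts`, the artificial shift given to sample `s12`, and the robust z-score cutoff of 3 are all assumptions made for the example. With a real DESeq2 object one would more typically transform the counts with `vst()` and look at `plotPCA()`.

```r
# Simulated stand-in for log2(normalized counts + 1): 200 genes x 12 samples.
set.seed(1)
log_counts <- matrix(rnorm(200 * 12, mean = 8, sd = 0.5), nrow = 200,
                     dimnames = list(paste0("gene", 1:200), paste0("s", 1:12)))
# Give sample s12 an artificial batch-like shift in half of the genes (assumption).
log_counts[1:100, "s12"] <- log_counts[1:100, "s12"] + 3

# PCA with samples as rows; genes are centered but not scaled.
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Distance of each sample from the centroid in the PC1/PC2 plane,
# converted to a robust z-score (median and MAD).
d <- sqrt(rowSums(scores^2))
robust_z <- (d - median(d)) / mad(d)

# Samples far from the bulk; the cutoff of 3 is arbitrary.
flagged <- names(robust_z)[robust_z > 3]
print(flagged)
```

A sample that owes its extreme value to a single gene will usually not stand out in such a plot, because one gene contributes little to the leading principal components.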

Ok thanks, I've checked that and unfortunately they don't seem to be different from the other ones on the PCA.

written 8 months ago by guillaume.rbt

In my experience, this kind of result typically stems from very high variability between samples of the same group (compared to the variability between groups). You may want to correct for possible covariates in your data (see svaseq) or simply filter out results with high dispersion.

written 8 months ago by Martombo
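One simple way to approximate the "filter out" option for the case described in the question (a single sample carrying almost all counts of a gene) is sketched below in base R. It is only an illustration under stated assumptions: `norm_counts` is simulated here, the 0.5 cutoff is arbitrary, and the "top-sample share" statistic is a stand-in chosen for this example, not DESeq2's dispersion estimate. With a real analysis the matrix would come from `counts(dds, normalized = TRUE)`.

```r
# Simulated normalized counts: 5 genes x 66 samples, all around 10.
set.seed(42)
n_samples <- 66
norm_counts <- matrix(rpois(5 * n_samples, lambda = 10), nrow = 5,
                      dimnames = list(paste0("gene", 1:5),
                                      paste0("s", 1:n_samples)))
# Mimic the gene from the question: one sample with a huge count (assumption).
norm_counts["gene1", 7] <- 200000

# Share of each gene's total normalized count held by its largest sample.
top_share <- apply(norm_counts, 1, max) / rowSums(norm_counts)

# Genes whose signal comes mostly from one sample; the 0.5 cutoff is arbitrary.
dominated <- names(top_share)[top_share > 0.5]
print(dominated)
```

The flagged gene names can then be used to inspect or exclude the corresponding rows of the results table, for example with `res05[!rownames(res05) %in% dominated, ]`.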