DESeq2 analysis result differences
12 months ago
pkallurkar • 0

Hello,

I performed patch-seq for 2 sets of neurons and then used DESeq2 to look for transcriptomic differences between the groups. One group consists of 7 neurons and the second group consists of 9 neurons.

Genes with an absolute log2 fold change (L2FC) greater than 1.5 and an adjusted p-value below 0.01 are considered differentially expressed (DE). I get a total of 123 DE genes for my dataset.

I notice that some genes that are visibly different between the groups are not picked as DE genes by DESeq2. A plausible reason is that DESeq2 is treating the zero counts as dropouts. If I add a constant value of 1 to all the gene counts, which gets rid of the zeroes, the visibly different gene is picked as a DE gene, and I now get 953 DE genes in total. This suggests to me that DESeq2 was misled by the zeroes and wrongly treated some gene counts as dropouts.

An example of such a gene has the following counts (the counts for each neuron are separated by commas within each group):

**Gene 1:-**
Group-1: 400,6,0,118,644,0,4738
Group-2: 0,34,0,0,0,0,0,0,0
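To make the contrast between the groups concrete, here is a minimal sketch in plain Python (not DESeq2) that summarizes the raw counts listed above. The helper name `summarize` is only for illustration, and no size-factor normalization is applied, so the numbers describe the raw counts rather than anything DESeq2 reports.

```python
from statistics import mean, median

# Raw counts for Gene 1, copied from the post (one value per neuron).
group1 = [400, 6, 0, 118, 644, 0, 4738]
group2 = [0, 34, 0, 0, 0, 0, 0, 0, 0]


def summarize(counts):
    """Return simple descriptive statistics for one group of raw counts."""
    return {
        "n": len(counts),
        "nonzero": sum(1 for c in counts if c > 0),
        "mean": mean(counts),
        "median": median(counts),
    }


summary1 = summarize(group1)
summary2 = summarize(group2)

for label, summary in (("Group-1", summary1), ("Group-2", summary2)):
    print(label, summary)
```

In this summary, 5 of 7 neurons in Group-1 and 1 of 9 neurons in Group-2 have non-zero counts, and the Group-1 mean is dominated by the single count of 4738.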


The DESeq2 statistics for this gene are:

base mean = 399.68; L2FC = -7.84; lfcSE = 2.56; **adjusted p-value = 0.13**


If I add a constant value (=1) to all the gene counts, now the DESeq2 statistics are:

base mean = 370.11; L2FC = -7.44; lfcSE = 1.26; **adjusted p-value = 1.35e-06**
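As a small illustration of how the DE criteria apply to these two sets of statistics, the sketch below checks each against the thresholds from the post. It is plain Python rather than DESeq2 code; the function name `is_de` is hypothetical, and it assumes the L2FC cutoff is meant as an absolute value.

```python
# Thresholds stated in the post (absolute L2FC is an assumption).
LFC_CUTOFF = 1.5
PADJ_CUTOFF = 0.01

# Statistics reported in the post for Gene 1.
reported = {
    "original counts": {"log2fc": -7.84, "padj": 0.13},
    "counts + 1": {"log2fc": -7.44, "padj": 1.35e-06},
}


def is_de(row, lfc_cutoff=LFC_CUTOFF, padj_cutoff=PADJ_CUTOFF):
    """True if the gene passes both the fold-change and adjusted p-value cutoffs."""
    return abs(row["log2fc"]) > lfc_cutoff and row["padj"] < padj_cutoff


calls = {label: is_de(row) for label, row in reported.items()}

for label, call in calls.items():
    print(f"{label}: DE = {call}")
```

Both versions pass the fold-change cutoff easily; only the adjusted p-value decides whether the gene is called DE.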


Notice the big difference in the adjusted p-values (in bold above) between the original counts and the counts after adding 1. Can someone please explain why this is happening? At what step in DESeq2 are the zeroes in the gene counts misleading the conclusions?
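One rough way to probe this outside DESeq2 is to compare a naive log2 ratio of group means and a method-of-moments dispersion estimate, (variance - mean) / mean^2, on the raw counts and on the counts plus 1. This is only a sketch in plain Python under strong assumptions: no size factors, a separate estimate per group, and none of DESeq2's dispersion fitting or shrinkage. The function names are illustrative, and the fold change is assumed to be Group-2 relative to Group-1 because the reported L2FC is negative.

```python
from math import log2
from statistics import mean, variance

group1 = [400, 6, 0, 118, 644, 0, 4738]
group2 = [0, 34, 0, 0, 0, 0, 0, 0, 0]


def naive_log2fc(reference, other):
    """log2 ratio of raw group means (other / reference), no normalization."""
    return log2(mean(other) / mean(reference))


def moment_dispersion(counts):
    """Method-of-moments negative binomial dispersion: (var - mean) / mean^2."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / m**2, 0.0)


def add_one(counts):
    """Add a pseudocount of 1 to every value, as described in the post."""
    return [c + 1 for c in counts]


lfc_raw = naive_log2fc(group1, group2)
lfc_plus1 = naive_log2fc(add_one(group1), add_one(group2))

disp_raw = {"group1": moment_dispersion(group1), "group2": moment_dispersion(group2)}
disp_plus1 = {
    "group1": moment_dispersion(add_one(group1)),
    "group2": moment_dispersion(add_one(group2)),
}

print(f"naive log2FC raw: {lfc_raw:.2f}, after +1: {lfc_plus1:.2f}")
print("moment dispersion raw:", disp_raw)
print("moment dispersion +1: ", disp_plus1)
```

The naive ratios come out at roughly -7.8 and -7.5, close to the reported L2FC values. Adding 1 leaves the variance unchanged but raises the mean, so the moment-based dispersion drops, most strongly in the low-count group. Whether this is what drives the change in DESeq2's p-value is not established by this sketch.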

Thanks, Prajkta

patch-seq RNA-Seq DESeq2 Dropouts • 944 views

Did you perform filtering for rows/genes with no expression at all (=zero counts)?

Yes, I removed genes that are not expressed in any of the samples.

Are the cells in group 1 different replicates from the same sample? I noticed a lot of variability between the counts. What is the source of the samples?

Yes, groups 1 and 2 are from the same sample (neonatal mouse brainstem) with different replicates.

There is heterogeneity between replicates of the same group. This variability could be the cause of the adjusted p-value of 0.13 calculated by DESeq2. Do you know why there is so much difference between replicates?

No, we are not sure why there is so much variance in the counts between replicates. We think it is because patch-seq involves single-cell RNA-seq, so plausible sources of noise include amplification bias, RT errors, and transcriptional bursting, among others. But we do not know the exact source of such high variability between replicates.

Ok.

Did you observe the same in RT-PCR?

One possibility is contamination from other tissue during the extraction of the mouse brainstem cells.

You need to check whether such variability exists across all genes or only a subset of genes.

Okay, thanks. I will look into that.

I think it is the 1's that are misleading the conclusions; the zeroes are accounted for in the model. https://support.bioconductor.org/p/64014/

You are going to trust a change that is not true in 3/7 of your samples...?

I did use the pseudocounts feature mentioned in the linked thread, but that does not change the result. Also, 5 out of 7 samples in group 1 are non-zero, whereas only 1 sample is non-zero in group 2. That is why we are wondering whether the DESeq2 results are accurate.