DESeq2 analysis - very low variability among replicates
6 months ago
srhic ▴ 40

Hello,

I am doing a standard differential analysis on DNase-seq data between two conditions using DESeq2. My results show a huge number of differential regions, with around 80% of peaks having an adjusted p-value < 0.01. Since I was not expecting such a huge difference between conditions, I think there might be something wrong with my data.

After searching for similar issues, I have come across posts which mention that this could result from replicates being very similar and not capturing enough biological variability. Looking at the IGV tracks for my replicates (3 replicates per condition) shows this to be true for my data. My replicates for each condition look extremely similar, almost like technical replicates (the person who did the experiment is not around, and I am getting a bit suspicious about whether they really are biological replicates). When I visually compare the differential peaks in IGV, they are usually areas with very small peaks or few reads, and they show small differences between conditions which probably wouldn't have been called significant if the replicates were better. Most of my differential analysis results seem to be just noise.

For now I am just ignoring the adjusted p-value cutoff (as pretty much everything has an extremely low p-value) and using a very high logFC cutoff, which gives better results. However, I was wondering if there is anything else I can do within the DESeq2 or edgeR pipeline to account for this lack of variability between replicates, so that noisy regions are not called significant?

Thanks

edgeR deseq2 rnaseq • 600 views

Can you show an MA-plot (plotMA function of DESeq2) and say what the samples are? Cell lines? If they are cell lines, and the "triplicates" were maybe taken from the same dish, then what you describe can happen.


Thanks, I have added the imgbb link for the MA plot to the post (I couldn't get it to embed properly for some reason).

The data is from a cell line (mouse embryonic fibroblasts): one wild type and one knockout for a target protein. It is quite possible that the triplicates were taken from the same plate.


Yes, this looks like what I was suspecting: lots of changes at the chromatin level and lots of regions with very low effect sizes. You can use the lfcThreshold argument of either results or lfcShrink to test specifically against a minimum effect size (the default is 0). Significant results then have good statistical evidence of an effect size (= logFC) greater than that threshold. Doing so, you can remove regions that are unlikely to have any biological meaning because the effect size is so small. It is also a good strategy for filtering in order to focus on regions with large effect sizes. I would recommend using lfcShrink for it. What you see is (at least in my hands) very common for cell lines.
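A minimal sketch of testing against a minimum effect size, in case it helps. It does not use your data: the counts are simulated with DESeq2's makeExampleDESeqDataSet, and the log2(1.5) threshold and the 0.01 cutoff are only illustrative assumptions that you would replace with your own choices.

```r
library(DESeq2)

# Simulated counts stand in for a real peak-by-sample count matrix (assumption):
# 2000 regions, 6 samples, i.e. 3 replicates in each of two conditions.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 1)
dds <- DESeq(dds)

# Default test: the null hypothesis is log2FC = 0.
res_default <- results(dds, alpha = 0.01)

# Thresholded test: the null hypothesis is |log2FC| <= log2(1.5).
# The threshold value is chosen only for illustration.
res_thr <- results(dds, lfcThreshold = log2(1.5),
                   altHypothesis = "greaterAbs", alpha = 0.01)

# Compare how many regions pass the adjusted p-value cutoff in each case.
n_default <- sum(res_default$padj < 0.01, na.rm = TRUE)
n_thr     <- sum(res_thr$padj < 0.01, na.rm = TRUE)
cat("default:", n_default, " thresholded:", n_thr, "\n")
```

Unlike filtering on logFC after the fact, the threshold here is part of the hypothesis being tested, so the adjusted p-values keep their usual interpretation.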


Thanks, that helps a lot. I need to read a bit more about the lfcShrink function as I don't completely understand the idea behind the different types of shrinkage.

However, changing the lfc threshold in results does change my MA plot a lot and gives a much more reasonable number of differential regions. Is there anything I should keep in mind when deciding on a threshold, or should I just try different thresholds and choose one that gives results I consider 'reasonable'?


Generally one uses something that is not overly strict, maybe log2(1.5) given the large number of DE regions in your experiment. The DESeq2 vignette discusses lfcShrink in some detail, and there are many threads at Bioconductor (support.bioconductor.org) as well to get some background. Yes, the lfc threshold does not change the logFCs in the plot, just the number of DE regions. lfcShrink changes the logFCs themselves through shrinkage, and with them the plot; see the manual for details.
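To make the difference concrete, here is a rough sketch that puts the unshrunken and shrunken MA plots side by side. It again uses simulated data from makeExampleDESeqDataSet rather than your peaks, and it assumes the apeglm package is installed for type = "apeglm"; the y-axis limits are arbitrary.

```r
library(DESeq2)

# Simulated data again (assumption), 3 vs 3 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 1)
dds <- DESeq(dds)

# Coefficient to shrink; for this simulated design the second entry
# of resultsNames() is the condition effect.
coef_name <- resultsNames(dds)[2]

# Unshrunken (maximum likelihood) log2 fold changes.
res_mle <- results(dds, name = coef_name)

# Shrunken log2 fold changes; requires the apeglm package.
res_shrunk <- lfcShrink(dds, coef = coef_name, type = "apeglm")

# Side-by-side MA plots: shrinkage pulls noisy low-count estimates toward 0.
par(mfrow = c(1, 2))
plotMA(res_mle, ylim = c(-4, 4), main = "unshrunken")
plotMA(res_shrunk, ylim = c(-4, 4), main = "apeglm shrinkage")
```

Called this way, without a threshold, lfcShrink only replaces the logFC estimates and their standard errors; the p-values are carried over from the unshrunken results.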