Subset of ATAC-seq peaks has high variance
Entering edit mode
4 weeks ago
Tim • 0

Hi everyone,

I am running differential accessibility analysis for ATAC-seq data of a non-model plant. After modelling counts with voomLmFit my Mean-variance voom plot looks odd:

Mean variance trend

A subset of peaks has high but more or less stable variance regardless of average expression. It is similar to a RNA-seq plot from this post, but here the variability is not rising with average expression.

I tried to look into other properties of these outlier peaks and what I found so far is:

  1. There are no very low CPMs in any of the samples.
  2. They contain higher GC-content on average, with some of them having especially high GC-content.
  3. On average they have more significant peak-calling q-values.
  4. They contain a bigger proportion of distal intergenic peaks than rest of peaks.
  5. If I plot a random subset of these outliers and look at the cpms across samples, they all seem to have a stable ratio of CPMs across samples: cpms of outlier peaks across samples

Which is especially odd and looks more like a technical than a biological reason. I checked if the CPMs correlate with mapping stats (mapped, paired, duplicated...), but I couldn't find any correlation.

Using additional quantile normalization in voomLmFit helps to lower the variance of these peaks and running eBayes with robust=TRUE helps the model to fit better, ignoring the outliers. But outlier peaks still come out as very significant DARs with very high negative logFC.

Another possible explanation would be that these peaks lie in so-called black-listed regions. I removed all peaks intersecting repeats (which is how close I could get to a blacklist in a non-model plant), which removed almost 60% of peaks, but the outliers stayed in the set.

Did anyone came across something similar?

Do you have ideas on what these peaks could be? Should I try to remove them and how can I do this?

Any ideas would be much appreciated!

Pipeline overview: mapping bwa-mem > removal of plastid & mitochondiral mappings and PCR duplicates > peak calling with Genrich (replicates mode) > merging of peaks > peak counting > differential accessibility analysis (filtering with filterByExp > TMM normalization > voomLmFIt > eBayes)

atac-seq edgeR R normalization • 150 views

Login before adding your answer.

Traffic: 1278 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6