Question: ATAC-seq size selection and TF prediction on paired end data
2.6 years ago by
bwassie0 wrote:

Hi all

I have a few questions about ATAC-seq data analysis. My lab is using ATAC-seq to identify accessible regions in the chromatin, to check for differential chromatin accessibility between disease and control states, and to check for TF binding in open chromatin regions (we usually do motif analysis for this). We currently do not size select our data and we do paired end sequencing.

In order to do motif analysis, should we remove fragments that correspond to nucleosomal reads? Since TFs usually bind in nucleosome free regions, it doesn't make sense to me that we keep larger, nucleosomal fragments. However, I have seen many papers that do not do any sort of size selection (experimental or computational) and I am wondering if I am missing something.

Second, is it necessary to do paired end sequencing for ATAC-seq if we do size selection during library prep? I have also noticed that almost everyone does paired end sequencing for ATAC, but I'm not sure why this is the case.

ADD COMMENTlink modified 2.6 years ago by Devon Ryan95k • written 2.6 years ago by bwassie0

Just because an area is not defined as a nucleosome-free region doesn't mean it isn't one. There may be TFs binding there that make it look nucleosome-occupied, and you would lose it in your motif analyses.

ADD REPLYlink written 2.6 years ago by YaGalbi1.5k

That's a fair point, Kenneth. Do you notice that in your data?

ADD REPLYlink written 2.6 years ago by bwassie0

Well, I'm just going off what I remember reading in the NucleoATAC GitHub issues pages. Somewhere in there is a warning that just because a region is not called as an "NFR" does not mean it is not one. It just means the required evidence wasn't there (length, flanking nucleosomes, etc.).

To be honest, I'm actually going to look into the footprinting suggestion in Devon's answer below.

ADD REPLYlink written 2.6 years ago by YaGalbi1.5k
2.6 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:
  1. Yes, at least we filter out everything that's nucleosomal in size (in our snakemake pipeline we use a value of 150 for this). Strictly speaking I suppose you don't need to do this, but given how footprinting works it helps shrink the search space.
  2. How exact is your size selection? While you could theoretically get away with SE sequencing, PE sure makes the analysis a lot easier. Further, you're just opening yourself up to reviewer criticism if you go with SE rather than PE ("your results may just be an artifact of having not properly excluded nucleosomes" or "your results are due to a bias of having too-short fragments" ...). We also use PE reads for our ATACseq datasets.
ADD COMMENTlink written 2.6 years ago by Devon Ryan95k
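
For readers who want to see what the computational size filter in point 1 amounts to, here is a minimal sketch, not the pipeline mentioned above. It assumes paired-end alignments in SAM text format, where column 9 (TLEN) holds the signed template length, and it keeps fragments of at most 150 bp. The function names and the inclusive cutoff are assumptions made for illustration. This is also why PE data helps: with SE reads the fragment length is not observed, so there is nothing to filter on.

```python
def fragment_length(sam_line):
    """Return the absolute template length (SAM column 9, TLEN) of one alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    return abs(int(fields[8]))


def keep_sub_nucleosomal(sam_lines, max_fragment_length=150):
    """Keep header lines and alignments whose fragment is at most max_fragment_length bp.

    The default of 150 follows the value quoted in the answer; treating the
    cutoff as inclusive is an assumption of this sketch.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # SAM header lines pass through unchanged
            continue
        length = fragment_length(line)
        # TLEN of 0 means the template length is unavailable (e.g. unpaired), so drop it
        if 0 < length <= max_fragment_length:
            kept.append(line)
    return kept


def make_record(name, tlen):
    """Build a toy 11-column SAM line; every value except TLEN is a placeholder."""
    return "\t".join([name, "99", "chr1", "100", "30", "50M", "=", "200",
                      str(tlen), "*", "*"])


example = [make_record("short", 80), make_record("short_rev", -120),
           make_record("nucleosomal", 310)]
filtered = keep_sub_nucleosomal(example)
print([line.split("\t")[0] for line in filtered])
```

On real data one would normally apply the same rule to a BAM file with an existing tool rather than parse SAM text by hand; the point here is only the rule itself.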

Hi Devon,

We do gel based size selection; we just use a razor and cut the nucleosome free band from the gel. We've been doing this for a while and we're thinking of switching to paired end. I agree about the analysis being easier with paired end!

ADD REPLYlink written 2.6 years ago by bwassie0

I mostly asked about how you were doing size selection because one of the first steps in footprinting is to input open regions, which are basically non-nucleosomal-sized peaks. That's easy to do and exact if one filters by fragment size, but I imagine it wouldn't be terribly exact if one is just cutting out a gel block and doing a DNA extraction from it.

ADD REPLYlink written 2.6 years ago by Devon Ryan95k

Devon - I would like to perform footprinting but I read somewhere that you need massive ATAC sequencing depth to do this (over 150-200M per sample). Currently our average depth is around the 50M read mark. What would you recommend?

ADD REPLYlink written 2.6 years ago by YaGalbi1.5k

150-200M seems a bit over the top. I think we've had success with 100M using Wellington footprinting, but I'll double check with the most recent person to have done this once she gets in today.

ADD REPLYlink written 2.6 years ago by Devon Ryan95k

She just wrote that she ended up with 60M pairs after filtering, so I'd guess at least 100M, maybe more like 150M initial to be sure. Note that this is for mouse/human sized genomes, so scale that appropriately for whatever you're working with.

ADD REPLYlink written 2.6 years ago by Devon Ryan95k
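
The "scale that appropriately" remark can be read as a simple proportion. The sketch below assumes that the required depth scales linearly with genome size, which is a simplification made here and not something stated in the thread. The mouse genome size of roughly 2.7 Gb is approximate, and the target genome is a hypothetical one of half that size.

```python
def scale_read_pairs(reference_pairs, reference_genome_bp, target_genome_bp):
    """Scale a read-pair target linearly by genome size (an assumed, simplified model)."""
    return round(reference_pairs * target_genome_bp / reference_genome_bp)


MOUSE_GENOME_BP = 2_700_000_000        # approximate size of the mouse genome
HYPOTHETICAL_GENOME_BP = 1_350_000_000  # made-up organism with half the mouse genome size

# 100M initial pairs is the lower estimate given above for mouse/human sized genomes
target = scale_read_pairs(100_000_000, MOUSE_GENOME_BP, HYPOTHETICAL_GENOME_BP)
print(target)  # 50000000
```

In practice the fraction of reads surviving filtering varies between libraries, so a number like this is a starting point for planning, not a guarantee.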

Yes, this is for the mouse genome. After filtering, the read counts range from 30-60M. I'll certainly give Wellington a go. Thank you for that :)

ADD REPLYlink written 2.6 years ago by YaGalbi1.5k