Question: ENCODE ATAC-seq pipeline peak calling
4.2 years ago by
igor11k
United States
igor11k wrote:

I am looking at the ENCODE ATAC-seq pipeline: https://www.encodeproject.org/pipelines/ENCPL035XIO/

They have two different steps:

  • "call nuclease accessible regions using FSeq" (in PDF) or "open chromatin region identification" (on diagram)
  • "call nuclease accessible peaks using Homer" (in PDF) or "peak calling" (on diagram)

Regardless of the tool used, what is the difference between "regions" and "peaks"? I would think those are the same thing (in this context, a set of loci where the reads accumulate).

atac-seq • 6.3k views
modified 3.0 years ago by Simply Bioinformatics170 • written 4.2 years ago by igor11k

My understanding is that F-Seq generates the signal file (for the UCSC Genome Browser) and HOMER does the peak calling (e.g., for differential peak analysis).

written 4.2 years ago by geek_y11k

By signal file, do you mean a wiggle file? If it's just that, how is it different than a generic bigWig from a BAM file?

modified 4.2 years ago • written 4.2 years ago by igor11k

It's not just normalised counts at each base.

From the F-Seq website:

To intuitively summarize and display individual sequence data as an accurate and interpretable signal, we developed F-Seq, a software package that generates a continuous tag sequence density estimation allowing identification of biologically meaningful sites whose output can be displayed directly in the UCSC Genome Browser

As I said before, this is only my understanding.
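To make the distinction between per-base counts and a "continuous tag sequence density estimation" concrete, here is a minimal sketch using a plain Gaussian kernel. It is not F-Seq's actual implementation: the function names, the bandwidth of 50 bp, and the toy cut-site positions are all assumptions chosen for illustration.

```python
import numpy as np

def per_base_counts(cut_sites, length):
    """Raw signal: number of reads whose cut site falls on each base."""
    return np.bincount(cut_sites, minlength=length).astype(float)

def gaussian_density(cut_sites, length, bandwidth=50.0):
    """Smoothed signal: one Gaussian kernel per cut site, summed per base.

    The bandwidth (in bp) is an illustrative assumption, not an F-Seq default.
    """
    counts = per_base_counts(cut_sites, length)
    half_width = int(3 * bandwidth)
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2)
    kernel /= kernel.sum()  # each read contributes a total weight of 1
    return np.convolve(counts, kernel, mode="same")

# Toy data: six cut sites clustered near position 500, one isolated at 1500.
cut_sites = np.array([480, 495, 500, 505, 510, 530, 1500])
raw = per_base_counts(cut_sites, 2000)
density = gaussian_density(cut_sites, 2000)
```

The raw track is zero everywhere except at the seven cut sites, while the density track is non-zero between them and highest around the cluster, which is why it reads as a continuous signal in a genome browser.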

modified 4.2 years ago • written 4.2 years ago by geek_y11k

Do you know what this output actually looks like?

written 4.2 years ago by igor11k
4.1 years ago by
igor11k
United States
igor11k wrote:

I received a very helpful clarification after emailing ENCODE directly:

Nuclease accessible regions tend to be long, e.g. 10 kb or longer. This was clear even in the early papers on DNase sensitivity (mid-to-late 1970s; Groudine and Weintraub). These accessible regions can contain entire genes or even clusters of genes. Within the nuclease accessible regions, some localized DNA segments are so readily cleaved that double-strand breaks are generated at that position in a substantial fraction of the cells in the population. These are the DNase-hypersensitive sites (DHSs) first mapped by Carl Wu (late 1970s). I see the F-Seq "regions" as the equivalent of nuclease accessible regions, and the Homer "peaks" as the equivalent of DHSs.

If you look at the signal track for DNase-seq or ATAC-seq, you see broad regions of signal that are significantly above the background. Within those regions, you see localized peaks, often many peaks per region. F-Seq calls the broad regions, and we use Homer to call the localized peaks. MACS can be used for peak calling as well; Anshul Kundaje is doing that. You can see similar analyses in the work from John Stamatoyannopoulos for DNase-seq. I think Hotspots are like regions, and DHSs are peaks confined to a defined length.
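The regions-versus-peaks distinction described above can be illustrated with a toy sketch on a synthetic signal track. This is not how F-Seq or Homer actually work: the two thresholds, the function names `call_regions` and `call_peaks`, and the synthetic signal are all hypothetical and exist only to show broad regions containing several localized peaks.

```python
import numpy as np

def call_regions(signal, threshold):
    """Return half-open (start, end) runs where signal stays above threshold."""
    above = np.concatenate(([False], signal > threshold, [False]))
    edges = np.flatnonzero(above[1:] != above[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

def call_peaks(signal, region, threshold):
    """Return positions of local maxima above threshold inside one region."""
    start, end = region
    peaks = []
    for i in range(max(start, 1), min(end, len(signal) - 1)):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if signal[i] > threshold and is_local_max:
            peaks.append(i)
    return peaks

# Synthetic track: background of 1, two broad elevated regions,
# and narrow bumps (the "peaks") sitting on top of them.
x = np.arange(1000)
signal = np.ones(1000)
signal[100:400] += 4.0
signal[600:700] += 4.0
for centre in (150, 250, 350, 650):
    signal += 20.0 * np.exp(-0.5 * ((x - centre) / 5.0) ** 2)

regions = call_regions(signal, threshold=3.0)
peaks_per_region = {r: call_peaks(signal, r, threshold=15.0) for r in regions}
```

On this synthetic track the first broad region contains three localized peaks and the second contains one, mirroring the "often many peaks per region" picture.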

written 4.1 years ago by igor11k

Might be worth looking into the Danpos2 suite and/or iNPS for peak calling of DNase-seq data, if what I'm reading here makes sense. Those peak callers are for MNase-seq data, but it seems that they may apply in this case.

modified 4.1 years ago • written 4.1 years ago by Sinji3.0k

I've never worked with MNase, but shouldn't all those peaks be ~150bp (size of a single nucleosome)?

written 4.1 years ago by igor11k

Yes, and now, coincidentally, we realise that this size (~154bp I believe) also corresponds generally to the mean fragment length of circulating free DNA in blood plasma. In fact, latest research indicates that we can analyse nucleosome positioning and circulating free DNA and infer tissue of origin of the cfDNA. This has utility in the identification, for example, of the tissue of origin of circulating tumour DNA fragments, and thus in the identification of which organ may be showing early signs of cancer.

written 3.0 years ago by Kevin Blighe66k

Thanks for updating us.

written 4.1 years ago by geek_y11k
3.0 years ago by
WashingtonDC
Simply Bioinformatics170 wrote:

This pipeline is now deprecated and has been replaced by this one:

https://github.com/kundajelab/atac_dnase_pipelines

modified 3.0 years ago • written 3.0 years ago by Simply Bioinformatics170