"call nuclease accessible regions using FSeq" (in PDF) or "open chromatin region identification" (on diagram)
"call nuclease accessible peaks using Homer" (in PDF) or "peak calling" (on diagram)
Regardless of the tool used, what is the difference between "regions" and "peaks"? I would think those are the same thing (in this context, a set of loci where the reads accumulate).
To intuitively summarize and display individual sequence data as an
accurate and interpretable signal, we developed F-Seq, a software
package that generates a continuous tag sequence density estimation
allowing identification of biologically meaningful sites whose output
can be displayed directly in the UCSC Genome Browser
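To make "continuous tag sequence density estimation" concrete, here is a minimal sketch that contrasts raw per-base tag counts with a smoothed density. It assumes a Gaussian kernel with a fixed bandwidth. The function names, the bandwidth value and the toy tag positions are all made up for illustration, and this is not F-Seq's actual code.

```python
import math

def tag_density(tag_positions, start, end, bandwidth=50.0):
    """Gaussian kernel density estimate of tag positions over [start, end).

    Hypothetical helper: the Gaussian kernel and the bandwidth are
    assumptions made for this sketch, not F-Seq's real parameters.
    """
    norm = 1.0 / (len(tag_positions) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in range(start, end):
        total = 0.0
        for t in tag_positions:
            z = (x - t) / bandwidth
            total += math.exp(-0.5 * z * z)  # each tag contributes a smooth bump
        density.append(norm * total)
    return density

def raw_counts(tag_positions, start, end):
    """Plain per-base tag counts, as in a simple coverage-style track."""
    counts = [0] * (end - start)
    for t in tag_positions:
        if start <= t < end:
            counts[t - start] += 1
    return counts

# Toy data: four tags clustered near position 107 and one isolated tag at 300.
tags = [100, 105, 110, 112, 300]
density = tag_density(tags, 0, 400, bandwidth=20.0)
counts = raw_counts(tags, 0, 400)
```

In this toy example the raw count at position 107 is zero because no tag starts exactly there, while the density at 107 is higher than at the isolated tag at 300, since the four nearby tags all contribute to it.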
I received a very helpful clarification after emailing ENCODE directly:
Nuclease accessible regions tend to be long, e.g. 10 kb or longer.
This was clear even in the early papers on DNase sensitivity
(mid-to-late 1970's; Groudine and Weintraub). These accessible regions
can contain entire genes or even clusters of genes. Within the
nuclease accessible regions, some localized DNA segments are so
readily cleaved that double-strand breaks are generated at that
position in a substantial fraction of the cells in the population.
These are the DNase-hypersensitive sites (DHSs) first mapped by Carl
Wu (late 1970's). I see the Fseq "regions" as the equivalent of
nuclease accessible regions, and the Homer "peaks" as the equivalent
of DHSs.
If you look at the signal track for DNase-seq or ATAC-seq, you see
broad regions of signal that are significantly above the background.
Within those regions, you see localized peaks, often many peaks per
region. Fseq calls the broad regions, and we use Homer to call the
localized peaks. MACS can be used for peak calling as well; Anshul
Kundaje is doing that. You can see similar analyses in the work from
John Stamatoyannopoulos for DNase-seq. I think Hotspots are like
regions, and DHSs are peaks confined to a defined length.
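The region/peak distinction described in the email can be illustrated with a toy sketch: broad regions are contiguous runs of signal above a background threshold, and peaks are local maxima inside those runs. This is not how F-Seq or HOMER actually work. The function names, thresholds and signal values are invented for the example.

```python
def call_regions(signal, threshold):
    """Return half-open (start, end) intervals where signal > threshold."""
    regions = []
    start = None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i  # a region opens here
        elif value <= threshold and start is not None:
            regions.append((start, i))  # the region closes before position i
            start = None
    if start is not None:
        regions.append((start, len(signal)))
    return regions

def call_peaks(signal, region, min_height):
    """Return positions inside region that are strict local maxima
    with a height of at least min_height."""
    start, end = region
    peaks = []
    for i in range(start, end):
        left = signal[i - 1] if i > 0 else float("-inf")
        right = signal[i + 1] if i + 1 < len(signal) else float("-inf")
        if signal[i] >= min_height and signal[i] > left and signal[i] > right:
            peaks.append(i)
    return peaks

# Toy signal: one broad stretch above background containing two sharp summits.
signal = [0, 0, 2, 3, 8, 3, 2, 9, 2, 3, 0, 0, 1, 0]
regions = call_regions(signal, threshold=1)
peaks = {region: call_peaks(signal, region, min_height=5) for region in regions}
```

Here the single broad region spans positions 2 to 9 and contains two localized peaks, at positions 4 and 7, which mirrors the "often many peaks per region" picture.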
It might be worth looking into the Danpos2 suite and/or iNPS for peak calling of DNase-seq data, if what I'm reading here makes sense. Those peak callers are designed for MNase-seq data, but it seems they may apply in this case.
Yes, and now, coincidentally, we realise that this size (~154bp I believe) also corresponds generally to the mean fragment length of circulating free DNA in blood plasma. In fact, latest research indicates that we can analyse nucleosome positioning and circulating free DNA and infer tissue of origin of the cfDNA. This has utility in the identification, for example, of the tissue of origin of circulating tumour DNA fragments, and thus in the identification of which organ may be showing early signs of cancer.
What I understood is that FSeq generates the signal file (for the UCSC browser) and HOMER does the peak calling (e.g. for differential peak analysis).
By signal file, do you mean a wiggle file? If it's just that, how is it different than a generic bigWig from a BAM file?
It's not just normalised counts at each base.
From F-Seq website:
As I said before, it's "What I understand"
Do you know what this output actually looks like?