Number of Tn5 insertions in TCGA ATAC-Seq data
20 months ago
Mike ★ 1.7k

Hi all,

I am using TCGA ATAC data (bigwig files) to visualize peaks for some genes. The paper provides a normalised count matrix for each peak and sample. In the supplementary material they mention that "To get the number of Tn5 insertions per peak, each corrected insertion site (end of a fragment) was counted…". https://science.sciencemag.org/content/sci/suppl/2018/10/24/362.6413.eaav1898.DC1/aav1898_Corces_SM.pdf

So my questions are: what is this count matrix, and what are Tn5 insertions? Is there a relationship between peak height and the number of Tn5 insertions? Can I use this number (Tn5 insertions) to select significant peaks?

Thanks

sequencing atac-seq ATAC-Seq TCGA
20 months ago
ATpoint 48k

I do not know how much biological background you have, so I will answer in a bit more detail:

The Tn5 transposase is the workhorse of ATAC-seq. The enzyme is added to native chromatin, where it inserts Illumina adapters into the DNA at sites that are not protected by nucleosomes while simultaneously fragmenting the DNA. DNA in open chromatin is therefore adapter-tagged and can be enriched over the background of closed chromatin by PCR, followed by quantification via NGS. The accumulation of Tn5 insertion sites is therefore a measure of chromatin accessibility.

On the technical side, given the Tn5 insertion positions (= the 5' end of each read, commonly shifted by +4 bp on the plus strand and -5 bp on the minus strand to represent the center of the Tn5 binding event, which is what "corrected insertion site" usually refers to), you can use standard peak callers to identify local enrichments (= peaks). One typically extends each Tn5 site by about 50 bp in each direction to smooth the signal and allow more precise identification of peak summits.
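To make the "insertion site" step concrete, here is a minimal Python sketch. It assumes each aligned read is given as a hypothetical `Read(chrom, start, end, strand)` tuple in 0-based, half-open coordinates (as in BED), and it applies the common +4/-5 shift convention; whether the TCGA pipeline used exactly this shift is an assumption, so check their methods. The 50 bp flank mirrors the extension described above.

```python
from typing import NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record: 0-based, half-open coordinates."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def tn5_insertion_site(read: Read) -> Tuple[str, int]:
    """Return (chrom, 0-based position) of the shifted Tn5 insertion site.

    Assumes the common +4/-5 convention: the read's 5' end is `start` on the
    plus strand and `end - 1` on the minus strand.
    """
    if read.strand == "+":
        return read.chrom, read.start + 4
    return read.chrom, read.end - 1 - 5


def extend_site(chrom: str, pos: int, flank: int = 50) -> Tuple[str, int, int]:
    """Extend a single insertion site by `flank` bp on each side (half-open interval)."""
    return chrom, max(0, pos - flank), pos + flank + 1


plus_site = tn5_insertion_site(Read("chr1", 100, 150, "+"))
minus_site = tn5_insertion_site(Read("chr1", 100, 150, "-"))
window = extend_site(*plus_site)
print(plus_site, minus_site, window)
```

Each paired-end fragment contributes two such sites (one per read end), which is why the supplement counts "each corrected insertion site (end of a fragment)" rather than each fragment.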

The count matrix is then simply created by intersecting the peak locations with the Tn5 sites (or with the reads, which is essentially the same thing).
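As an illustration of that intersection, the following Python sketch builds a peak-by-sample count matrix. It assumes a shared set of peaks given as `(chrom, start, end)` tuples (0-based, half-open) and, per sample, a list of already-corrected insertion sites as `(chrom, position)` tuples; the function names are hypothetical. In practice one would use a tool such as bedtools, featureCounts or an R/Bioconductor equivalent, but the logic is the same: count how many sites fall inside each peak.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open
Site = Tuple[str, int]       # (chrom, position) of a corrected Tn5 insertion


def count_insertions(peaks: List[Peak], sites: List[Site]) -> List[int]:
    """Count insertion sites falling inside each peak (one sample)."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()  # sorted positions allow binary search per peak
    counts = []
    for chrom, start, end in peaks:
        positions = by_chrom.get(chrom, [])
        # number of positions p with start <= p < end
        counts.append(bisect_left(positions, end) - bisect_left(positions, start))
    return counts


def count_matrix(peaks: List[Peak], samples: Dict[str, List[Site]]) -> Dict[str, List[int]]:
    """Raw count matrix: one column (list of per-peak counts) per sample."""
    return {name: count_insertions(peaks, sites) for name, sites in samples.items()}


peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
samples = {
    "sampleA": [("chr1", 104), ("chr1", 150), ("chr1", 199), ("chr1", 200),
                ("chr1", 350), ("chr2", 10)],
    "sampleB": [("chr1", 120), ("chr3", 5)],
}
print(count_matrix(peaks, samples))
```

These are raw counts; any normalised matrix (such as the one shipped with the paper) is derived from numbers like these afterwards.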

So yes, peak height is a function of Tn5 insertion frequency (which is equivalent to read counts).
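A small Python sketch can show how a peak's height in a coverage track (e.g. a bigwig) relates to insertion counts. It assumes the track is a pileup of insertion sites each extended by 50 bp per side, and it defines "height" as the maximum coverage within the peak; both are simplifying assumptions, and the function names are hypothetical. More insertions in a peak raise the pileup, although height also depends on how tightly the insertions cluster, so height and count are related but not identical.

```python
from typing import List


def pileup(insertions: List[int], length: int, flank: int = 50) -> List[int]:
    """Per-base coverage of +/-flank windows around each insertion on a segment of `length` bp."""
    diff = [0] * (length + 1)  # difference array: +1 at window start, -1 after window end
    for pos in insertions:
        lo = max(0, pos - flank)
        hi = min(length, pos + flank + 1)
        diff[lo] += 1
        diff[hi] -= 1
    coverage, running = [], 0
    for i in range(length):
        running += diff[i]
        coverage.append(running)
    return coverage


def peak_height(coverage: List[int], start: int, end: int) -> int:
    """Maximum coverage inside the half-open peak interval [start, end)."""
    return max(coverage[start:end])


sparse = pileup([500], length=1000)
dense = pileup([490, 495, 500, 505, 510], length=1000)
print(peak_height(sparse, 400, 600), peak_height(dense, 400, 600))
```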

Hope this was clear; if not, feel free to ask.


Nice explanation, thank you so much!

So the number in the count matrix is the "number of Tn5 insertions". If this number is high, the peak height would also be high, and this indicates a higher likelihood of open chromatin. Am I right? Can I use these numbers to find significantly differential peaks between two classes of samples?

Again, thanks a lot.


Yes, one could use the count matrix, but one would normally feed raw (unnormalised) counts into tools like edgeR, which perform their own normalisation and model the counts statistically. Check whether they provide raw counts.