Question

Normalized read density

0

7.5 years ago

Kramdi ▴ 260

Hi everyone,

I have ATAC-seq data and I would like to plot the distribution of insert sizes as shown in this paper by Buenrostro et al., 2014 (Figure 2a).

I have no problem getting the insert size for every read pair (the distance between the mapping start of R1 and the mapping end of R2) and plotting how often each insert size occurs. I thought that would be enough, but I can't seem to reproduce the periodicity shown.

In the paper, "normalized read density" is plotted. Should I also normalize the occurrences? What could this normalization be, and why is it needed in this case?

PS. My question has been partially asked in this thread. I created a new thread to focus on understanding the normalized read density.

Thanks for the help!

ATAC-seq insert size • 4.8k views

ADD COMMENT • link updated 7.5 years ago by igor 13k • written 7.5 years ago by Kramdi ▴ 260

1

The normalization in that case was simply the count per insert size divided by the total read count in the BAM file (excluding chrM and anything else unwanted). The author mentioned this a while ago in the ATAC-seq community.
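
As a rough illustration of that division, here is a minimal Python sketch. The function name `normalized_read_density` and the toy insert sizes are made up for this example; it assumes the insert sizes have already been extracted from the filtered BAM file (chrM and other unwanted reads removed), and it falls back to the number of insert sizes supplied when no separate total is given.

```python
from collections import Counter


def normalized_read_density(insert_sizes, total_reads=None):
    """Divide the count of each insert size by a total read count.

    insert_sizes: iterable of insert sizes (bp), one per read pair,
                  assumed to be already filtered (e.g. no chrM).
    total_reads:  denominator to use; defaults to the number of
                  insert sizes supplied (an assumption of this sketch).
    """
    counts = Counter(insert_sizes)
    total = total_reads if total_reads is not None else sum(counts.values())
    if total <= 0:
        raise ValueError("total read count must be positive")
    # One value per observed insert size, sorted by size for plotting.
    return {size: n / total for size, n in sorted(counts.items())}


# Toy data, not real ATAC-seq values.
insert_sizes = [50, 50, 50, 200, 200, 400]
density = normalized_read_density(insert_sizes)
for size, value in density.items():
    print(size, value)
```

The shape of the curve is the same as for the raw counts; only the y-axis changes, so this step by itself will not create or remove periodicity.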

ADD REPLY • link 7.4 years ago by ATpoint 81k

0

You are right, I came across the same description in other papers as well. Thanks!

ADD REPLY • link 7.4 years ago by Kramdi ▴ 260

score 0 · Answer 1 · 2016-11-02

0

7.5 years ago

igor 13k

At the most basic level, the reason for normalization is that different samples will have different numbers of reads. A sample with 10 reads will have half the fragments of a sample with 20 reads, but that does not mean it worked half as well.
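
To make the 10-versus-20-reads point concrete, here is a small Python sketch with invented toy samples (`sample_a`, `sample_b`) and a hypothetical helper `fraction_per_size`. Both samples have the same shape of distribution but different depth; the raw counts differ by a factor of two, while the per-sample fractions agree.

```python
from collections import Counter


def fraction_per_size(insert_sizes):
    """Fraction of a sample's fragments observed at each insert size."""
    counts = Counter(insert_sizes)
    total = len(insert_sizes)
    return {size: n / total for size, n in counts.items()}


# Toy samples: same 60/40 split, different sequencing depth.
sample_a = [100] * 6 + [300] * 4    # 10 fragments
sample_b = [100] * 12 + [300] * 8   # 20 fragments

raw_a, raw_b = Counter(sample_a), Counter(sample_b)
norm_a, norm_b = fraction_per_size(sample_a), fraction_per_size(sample_b)

print(raw_a[100], raw_b[100])    # raw counts differ with depth
print(norm_a[100], norm_b[100])  # fractions are directly comparable
```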

The paper describes a more advanced normalization strategy for Figure 2b:

First, the distribution of paired-end sequencing fragment sizes overlapping each chromatin state (http://www.ensembl.org/info/docs/funcgen/regulatory_segmentation.html) were computed. The distributions were then normalized to the percent maximal within each state and enrichment was computed relative to the genome-wide set of fragment sizes.
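
One possible reading of that description, sketched in Python. This is an interpretation, not the authors' code: it assumes "percent maximal" means scaling each fragment-size histogram so its tallest bin is 100, and that "enrichment" is the per-size ratio of the state's scaled value to the genome-wide scaled value. The function names and toy fragment sizes are hypothetical.

```python
from collections import Counter


def percent_of_max(fragment_sizes):
    """Scale a fragment-size histogram so that its tallest bin equals 100."""
    counts = Counter(fragment_sizes)
    peak = max(counts.values())
    return {size: 100.0 * n / peak for size, n in counts.items()}


def enrichment(state_sizes, genome_sizes):
    """Per-size ratio of a chromatin state's scaled histogram to the
    genome-wide scaled histogram (assumed definition of enrichment)."""
    state = percent_of_max(state_sizes)
    genome = percent_of_max(genome_sizes)
    # Only sizes seen genome-wide can be compared.
    return {size: state[size] / genome[size]
            for size in sorted(state) if size in genome}


# Toy data: genome-wide fragments, and the subset overlapping one state.
genome_sizes = [50] * 8 + [200] * 4 + [400] * 2
state_sizes = [50] * 2 + [200] * 4 + [400] * 2

ratios = enrichment(state_sizes, genome_sizes)
for size, ratio in ratios.items():
    print(size, ratio)
```

In this toy case the short fragments come out depleted (ratio below 1) and the nucleosome-sized fragments enriched (ratio above 1) in the state, which is the kind of contrast Figure 2b is meant to show.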

ADD COMMENT • link 7.5 years ago by igor 13k

0

Hi Igor,

I understand the rationale behind normalization when comparing measurements from different samples. However, in Figure 2a (the one I wish to reproduce), the insert size distribution seems to come from a single sample:

The insert size distribution of sequenced fragments from human chromatin had clear periodicity of approximately 200 bp, suggesting many fragments are protected by integer multiples of nucleosomes (Fig. 2a).

In Figure 2b, indeed, the distributions of insert sizes overlapping different chromatin states are normalized in order to perform a proper enrichment analysis, which is different from what is shown in Figure 2a.

ADD REPLY • link 7.5 years ago by Kramdi ▴ 260

0

I believe Figure 2a is just one sample shown as an example. I guess the idea is that, if you normalize the read counts, you can easily compare the distribution to other samples if you want to.

ADD REPLY • link 7.5 years ago by igor 13k