Question

Plotting Read Density/ Distribution/ Composite Profiles [Chip-Seq]

5

Entering edit mode

13.0 years ago

Sukhi Singh 11k

Hi,

I was wondering what one means by the term 'read density or abundance'. Does it simply means the piling of reads and checking the height of the pile per base (something which can be obtained using mpileup or some other tools) or how to do it. How to plot this. I used to do similar graphs in terms of peaks vs distance from TSS.

Look at this graph [Source]

Source:http://www.sciencedirect.com/science/article/pii/S0092867412004096

Thanks for the description.

chip-seq r ngs • 15k views

ADD COMMENT • link updated 11.2 years ago by kanwarjag ★ 1.2k • written 13.0 years ago by Sukhi Singh 11k

0

Entering edit mode

Hi Sukhdeep - Sorry i am hijacking your thread - but it seemed very close to a question I was going to post - I thought I'll ask you directly here. Like you, I too am not very clear with the idea of read density.

so I wanted to calculate read density for defined regions ( after extension and binning into 100bp windows) across multiple marks as done in Fig 2a of this paper

Do you think something like this would be correct to calculate the read counts -

intersectBed -a $bedfilecontaining100bpbins -b $bedfileK27ac -c

where the last column would give us the no. of times the 100bp bins appeared in the bed file for the chipseq histone mark across the genome. and then use this to create the heatmap?

I am not too sure of how to calculate the read density here

Thanks a lot!

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 11.2 years ago by aditi.qamra ▴ 270

1

Entering edit mode

You can use this, the last column will be you readDensity, it says how many reads of H3K27ac are in those 100bp bins, you can then do it for all the locus you are interested in (eg promoters of 10K genes) and then add the rows from top to bottom (col1 with col1 of all genes) divided by total no. of genes, to give you an average profile accross that locus.

For advanced options, check this post Binning Over Genes And Calculating The Coverage [Bedops/Bedmap]

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 11.2 years ago by Sukhi Singh 11k

0

Entering edit mode

Thank you !

ps. Will refrain from hijacking threads from now

ADD REPLY • link 11.2 years ago by aditi.qamra ▴ 270

0

Entering edit mode

I am plotting Chip seq tag density as shown by Sukhdeep above, However I don't get such smooth curves. I am using NGS plot and try to use -MW 10 option (meant for smoothing window). Is there a magic or finer point I need to pay attention.

Thanks

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 11.2 years ago by kanwarjag ★ 1.2k

0

Entering edit mode

This site is a Q&A and we encourage people to not post new questions in the answer section. In fact moderators have already moved these posts into comments.

From my own observation posting questions as followups have a lower chance of being answered they all depend on those few people that stumble upon that particular thread. Most of of the audience will not notice them since these are buried deeply in other posts.

Moreover posting new questions into another question decreases the utility of the site in the long term it makes it more difficult to find the answer for a question

ADD REPLY • link 11.2 years ago by Istvan Albert 102k

0

Entering edit mode

Best is to create a new question with an example plot!!

ADD REPLY • link 11.2 years ago by Sukhi Singh 11k

score 3 · Answer 1 · 2012-06-22

3

Entering edit mode

13.0 years ago

Istvan Albert 102k

The concept of read density is not all that well established, in this paper (check the methods section) they average the base coverage over a 40bp window.

Overall this not may always be a good approach, often the coverage is not the right measure. Here it seems to work only because the read length is so much smaller than the plotted 10Kb (!) range. Fundamentally a chip-seq experiment measures only the 5' ends of the fragments. Using the entire read does nothing else but multiplies data from a single measurement into many others (read length size) and that will hide the finer behaviors.

ADD COMMENT • link 13.0 years ago by Istvan Albert 102k

0

Entering edit mode

I got your point, some people say Tag density and in brackets avg. reads. So, I assume its just same as calculating the coverage per base over certain window and then average them in terms of number of genes in that window or something else. Thanks Istvan

ADD REPLY • link 13.0 years ago by Sukhi Singh 11k

0

Entering edit mode

Interesting comment. I never thought of it this way. Although, if you think about how the fragment is produced in theory, then a fragment is present in your data set because your factor was bound somewhere on the fragment. One doesn't know where. It could be the 5' end or anywhere else. If you imagine fragments being produced from a region where the factor binds in a single place, you will get diversity of fragments, but they should all overlap the binding site. So, if you add up all the tags, the binding site should look larger than everything else.

ADD REPLY • link 12.4 years ago by KCC ★ 4.1k

0

Entering edit mode

Yeah, that's whole concept of chip-seq visualization, but the average density plots/ composite profiles, tell you how, the binding looks at a specific region for the all/ some genes. Like the above plots show, H3K4me3 is enriched at TSS where as H3K36me3 goes higher on the gene body and towards the gene end.

ADD REPLY • link 12.4 years ago by Sukhi Singh 11k

0

Entering edit mode

It is true that this is also affected by the sample preparation protocol. If say you have histones where the DNA is digested away then the 5' end is very close to the leftmost bound location. If full fragments are pulled down where the bound location is much smaller than the fragment then the location could be anywhere on that fragment.

That being said the main observation still stands. If you take the whole read you are just multiplying one measurement into many. This is unlike a SNP detection where each of these other bases of the read may contain more information. Here the 5' end of the read determines all information that can possibly gained from the other bases. Basically the other measurements coming from the same read are not independent of the first base.

This actually has some dramatic implications that are perhaps better left unexplored - I for one believe that none of the statistical methods customarily employed in ChipSeq analysis is entirely correct because of this interdependence.

ADD REPLY • link 12.4 years ago by Istvan Albert 102k