Question: Plotting Read Density/ Distribution/ Composite Profiles [Chip-Seq]
5
gravatar for Sukhdeep Singh
7.9 years ago by
Netherlands
Sukhdeep Singh10k wrote:

Hi,

I was wondering what one means by the term 'read density or abundance'. Does it simply means the piling of reads and checking the height of the pile per base (something which can be obtained using mpileup or some other tools) or how to do it. How to plot this. I used to do similar graphs in terms of peaks vs distance from TSS.

Look at this graph [Source]

Source:http://www.sciencedirect.com/science/article/pii/S0092867412004096

Thanks for the description.

R ngs chip-seq • 10.0k views
ADD COMMENTlink modified 6.1 years ago by kanwarjag1.0k • written 7.9 years ago by Sukhdeep Singh10k

Hi Sukhdeep - Sorry i am hijacking your thread - but it seemed very close to a question I was going to post - I thought I'll ask you directly here. Like you, I too am not very clear with the idea of read density.

so I wanted to calculate read density for defined regions ( after extension and binning into 100bp windows) across multiple marks as done in Fig 2a of this paper

Do you think something like this would be correct to calculate the read counts -

intersectBed -a $bedfilecontaining100bpbins -b $bedfileK27ac -c

where the last column would give us the no. of times the 100bp bins appeared in the bed file for the chipseq histone mark across the genome. and then use this to create the heatmap?

I am not too sure of how to calculate the read density here

Thanks a lot!

ADD REPLYlink modified 5 months ago by RamRS27k • written 6.1 years ago by aditi.qamra260
1

You can use this, the last column will be you readDensity, it says how many reads of H3K27ac are in those 100bp bins, you can then do it for all the locus you are interested in (eg promoters of 10K genes) and then add the rows from top to bottom (col1 with col1 of all genes) divided by total no. of genes, to give you an average profile accross that locus.

For advanced options, check this post Binning Over Genes And Calculating The Coverage [Bedops/Bedmap]

ADD REPLYlink modified 5 months ago by RamRS27k • written 6.1 years ago by Sukhdeep Singh10k

Thank you !

ps. Will refrain from hijacking threads from now

ADD REPLYlink written 6.1 years ago by aditi.qamra260

I am plotting Chip seq tag density as shown by Sukhdeep above, However I don't get such smooth curves. I am using NGS plot and try to use -MW 10 option (meant for smoothing window). Is there a magic or finer point I need to pay attention.

Thanks

ADD REPLYlink modified 5 months ago by RamRS27k • written 6.1 years ago by kanwarjag1.0k

This site is a Q&A and we encourage people to not post new questions in the answer section. In fact moderators have already moved these posts into comments.

From my own observation posting questions as followups have a lower chance of being answered they all depend on those few people that stumble upon that particular thread. Most of of the audience will not notice them since these are buried deeply in other posts.

Moreover posting new questions into another question decreases the utility of the site in the long term it makes it more difficult to find the answer for a question

ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by Istvan Albert ♦♦ 84k

Best is to create a new question with an example plot!!

ADD REPLYlink written 6.1 years ago by Sukhdeep Singh10k
3
gravatar for Istvan Albert
7.9 years ago by
Istvan Albert ♦♦ 84k
University Park, USA
Istvan Albert ♦♦ 84k wrote:

The concept of read density is not all that well established, in this paper (check the methods section) they average the base coverage over a 40bp window.

Overall this not may always be a good approach, often the coverage is not the right measure. Here it seems to work only because the read length is so much smaller than the plotted 10Kb (!) range. Fundamentally a chip-seq experiment measures only the 5' ends of the fragments. Using the entire read does nothing else but multiplies data from a single measurement into many others (read length size) and that will hide the finer behaviors.

ADD COMMENTlink written 7.9 years ago by Istvan Albert ♦♦ 84k

I got your point, some people say Tag density and in brackets avg. reads. So, I assume its just same as calculating the coverage per base over certain window and then average them in terms of number of genes in that window or something else. Thanks Istvan

ADD REPLYlink written 7.9 years ago by Sukhdeep Singh10k

Interesting comment. I never thought of it this way. Although, if you think about how the fragment is produced in theory, then a fragment is present in your data set because your factor was bound somewhere on the fragment. One doesn't know where. It could be the 5' end or anywhere else. If you imagine fragments being produced from a region where the factor binds in a single place, you will get diversity of fragments, but they should all overlap the binding site. So, if you add up all the tags, the binding site should look larger than everything else.

ADD REPLYlink written 7.3 years ago by KCC4.0k

Yeah, that's whole concept of chip-seq visualization, but the average density plots/ composite profiles, tell you how, the binding looks at a specific region for the all/ some genes. Like the above plots show, H3K4me3 is enriched at TSS where as H3K36me3 goes higher on the gene body and towards the gene end.

ADD REPLYlink written 7.3 years ago by Sukhdeep Singh10k

It is true that this is also affected by the sample preparation protocol. If say you have histones where the DNA is digested away then the 5' end is very close to the leftmost bound location. If full fragments are pulled down where the bound location is much smaller than the fragment then the location could be anywhere on that fragment.

That being said the main observation still stands. If you take the whole read you are just multiplying one measurement into many. This is unlike a SNP detection where each of these other bases of the read may contain more information. Here the 5' end of the read determines all information that can possibly gained from the other bases. Basically the other measurements coming from the same read are not independent of the first base.

This actually has some dramatic implications that are perhaps better left unexplored - I for one believe that none of the statistical methods customarily employed in ChipSeq analysis is entirely correct because of this interdependence.

ADD REPLYlink written 7.3 years ago by Istvan Albert ♦♦ 84k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1185 users visited in the last hour