Question: How to continue processing Hi-C reads
16 months ago by
Hushus 20

Hello all,

I am new to Hi-C data processing and am wondering what to do from here. I am somewhat familiar with Juicebox and the HiFive tools in Galaxy, but I really need advice.

So I have the following information:

chromosome_1 | strand_1 | position_1 | DistanceClosestRestrictionSite_1 | Bin_1 | chromosome_2 | strand_2 | position_2 | DistanceClosestRestrictionSite_2 | Bin_2

^ DistanceClosestRestrictionSite does not tell you exactly where the restriction site is...

This data was originally processed by the authors by:

Hi-C reads were aligned using Bowtie 0.12.7 with default parameters and “-m 1”. PCR duplicate reads were removed. GC content, mappability, and fragment length effects were normalized as described in Hou et al., Molecular Cell 48, 471-484 (2012).

So this data is already normalized AND binned, but it does not have contact frequencies. My end goal is to see whether two particular pairs (2 kb in length) are contacting each other, and how often relative to a null expectation.

Question 1) How do I get contact frequencies?

Question 2) How do I format this data for viewing with Juicebox or your personal choice of Hi-C data viewer?
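
On Question 1, one simple notion of contact frequency is the number of read pairs whose two ends fall into a given pair of bins. The following is only a minimal sketch of that tally, not the original authors' method. It assumes a tab-separated file with no header row and the ten columns in the order listed above; the short column names, the function name `count_bin_contacts`, and the example rows are all hypothetical. It also keys each bin by chromosome, since it is not clear from the description whether bin numbers restart on every chromosome.

```python
import csv
import io
from collections import Counter

# Column order assumed from the field list in the post (names are hypothetical).
COLUMNS = ["chromosome_1", "strand_1", "position_1", "dist_re_1", "bin_1",
           "chromosome_2", "strand_2", "position_2", "dist_re_2", "bin_2"]


def count_bin_contacts(handle, delimiter="\t"):
    """Count read pairs per unordered pair of (chromosome, bin)."""
    counts = Counter()
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter=delimiter)
    for row in reader:
        end_a = (row["chromosome_1"], int(row["bin_1"]))
        end_b = (row["chromosome_2"], int(row["bin_2"]))
        # Sort the two ends so that A-B and B-A are counted as one contact.
        key = tuple(sorted((end_a, end_b)))
        counts[key] += 1
    return counts


# Made-up rows, only to show the expected input shape.
example = io.StringIO(
    "chr1\t+\t1000\t50\t0\tchr1\t-\t5200\t30\t2\n"
    "chr1\t-\t5300\t10\t2\tchr1\t+\t1100\t20\t0\n"
    "chr1\t+\t1200\t40\t0\tchr2\t+\t900\t15\t0\n"
)
contacts = count_bin_contacts(example)
for (end_a, end_b), n in sorted(contacts.items()):
    print(end_a, end_b, n)
```

These are raw counts per bin pair. Judging a count "relative to null" would still need some expected value to compare against, for example counts for other bin pairs separated by a similar genomic distance.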

I edited the file to the following so it could be loaded into Juicebox:

str1 - ch1 - pos1 - frag1 (all are 0) - str2 - ch2 - pos2 - frag2 (all are 1)
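
A minimal sketch of that kind of reformatting, one input line at a time, might look like the following. It assumes tab-separated input with the ten columns in the order listed earlier, and strands written as "+" and "-", which the post does not actually state. The function name `to_short_format` and the example line are hypothetical. The fragment fields are fixed at 0 and 1, as in the layout above.

```python
def to_short_format(line, delimiter="\t"):
    """Turn one ten-column record into 'str1 chr1 pos1 frag1 str2 chr2 pos2 frag2'."""
    fields = line.rstrip("\n").split(delimiter)
    chr1, strand1, pos1 = fields[0], fields[1], fields[2]
    chr2, strand2, pos2 = fields[5], fields[6], fields[7]
    # Assumption: "+" is the forward strand; encode forward as 0, reverse as 1.
    s1 = "0" if strand1 == "+" else "1"
    s2 = "0" if strand2 == "+" else "1"
    # Fragment fields are dummy values: 0 for the first end, 1 for the second.
    return " ".join([s1, chr1, pos1, "0", s2, chr2, pos2, "1"])


# Made-up record, only to show the conversion.
record = "chr1\t+\t1000\t50\t0\tchr1\t-\t5200\t30\t2\n"
converted = to_short_format(record)
print(converted)
```

Any sorting or chromosome-ordering requirements of the tool that builds the .hic file are not handled here and should be checked against the Juicer documentation.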

Am I headed in the right direction, or, because these reads are already normalized and binned, do I not need to do this step?

