Question

How to use Hi-C data?


4.1 years ago

framtid1994 • 0

Hi

I'm very new to Hi-C and I'm not quite sure how to approach these data, as they look a bit different from what I'm used to with e.g. ChIA-PET data. In ChIA-PET data you get the regions that are interacting, but all the Hi-C data I have seen so far are presented like this (after the raw sequencing data have been processed by e.g. HiC-Pro):

HWI-D00119:283:HVWGBCXX:2:2113:7308:28159 chr1 10032 - chr13 47215551 -
HWI-D00119:284:HVWGBCXX:2:2205:16386:19423 chr1 10035 - chr7 13791889 -

As I understand these data, position 10032 on chr1 interacts with position 47215551 on chr13. But shouldn't each end be a region, with an interaction start and an interaction end, instead of a single position?

E.g. chr1 10032 10050 chr13 47215551 47215600

Is the position reported for Hi-C the place where the biotin-labeled nucleotide is found on each restriction fragment?

Is there any way to convert these data into BED files displaying these interactions? What I would very much like to do with these data is see whether the CpGs in my list, located at one interacting end, interact with a gene at the other end.

Thanks in advance!

next-gen genome sequencing • 1.3k views

ADD COMMENT • link updated 3.3 years ago by Paula • 0 • written 4.1 years ago by framtid1994 • 0


Hi everyone

I read that framtid1994 did a ChIA-PET analysis, and I have a question about how to recognize the linker sequences in the FASTQ file. I read some of the ENCODE procedures, but I couldn't find this there. Can someone help me understand it? I am trying to use the Mango pipeline on some ENCODE ChIA-PET data for CTCF and RAD21, and like all ChIA-PET analyses it needs the linker sequence. Sorry to ask this in this post, but I think you can help me understand this.

ADD REPLY • link 3.3 years ago by Paula • 0

score 0 · Answer 1 · 2020-04-18

It's too late to answer, but this may be useful to others.

These are just the positions of the two reads (ValidPairs) from a ligated fragment. They cannot be considered real "interactions" unless you call statistically significant interactions. Usually the genome is binned (5 kb, 10 kb, etc.), the read pairs falling into each pair of bins are counted, and those counts then undergo statistical testing. Hi-C is not at base-pair resolution, so an interaction is always a window whose size depends on the approximate resolution of the data.
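To make the binning step concrete, here is a minimal Python sketch. It assumes whitespace-separated lines whose first seven columns are read name, chromosome, position, strand, chromosome, position, strand, as in the two lines shown in the question. The function names `bin_valid_pairs` and `to_bedpe` and the 10 kb bin size are my own choices for illustration and are not part of HiC-Pro. It only produces raw counts per bin pair in a BEDPE-like layout; it does not do the statistical testing, so the rows are still not significant interactions.

```python
from collections import Counter


def bin_valid_pairs(lines, bin_size=10_000):
    """Count read pairs per pair of fixed-size genomic bins."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        _, chrom1, pos1, _, chrom2, pos2, _ = fields[:7]
        # Round each position down to the start of its bin
        bin1 = (chrom1, int(pos1) // bin_size * bin_size)
        bin2 = (chrom2, int(pos2) // bin_size * bin_size)
        # Order the two ends so that A-B and B-A count as the same bin pair
        key = tuple(sorted((bin1, bin2)))
        counts[key] += 1
    return counts


def to_bedpe(counts, bin_size=10_000):
    """Format bin-pair counts as BEDPE-like rows: both regions, then the count."""
    rows = []
    for ((c1, s1), (c2, s2)), n in sorted(counts.items()):
        rows.append(
            f"{c1}\t{s1}\t{s1 + bin_size}\t{c2}\t{s2}\t{s2 + bin_size}\t{n}"
        )
    return rows


# The two example lines from the question
example = [
    "HWI-D00119:283:HVWGBCXX:2:2113:7308:28159 chr1 10032 - chr13 47215551 -",
    "HWI-D00119:284:HVWGBCXX:2:2205:16386:19423 chr1 10035 - chr7 13791889 -",
]

counts = bin_valid_pairs(example)
for row in to_bedpe(counts):
    print(row)
```

Once the ends are regions like this, you can intersect one end with your CpG list and the other end with gene annotations using whatever interval tool you normally use. For real data you would filter the bin pairs with a significance-calling tool first and keep only those.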

HiC-Pro also has hicpro2juicebox.sh for Juicebox compatibility.