Question: Hi-C: getting intervals from "A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome"
Asked 3 months ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

Hi all,

I'm a novice in Hi-C analysis.

I'm looking at the following paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5478386/

* A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome *

Here, we report the most comprehensive survey to date of chromatin organization in human tissues. Through integrative analysis of chromatin contact maps in 21 primary human tissues and cell types, we found topologically associating domains highly conserved in different tissues. We also discover genomic regions that exhibit unusually high levels of local chromatin interactions.(...)

The associated GEO data is https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87112

I downloaded the archive (32 GB) GSE87112_file.tar.gz. The content of this file is here.

In my dreams, I expected to find a table like

interval1   interval2 score

but what I found is data that is (to me) rather obscure:

$ tar xOvf all_data_contact_maps.tgz contact_maps/HiCNorm/primary_cohort/IMR90.nor.chr1.mat | fold -w 60 | head
contact_maps/HiCNorm/primary_cohort/IMR90.nor.chr1.mat
0.000000    0.000000    0.000000    0.000000
    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
0000    0.000000    0.000000    0.000000    0.00
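
For readers who want the "interval1 interval2 score" layout, here is a minimal sketch of how a dense per-chromosome matrix like the one above could be flattened. It rests on assumptions the thread does not confirm: that each line of the .mat file is one row of a square, whitespace-separated matrix; that bin i covers [i * bin_size, (i + 1) * bin_size); and that the bin size is 40 kb (inferred only from the 40 kb intervals in the BED files of the same archive). The function name `dense_to_pairs` is hypothetical. Note that this lists every non-zero contact; it does not test for significance.

```python
from typing import Iterable, Iterator, Tuple

# (chrom, start1, end1, chrom, start2, end2, score)
Pair = Tuple[str, int, int, str, int, int, float]


def dense_to_pairs(
    rows: Iterable[str],
    chrom: str,
    bin_size: int = 40_000,  # assumption: 40 kb bins, as in the BED files
    min_score: float = 0.0,
) -> Iterator[Pair]:
    """Yield one record per upper-triangle cell whose score exceeds min_score."""
    for i, row in enumerate(rows):
        values = row.split()
        # The matrix is assumed symmetric, so only cells with j >= i are read.
        for j in range(i, len(values)):
            score = float(values[j])
            if score > min_score:
                yield (
                    chrom, i * bin_size, (i + 1) * bin_size,
                    chrom, j * bin_size, (j + 1) * bin_size,
                    score,
                )


if __name__ == "__main__":
    # Tiny made-up 3 x 3 matrix, only to show the output layout.
    demo = ["0.0\t2.5\t0.0", "2.5\t0.0\t1.0", "0.0\t1.0\t4.0"]
    for record in dense_to_pairs(demo, "chr1"):
        print(*record, sep="\t")
```

On the real data, an open file handle on the extracted IMR90.nor.chr1.mat could be passed as `rows`, since the function reads one line at a time.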

$ tar xOvf primary_cohort_TAD_boundaries.tgz primary_cohort_TAD_boundaries/AD.IS.All_boundaries.bed | more
primary_cohort_TAD_boundaries/AD.IS.All_boundaries.bed
chr10   4880000 4920000
chr10   6000000 6040000
chr10   7760000 7800000
chr10   9360000 9400000
chr10   12000000    12040000
chr10   13320000    13360000
chr10   14520000    14560000
chr10   15400000    15440000
chr10   17680000    17720000
chr10   18520000    18560000
chr10   19520000    19560000
chr10   21240000    21280000
chr10   22200000    22240000
chr10   23440000    23480000
chr10   24160000    24200000


$ tar xOfz all_data_FIRE_calls.tgz all_data_FIRE_calls/PO.FIRE.bed | head
chrchr  start   end
chr1    5280000 5320000
chr1    8240000 8280000
chr1    8280000 8320000
chr1    8400000 8440000
chr1    8440000 8480000
chr1    8480000 8520000
chr1    8520000 8560000
chr1    8560000 8600000
chr1    8600000 8640000

Is it possible to find the significant interacting intervals (per tissue) in this dataset?


Allegedly you can run Fit-Hi-C on these files to call significant interactions.

Written 3 months ago by Devon Ryan

Thanks Devon. Looking at https://github.com/ay-lab/fithic, it seems to need an 'interaction file':

chr1    fragmentMid1    chr2    fragmentMid2    contactCount
1   15000   1   35000   23
1   15000   1   55000   12

which seems to be missing from the archives (?)
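
If raw count matrices were available, the five-column layout quoted above could in principle be derived from them. The sketch below is only an illustration under stated assumptions: the input is a dense, symmetric, per-chromosome matrix of integer contact counts (the .mat file shown in the question holds HiCNorm-normalized floats, so it would not qualify as-is), the bin size is known, and "fragmentMid" is taken to be the bin midpoint. The function name `matrix_to_fithic_lines` is hypothetical, and the exact input requirements should be checked against the Fit-Hi-C documentation.

```python
from typing import Iterable, Iterator


def matrix_to_fithic_lines(
    rows: Iterable[str],
    chrom: str,
    bin_size: int,
) -> Iterator[str]:
    """Yield 'chr1 fragmentMid1 chr2 fragmentMid2 contactCount' lines (tab-separated)."""
    half = bin_size // 2
    for i, row in enumerate(rows):
        values = row.split()
        # Upper triangle only; the diagonal (self-contacts) is skipped by choice.
        for j in range(i + 1, len(values)):
            count = float(values[j])
            if not count.is_integer():
                # Assumption: raw counts are required, so normalized values are rejected.
                raise ValueError(f"non-integer count {values[j]!r} at cell ({i}, {j})")
            if count > 0:
                mid1 = i * bin_size + half
                mid2 = j * bin_size + half
                yield f"{chrom}\t{mid1}\t{chrom}\t{mid2}\t{int(count)}"


if __name__ == "__main__":
    # Made-up 3 x 3 count matrix with 20 kb bins, for illustration only.
    demo = ["7 3 0", "3 9 2", "0 2 4"]
    for line in matrix_to_fithic_lines(demo, "1", 20_000):
        print(line)
```

Rejecting non-integer values is a deliberate choice here: rounding normalized scores into pseudo-counts would silently change what the downstream statistics mean.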

Modified 3 months ago, written 3 months ago by Pierre Lindenbaum

Yeah, it appears that they didn't upload the most useful files. I've talked to folks internally and the consensus is that it's easiest to see if they can simply provide them.

Written 3 months ago by Devon Ryan