Question: What do the columns in this Hi-C file relate to?
Asked 19 months ago by Hushus:

Hello world,

So I'm trying to use these published results to find out whether certain regions of interest interact. I am no specialist when it comes to Hi-C data, or bioinformatics clearly, but I do have a strong theoretical background and have watched and read tutorials on how to process raw reads. So, when I finally felt confident enough to process the reads, I went to the accession and downloaded these files, which the paper describes as raw reads, but they obviously are not. I've tried loading this file into SeqMonk, but it wouldn't accept it, I assume because the file is in an unusual format.

I want to know what these columns are, because I've asked the authors but haven't received a clear reply. Also, how can I modify this file for better visualization?

The accession describes the data as follows:

Library strategy: Hi-C

Hi-C reads were aligned using Bowtie 0.12.7 with default parameters and "-m 1". PCR duplicate reads were removed. GC content, mappability, and fragment length effects were normalized as described in Hou et al., Molecular Cell 48, 471-484 (2012).

Genome_build: dm3

Supplementary_files_format_and_content: Hi-C processed files are in a modified bed format. Each row lists the chromosome and the start and end coordinates of two interacting bins as well as the normalized interaction frequency between these two bins.
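Going only by the description above, a row should hold two bins plus one normalized frequency. The minimal Python sketch below reads rows under an assumed column order (chrom1, start1, end1, chrom2, start2, end2, frequency). That order is a guess, not the authors' documented layout: the question mentions a column "I", which suggests the real file has more columns than this description accounts for, so check the order against the actual file before trusting the output. The names `Contact`, `parse_contact_line`, and `read_contacts` are hypothetical.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    """One row of the processed file: two bins and their normalized frequency."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    freq: float


def parse_contact_line(line: str) -> Contact:
    # ASSUMPTION: whitespace-separated columns in the order
    # chrom1 start1 end1 chrom2 start2 end2 freq; any extra columns are ignored.
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    c1, s1, e1, c2, s2, e2, f = fields[:7]
    return Contact(c1, int(s1), int(e1), c2, int(s2), int(e2), float(f))


def read_contacts(lines):
    """Yield Contact records, skipping blank lines and '#' comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_contact_line(line)
```

For example, `parse_contact_line("chr2L\t0\t10000\tchr2L\t10000\t20000\t3.5")` (a made-up row) gives a `Contact` with `chrom1="chr2L"`, `start2=10000`, and `freq=3.5`. If the file turns out to interleave extra columns (such as strand) between the coordinates, the unpacking line is the place to change.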

There is no header in this file, and it does not look like a standard .hic file to me. How would you interpret these columns? Also, apparently in column "I", 0 = + strand and 16 = - strand.
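The 0/16 pattern matches the SAM FLAG convention, in which bit 0x10 (decimal 16) marks a read whose sequence is reverse complemented, i.e. aligned to the minus strand. If column I does hold SAM FLAG values (an assumption; the file itself does not say so), the strand can be read from that bit rather than from the literal value, which also handles flags that have other bits set. The function name is hypothetical.

```python
def strand_from_sam_flag(flag: int) -> str:
    """Return '+' or '-' for a SAM FLAG value.

    Bit 0x10 (decimal 16) in a SAM FLAG means the read sequence is reverse
    complemented, i.e. it aligned to the minus strand.
    ASSUMPTION: the file's column I holds such a FLAG value.
    """
    if flag < 0:
        raise ValueError("SAM FLAG must be non-negative")
    return "-" if flag & 0x10 else "+"
```

Testing the bit instead of comparing to 16 means a paired-end flag such as 17 (0x1 | 0x10) is still reported as minus strand.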

https://ibb.co/iW6BdR


It is best to upload images to a free image hosting provider (e.g. https://imgbb.com/ ) and then include the http links in your post.

Reply by genomax (19 months ago)

As much as we appreciate the humor in the title, a brief description of your problem would be more useful and more appropriate.

Reply by RamRS (19 months ago)

Can you provide a link for the paper which corresponds to this data? Did this file come from supplementary materials or GEO/SRA? I assume the Molecular Cell reference is only describing the method used in that paper?

Reply by genomax (19 months ago)

FWIW: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1551439

Reply by Sean Davis (19 months ago)

Raw fastq data can be found at ENA.

Reply by genomax (19 months ago)

Thank you for your support, genomax and Sean Davis.

I'm guessing you guys suggest I start from the raw reads instead of trying to decipher this format?

Reply by Hushus (19 months ago)

The paper referred to above is the authors' own. Take a look at it and its supplementary files; they may be more useful than the file above.

Reply by genomax (19 months ago)

Depends on your use case. Processing Hi-C data is pretty compute-intensive in many cases (many reads, multiple steps). If your goal is to step off from where the paper left off, using their processed data will be simplest. If you already have a Hi-C pipeline (or want to go through the process of developing one), then starting with raw reads seems a great way to go.
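If you do start from the processed data, one simple first step toward visualization is to turn the pairwise rows into a square matrix for a single chromosome, which any heatmap tool (for example matplotlib's `imshow`) can display. The sketch below assumes each row is a 7-tuple (chrom1, start1, end1, chrom2, start2, end2, frequency) and that bins are fixed-width and start at 0; `bin_size` and `chrom_length` are hypothetical parameters you would set to match the data, and the function name is illustrative only.

```python
def contact_matrix(rows, chrom: str, bin_size: int, chrom_length: int):
    """Build a symmetric dense matrix of normalized frequencies for one chromosome.

    rows: iterable of (chrom1, start1, end1, chrom2, start2, end2, freq) tuples.
    ASSUMPTIONS: bins are fixed-width (bin_size) and start at coordinate 0.
    If the same bin pair appears more than once, the last row wins.
    """
    n = (chrom_length + bin_size - 1) // bin_size  # number of bins, rounded up
    matrix = [[0.0] * n for _ in range(n)]
    for chrom1, start1, _end1, chrom2, start2, _end2, freq in rows:
        if chrom1 != chrom or chrom2 != chrom:
            continue  # keep intrachromosomal contacts only
        i, j = start1 // bin_size, start2 // bin_size
        if i >= n or j >= n:
            continue  # skip bins beyond the stated chromosome length
        matrix[i][j] = freq
        matrix[j][i] = freq  # contacts between two bins are symmetric
    return matrix
```

Assigning rather than summing avoids double-counting when a file lists both (i, j) and (j, i); if the real file stores raw counts that should be accumulated, that line would need to change.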

Reply by Sean Davis (19 months ago)

Yeah, this has given me more of a headache than doing it myself would. I'll start from scratch. Thanks for your help.

Reply by Hushus (19 months ago)