Question

What do the columns in this Hi-C file relate to?

6.4 years ago

Hushus ▴ 20

Hello world,

So I'm trying to find out whether certain regions of interest interact, based on these published results. I am no specialist when it comes to Hi-C data, or bioinformatics, clearly, but I do have a strong theoretical background and have watched and read tutorials on how to process raw reads. So, when I finally felt confident enough to process the reads, I went to the accession and picked up these files, which were described in the paper as raw reads, but they obviously are not. I've tried loading this file into SeqMonk, but it wouldn't take it, presumably because of the file's unusual format.

I want to know what these columns are. I've asked the authors but have not received a clear reply. Also, how do I modify this file for better visualization?

Self described as:

Library strategy: Hi-C
Hi-C reads were aligned using Bowtie 0.12.7 with default parameters and “-m 1”
PCR duplicate reads were removed
GC content, mappability, and fragment length effects were normalized as described in Hou et al., Molecular Cell 48, 471-484 (2012).
Genome_build: dm3
Supplementary_files_format_and_content: Hi-C processed files are in a modified bed format. Each row lists the chromosome and the start and end coordinates of two interacting bins as well as the normalized interaction frequency between these two bins

There are no headers in this file, and to me this is not a traditional .hic file. What do you interpret these columns to be? Also, apparently in column "I", 0 = + strand and 16 = - strand...
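For anyone landing here with the same kind of file: going only by the description quoted above, a minimal sketch of reading such a headerless, tab-separated table and pulling out rows that link two regions of interest might look like the following. The seven-column order (chromosome, start, end for each bin, then the normalized frequency) is an assumption taken from that description, not something confirmed for this file. The strand-like 0/16 values mentioned for column "I" suggest the real file may have more or different columns, so check a few rows by eye first. All names (`Contact`, `read_contacts`, `contacts_between`) and the demo rows are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


class Contact(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    freq: float  # normalized interaction frequency


def read_contacts(handle) -> Iterator[Contact]:
    """Yield one Contact per row of a headerless, tab-separated table.

    ASSUMPTION: the first seven columns are
    chrom1, start1, end1, chrom2, start2, end2, frequency.
    Any extra columns are ignored.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) < 7:
            raise ValueError(f"expected at least 7 columns, got {len(row)}")
        c1, s1, e1, c2, s2, e2, freq = row[:7]
        yield Contact(c1, int(s1), int(e1), c2, int(s2), int(e2), float(freq))


def overlaps(chrom: str, start: int, end: int, region: Region) -> bool:
    """True if the bin [start, end) on chrom overlaps the region."""
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start < r_end and end > r_start


def contacts_between(contacts: Iterable[Contact],
                     region_a: Region,
                     region_b: Region) -> List[Contact]:
    """Keep contacts where one bin overlaps region_a and the other region_b."""
    hits = []
    for c in contacts:
        forward = (overlaps(c.chrom1, c.start1, c.end1, region_a)
                   and overlaps(c.chrom2, c.start2, c.end2, region_b))
        reverse = (overlaps(c.chrom1, c.start1, c.end1, region_b)
                   and overlaps(c.chrom2, c.start2, c.end2, region_a))
        if forward or reverse:
            hits.append(c)
    return hits


# Toy rows with invented values, only to show the call pattern.
demo = (
    "chr2L\t0\t10000\tchr2L\t20000\t30000\t1.5\n"
    "chr2L\t0\t10000\tchr3R\t50000\t60000\t0.2\n"
)
hits = contacts_between(
    read_contacts(io.StringIO(demo)),
    ("chr2L", 5000, 6000),
    ("chr2L", 25000, 26000),
)
for hit in hits:
    print(hit)
```

With a real file, `open(path, newline="")` would replace the `io.StringIO(demo)` handle. If the column count or order turns out to differ, only the unpacking line in `read_contacts` should need changing.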

Hi-C Chromatin Capture • 1.6k views

ADD COMMENT • link 6.4 years ago by Hushus ▴ 20

It is best to upload images to a free image hosting provider (e.g., https://imgbb.com/) and then include the http links in your post.

ADD REPLY • link 6.4 years ago by GenoMax 141k

As much as we appreciate the humor in the title, a brief description of your problem would be more useful and more appropriate.

ADD REPLY • link 6.4 years ago by Ram 43k

Can you provide a link for the paper which corresponds to this data? Did this file come from supplementary materials or GEO/SRA? I assume the Molecular Cell reference is only describing the method used in that paper?

ADD REPLY • link 6.4 years ago by GenoMax 141k

FWIW: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1551439

ADD REPLY • link 6.4 years ago by Sean Davis 26k

Raw fastq data can be found at ENA.

ADD REPLY • link 6.4 years ago by GenoMax 141k

Thank you for your support, GenoMax and Sean Davis.

I'm guessing you guys suggest I start from the raw reads instead of trying to decipher this formatting?

ADD REPLY • link 6.4 years ago by Hushus ▴ 20

The paper referred to above is their own. Take a look at it and the supplementary files. They may be more useful than the file above.

ADD REPLY • link 6.4 years ago by GenoMax 141k

Depends on your use case. Processing Hi-C data is pretty compute-intensive in many cases (many reads, multiple steps). If your goal is to step off from where the paper left off, using their processed data will be simplest. If you already have a Hi-C pipeline (or want to go through the process of developing one), then starting with raw reads seems a great way to go.

ADD REPLY • link 6.4 years ago by Sean Davis 26k

Yeah, this has given me more of a headache than doing it myself would. I'll start from scratch. Thanks for your help.

ADD REPLY • link 6.4 years ago by Hushus ▴ 20