Question: TAD (Topologically Associating Domain) boundaries
10 weeks ago by
LGMgeo30
European Union
LGMgeo30 wrote:

Is there a public resource where I can download a BED file with TAD (Topologically Associating Domain) boundaries? Or a BED file with the TADs themselves?

Also, is there a clear definition of the size of a TAD boundary?

Many thanks

ADD COMMENTlink modified 6 weeks ago by osieman5210 • written 10 weeks ago by LGMgeo30

A search led to this paper. Look at supplementary data 1 and 2.

ADD REPLYlink written 10 weeks ago by genomax37k

Which genome?

ADD REPLYlink written 10 weeks ago by geek_y8.1k

Sorry, the human genome.

ADD REPLYlink written 10 weeks ago by LGMgeo30

For the human genome, supplementary data 1 looks good (supplementary data 2 is for Drosophila). Many thanks.

In case it is of interest to someone else, I extracted the following information from supplementary data 1:

  • genome version: hg19

  • 816 genomic regulatory blocks (GRBs), predicting the boundaries of TADs

  • on chromosomes 1-22 and X

  • sizes range from 10 kb to 7.2 Mb

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by LGMgeo30

Your question is about TADs, but now you are looking at GRBs. The two are different.

ADD REPLYlink written 10 weeks ago by geek_y8.1k

OK, I'm probably wrong, but I'm not sure I understand.

In the above paper, the authors say that clusters of CNEs (conserved non-coding elements; these clusters were previously described as GRBs) strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human.

So, in your opinion, is it better to use other sources? As you suggest, I can download human ES cell and fibroblast topological domains from:

http://chromosome.sdsc.edu/mouse/hi-c/download.html

Many thanks for any help you can provide me.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by LGMgeo30

Dear Igor, as you seem to be a TAD specialist, I have a last question.

We know that disruption of TAD boundaries with structural variation can affect the expression of nearby genes, and this can cause disease.

Do you have any idea whether it can affect all the genes in the TAD, or only those located within a certain distance of the boundary (and if so, what distance)?

Many thanks for any help you can provide me

ADD REPLYlink written 10 weeks ago by LGMgeo30

Not a TAD specialist, but I work with people who work with TADs.

The classic theory is that the expression of all the genes within a single TAD is correlated (although the correlation is generally fairly weak). For example, see Fig 1E: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4831574/figure/F1/

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by igor4.7k

10 weeks ago by
igor4.7k
United States
igor4.7k wrote:

ENCODE has some Hi-C with domains/boundaries in BED format: https://www.encodeproject.org/search/?type=Experiment&assay_title=Hi-C&files.file_type=bed+bed3%2B

ADD COMMENTlink written 10 weeks ago by igor4.7k

Thanks Igor for your link!

It gives access to 55 human TAD BED files (unsorted, with 3000 to 6000 TADs per file):

31 files on hg19

24 files on hg38
ADD REPLYlink written 10 weeks ago by LGMgeo30

Those numbers seem reasonable.

ADD REPLYlink written 10 weeks ago by igor4.7k

By the way, were you able to batch download them? For some reason, the file is blank for me (but it works for other search results).

ADD REPLYlink written 10 weeks ago by igor4.7k

Yes, I was.

Click the "bed bed3+" button on your link (otherwise the "files.txt" is blank). Then click the "Download" button to download a "files.txt" file that contains a list of URLs: one to a file with all the experimental metadata, and the rest to download the individual files.

Then, keep only the *.bed URLs in your "files.txt".

Then use the following command to download all the BED files in the list:

xargs -n 1 curl -O -L < files.txt

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by LGMgeo30

Thanks. I thought that having "bed bed3+" selected would filter for only bed files, but I guess you have to do that manually.

ADD REPLYlink written 10 weeks ago by igor4.7k

Or you can do a grep command:

grep "bed" listOfFiles.txt > TADbedFiles.list.txt

There are also some bedpe files (new format to allow inter-chromosomal feature definitions). You can keep them or not.
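
A possible alternative to the grep, as a minimal Python sketch rather than anything ENCODE provides: a plain substring match on "bed" also keeps the .bedpe files and any URL that merely mentions "bed" in its query string, so this version classifies each URL by the end of its path only. The function name and the example URLs (including the accession-like IDs) are made up for illustration.

```python
from urllib.parse import urlsplit


def split_download_urls(lines):
    """Sort download URLs into BED, BEDPE and other, judging by the URL path only."""
    bed, bedpe, other = [], [], []
    for line in lines:
        url = line.strip().strip('"')  # tolerate blank lines and quoted URLs
        if not url:
            continue
        path = urlsplit(url).path.lower()  # ignores the query string
        if path.endswith(".gz"):
            path = path[:-3]  # classify compressed files by their real type
        if path.endswith(".bedpe"):
            bedpe.append(url)
        elif path.endswith(".bed"):
            bed.append(url)
        else:
            other.append(url)
    return bed, bedpe, other


# Hypothetical lines, shaped like what a batch-download list might contain.
example_lines = [
    '"https://www.encodeproject.org/metadata/?type=Experiment&files.file_type=bed+bed3%2B"',
    "https://www.encodeproject.org/files/ENCFF000AAA/@@download/ENCFF000AAA.bed.gz",
    "https://www.encodeproject.org/files/ENCFF000AAB/@@download/ENCFF000AAB.bedpe.gz",
]
bed_urls, bedpe_urls, other_urls = split_download_urls(example_lines)
print(len(bed_urls), len(bedpe_urls), len(other_urls))  # 1 1 1
```

To use it on a real list, read the lines of the downloaded text file, write `bed_urls` one per line to a new file, and pass that file to the xargs/curl command.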

ADD REPLYlink written 10 weeks ago by LGMgeo30

By the way, do you have any idea of the size of TAD boundaries? Is there a clear definition?

ADD REPLYlink written 10 weeks ago by LGMgeo30

The boundary size should be equal to the bin size.

The genome is split into bins. Each bin is assigned a boundary score. The bins with local maximum boundary scores become the boundaries and separate the neighboring TADs. Thus, each bin is either in a specific TAD or is a boundary.
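
As a rough illustration of that description (a sketch only, not the method used for any particular dataset): given one boundary score per bin, the bins that are strict local maxima become boundaries, and the runs of bins between them become TADs. The function names, the toy scores and the 40 kb bin size are assumptions; real callers usually also apply a score threshold, and treating the chromosome ends as TAD edges is a simplification.

```python
def call_boundaries(scores):
    """Return the indices of bins whose score is a strict local maximum."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            boundaries.append(i)
    return boundaries


def bins_to_tads(n_bins, boundaries, bin_size):
    """Convert boundary bins into TAD intervals in bp (0-based, half-open, as in BED)."""
    tads = []
    start = 0
    for b in list(boundaries) + [n_bins]:
        if b > start:  # skip empty runs, e.g. two adjacent boundary bins
            tads.append((start * bin_size, b * bin_size))
        start = b + 1  # the boundary bin itself belongs to no TAD
    return tads


# Toy boundary scores for nine consecutive 40 kb bins.
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.8, 0.2, 0.1]
boundaries = call_boundaries(scores)
tads = bins_to_tads(len(scores), boundaries, bin_size=40_000)
print(boundaries)  # [2, 6]
print(tads)        # [(0, 80000), (120000, 240000), (280000, 360000)]
```

Each boundary here is exactly one bin wide (for example, bin 2 covers 80,000 to 120,000), which is why the boundary size equals the bin size.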

ADD REPLYlink written 10 weeks ago by igor4.7k

Really interesting and clear, many thanks!

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by LGMgeo30

Dear Igor,

Is there any readme file available for these files? I am also interested in these TADs. I don't really understand what you mean by bin size and boundary score.

I hope you can help

ADD REPLYlink written 6 weeks ago by osieman5210

There should be a description somewhere, but I am not sure where. That would be a question for ENCODE.

Regarding bins, most operations in Hi-C are on a bin level since it's not possible to get single-base resolution. This means the genome is broken into bins/regions/windows (usually 10-40 kb).

A good review is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347522/ . Specifically regarding boundary calculations:

An approach by Dixon et al. uses the following statistic: for each bin, we calculate the difference between its average upstream interactions and its average downstream interactions (within some genomic range). This difference is then transformed into a chi-squared statistic and the resulting value is referred to as the directionality index. At the boundaries of TADs, we expect to see a sharp change in the directionality index. Boundaries are then associated with each other using a Hidden Markov Model. Alternatively, others have simply used the ratio between average upstream and average downstream interactions.
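
To make the directionality index concrete, here is a small Python sketch of the signed chi-squared statistic as it is commonly written: with A the upstream contacts of a bin, B its downstream contacts and E = (A + B) / 2, the index is sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E). The toy matrix, the function name and the window size are assumptions, and the Hidden Markov Model step is not implemented.

```python
def directionality_index(matrix, window):
    """Signed chi-squared upstream/downstream bias for every bin of a contact matrix."""
    n = len(matrix)
    di = []
    for i in range(n):
        up = sum(matrix[i][max(0, i - window):i])    # A: contacts with upstream bins
        down = sum(matrix[i][i + 1:i + 1 + window])  # B: contacts with downstream bins
        expected = (up + down) / 2                   # E: value if there were no bias
        if up == down:
            di.append(0.0)                           # no bias (also covers empty bins)
            continue
        sign = 1.0 if down > up else -1.0
        chi2 = ((up - expected) ** 2 + (down - expected) ** 2) / expected
        di.append(sign * chi2)
    return di


# Toy matrix: two domains of three bins each, 10 contacts within a domain, 1 between.
matrix = [[10 if (a < 3) == (b < 3) else 1 for b in range(6)] for a in range(6)]
di = directionality_index(matrix, window=2)
print([round(x, 2) for x in di])
```

In this toy case the index is strongly negative at bin 2 (mostly upstream contacts) and strongly positive at bin 3 (mostly downstream contacts), so the sharp change falls between the two domains. Bins near the matrix edges have truncated windows and would need separate handling on real data.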

An alternative approach is to calculate for each bin the average of interaction frequencies crossing over it (within some genomic range). This is referred to as the insulation score and can be thought of as the average of a square sliding along the matrix diagonal. We expect that this value will be lower at TAD boundaries. Then one can use standard techniques to find local minima and use those as boundaries, and define regions between consecutive boundaries to be TADs.
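
The insulation score can be sketched in the same way. For each bin, this hypothetical Python example averages the contacts between the `window` bins upstream and the `window` bins downstream (the square sliding along the diagonal), then picks strict local minima. The toy matrix, function names and window size are assumptions, and the scores are left unnormalised, which published implementations may not do.

```python
def insulation_score(matrix, window):
    """Mean contact frequency crossing over each bin; None where the square does not fit."""
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            scores.append(None)  # square would run off the matrix
            continue
        total = 0.0
        for a in range(i - window, i):              # upstream bins
            for b in range(i + 1, i + window + 1):  # downstream bins
                total += matrix[a][b]
        scores.append(total / (window * window))
    return scores


def local_minima(scores):
    """Indices of bins whose score is strictly lower than both neighbours."""
    minima = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if left is None or mid is None or right is None:
            continue
        if mid < left and mid < right:
            minima.append(i)
    return minima


def toy_contact(a, b):
    """Two four-bin domains (bins 0-3 and 5-8) separated by a boundary bin (bin 4)."""
    if a == 4 or b == 4:
        return 5.0
    return 10.0 if (a < 4) == (b < 4) else 1.0


matrix = [[toy_contact(a, b) for b in range(9)] for a in range(9)]
scores = insulation_score(matrix, window=2)
print(scores)                # [None, None, 7.5, 3.0, 1.0, 3.0, 7.5, None, None]
print(local_minima(scores))  # [4]
```

The score dips at bin 4, which is then taken as the boundary, and the regions on either side are the TADs.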

ADD REPLYlink written 6 weeks ago by igor4.7k

I will ask ENCODE about the readme, thanks for the explanation!

ADD REPLYlink written 6 weeks ago by osieman5210

Could you then share their answer? Thanks

ADD REPLYlink written 6 weeks ago by LGMgeo30

I have another question about these TAD BED files and I hope that someone can help. What is the actual difference between these files? If I am not wrong, most of them are hg19 genome files, so I would assume that the start and stop locations in these files would match. Are these files generated from different experiments/predictions? I am a bit confused, sorry.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by osieman5210

They are from different cell types.

ADD REPLYlink written 5 weeks ago by igor4.7k