TAD (Topologically Associating Domain) boundaries
5.5 years ago
LGMgeo ▴ 100

Is there a public resource where I can download a BED file of TAD (Topologically Associating Domain) boundaries, or a BED file of the TADs themselves?

Also, is there a clear definition of the size of a TAD boundary?

Many thanks


A search led to this paper. Look at supplementary data 1 and 2.


Which genome?


Sorry, the human genome.


For the human genome, supplementary data 1 looks good. Many thanks. (Supplementary data 2 is for Drosophila.)

In case it is useful to someone else, I extracted the following information from supplementary data 1:

• genome version: hg19

• 816 genomic regulatory blocks (GRBs), which predict TAD boundaries

• on chromosomes 1-22 and X

• sizes range from 10 kb to 7.2 Mb


Your question is about TADs, but now you are looking at GRBs. The two are different.


OK, I am probably wrong, but I am not sure I understand.

In the above paper, the authors say that clusters of CNEs (previously described as GRBs) strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human.

So, in your opinion, is it better to use other sources? As you suggest, I can download human ES cell and fibroblast topological domains from:

Many thanks for any help you can provide me.


Dear Igor, as you seem to be a TAD specialist, I have one last question.

We know that disruption of TAD boundaries by structural variation can affect the expression of nearby genes, and this can cause disease.

Do you know whether it can affect all the genes in the TAD, or only those located within a certain distance of the boundary (and if so, what distance)?

Many thanks for any help you can provide.


Not a TAD specialist, but I work with people who work with TADs.

The classic theory is that the expression of all the genes within a single TAD is correlated (although the correlation is generally fairly weak). For example, see Fig 1E: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4831574/figure/F1/


5.5 years ago
igor 13k

ENCODE has some Hi-C with domains/boundaries in BED format: https://www.encodeproject.org/search/?type=Experiment&assay_title=Hi-C&files.file_type=bed+bed3%2B


It gives access to 55 human TAD BED files (unsorted, with 3000 to 6000 TADs per file):

31 files on hg19

24 files on hg38


Those numbers seem reasonable.


By the way, were you able to batch download them? For some reason, the file is blank for me (but it works for other search results).


Yes I was.

Click the "bed bed3+" button on your link (otherwise "files.txt" is blank). Then click the "Download" button to get a "files.txt" file. It contains a list of URLs: one for a file with all the experimental metadata, followed by links to download the data files.

Then keep only the *.bed URLs in your "files.txt".

Then use the following command to download all the BED files in the list:

xargs -n 1 curl -O -L < files.txt


Thanks. I thought that having "bed bed3+" selected would filter for only bed files, but I guess you have to do that manually.


Or you can filter the list with grep:

grep "bed" files.txt > TADbedFiles.list.txt

Note that this pattern also matches the bedpe files in the list (bedpe is a paired format that allows inter-chromosomal feature definitions). You can keep them or not.
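If you want to separate the plain BED links from the bedpe links (and from any other line that merely contains the string "bed"), the filtering can be done on the file name rather than on the whole URL. Below is a minimal Python sketch of that idea. The function name `split_bed_urls` and the example URLs are made up for illustration; they are not real ENCODE links.

```python
from urllib.parse import urlparse


def split_bed_urls(lines):
    """Split download URLs into plain BED and BEDPE lists based on the file name."""
    bed, bedpe = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        # Look only at the last path component, ignoring any query string.
        name = urlparse(url).path.rsplit("/", 1)[-1]
        # Drop a trailing .gz so compressed and uncompressed files are treated alike.
        stem = name[:-3] if name.endswith(".gz") else name
        if stem.endswith(".bedpe"):
            bedpe.append(url)
        elif stem.endswith(".bed"):
            bed.append(url)
    return bed, bedpe


# Hypothetical URLs, for illustration only.
example = [
    "https://example.org/files/AAA/@@download/AAA.bed.gz",
    "https://example.org/files/BBB/@@download/BBB.bedpe.gz",
    "https://example.org/metadata.tsv?files.file_type=bed+bed3%2B",
]
bed_urls, bedpe_urls = split_bed_urls(example)
print(bed_urls)    # only the .bed.gz link
print(bedpe_urls)  # only the .bedpe.gz link
```

The third example line shows why a plain substring match can be too permissive: a metadata link whose query string happens to contain "bed" is skipped here, because only the file name is checked.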


By the way, do you have any idea of the size of TAD boundaries? Is there a clear definition?


The boundary size should be equal to the bin size.

The genome is split into bins. Each bin is assigned a boundary score. The bins with local maximum boundary scores become the boundaries and separate the neighboring TADs. Thus, each bin is either in a specific TAD or is a boundary.
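To make the description above concrete, here is a small Python sketch with made-up boundary scores. It assumes that a boundary is any bin whose score is strictly higher than both neighbours; real callers usually add thresholds and smoothing, so treat this as an illustration, not as the method used for the ENCODE files. The function names are hypothetical.

```python
def call_boundaries(scores):
    """Return indices of bins whose score is a strict local maximum."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]
    ]


def tads_from_boundaries(n_bins, boundaries, bin_size):
    """Return (start, end) intervals in bp covering the bins between boundaries."""
    tads = []
    start = 0
    for b in list(boundaries) + [n_bins]:
        if b > start:
            tads.append((start * bin_size, b * bin_size))
        start = b + 1  # skip the boundary bin itself
    return tads


# Toy boundary scores for nine 40 kb bins (invented values).
scores = [0.1, 0.2, 0.9, 0.3, 0.2, 0.1, 0.8, 0.2, 0.1]
boundaries = call_boundaries(scores)
tads = tads_from_boundaries(len(scores), boundaries, 40000)
print(boundaries)  # [2, 6]
print(tads)        # [(0, 80000), (120000, 240000), (280000, 360000)]
```

Each boundary occupies exactly one bin (for example 80000-120000 here), which is why the boundary size equals the bin size.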


Really interesting and clear, many thanks!


Dear Igor,

Is there a readme file available for these files? I am also interested in these TADs. I don't really understand what you mean by bin size and boundary score.

I hope you can help.


There should be a description somewhere, but I am not sure where. That would be a question for ENCODE.

Regarding bins, most operations in Hi-C are on a bin level since it's not possible to get single-base resolution. This means the genome is broken into bins/regions/windows (usually 10-40 kb).
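As a small illustration of binning, the Python sketch below maps a genomic position to its bin and back to the bin's interval. It assumes fixed-size bins that start at position 0 of each chromosome; the function names are made up for this example.

```python
def bin_index(position, bin_size):
    """Index of the fixed-size bin that contains a 0-based position."""
    return position // bin_size


def bin_interval(index, bin_size):
    """Half-open (start, end) interval in bp covered by a bin."""
    return index * bin_size, (index + 1) * bin_size


idx = bin_index(1575000, 40000)
print(idx)                       # 39
print(bin_interval(idx, 40000))  # (1560000, 1600000)
```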

A good review is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347522/ . Specifically regarding boundary calculations:

An approach by Dixon et al. uses the following statistic: for each bin, we calculate the difference between its average upstream interactions and its average downstream interactions (within some genomic range). This difference is then transformed into a chi-squared statistic and the resulting value is referred to as the directionality index. At the boundaries of TADs, we expect to see a sharp change in the directionality index. Boundaries are then associated with each other using a Hidden Markov Model. Alternatively, others have simply used the ratio between average upstream and average downstream interactions.
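A rough Python sketch of the directionality index on a toy contact matrix may help here. It uses summed upstream and downstream contacts within a fixed window of bins and the chi-squared-like form described above; the Hidden Markov Model step is not included, and the helper names, matrix values, and window size are all invented for illustration.

```python
def block_matrix(domain_sizes, inside=10, outside=1):
    """Toy symmetric contact matrix: high counts within a domain, low between."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


def directionality_index(matrix, window):
    """Signed chi-squared-like statistic comparing upstream and downstream contacts."""
    di = []
    for i, row in enumerate(matrix):
        a = sum(row[max(0, i - window):i])   # contacts with upstream bins
        b = sum(row[i + 1:i + 1 + window])   # contacts with downstream bins
        e = (a + b) / 2                      # expected value if there is no bias
        if e == 0 or a == b:
            di.append(0.0)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Two toy domains of three bins each; the boundary lies between bins 2 and 3.
matrix = block_matrix([3, 3])
di = directionality_index(matrix, window=2)
print(di[0])                # 20.0
print([x > 0 for x in di])  # [True, True, False, True, False, False]
```

The sharp switch from a negative value at bin 2 to a positive value at bin 3 marks the boundary between the two toy domains. Bins near the ends of the matrix have truncated windows, so their values are less reliable.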

An alternative approach is to calculate for each bin the average of interaction frequencies crossing over it (within some genomic range). This is referred to as the insulation score and can be thought of as the average of a square sliding along the matrix diagonal. We expect that this value will be lower at TAD boundaries. Then one can use standard techniques to find local minima and use those as boundaries, and define regions between consecutive boundaries to be TADs.
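The insulation score can be sketched in Python in the same spirit. For each bin, the code below averages the contacts between the bins just upstream and the bins just downstream of it, which corresponds to a square sliding along the diagonal. The toy matrix, window size, and function names are assumptions made for this example, and the local-minimum search is left out.

```python
def block_matrix(domain_sizes, inside=10, outside=1):
    """Toy symmetric contact matrix: high counts within a domain, low between."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


def insulation_score(matrix, window):
    """Average contact frequency crossing over each bin (None at the matrix edges)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        upstream = range(max(0, i - window), i)
        downstream = range(i + 1, min(n, i + 1 + window))
        values = [matrix[a][b] for a in upstream for b in downstream]
        scores.append(sum(values) / len(values) if values else None)
    return scores


# Two toy domains of three bins each; the boundary lies between bins 2 and 3.
matrix = block_matrix([3, 3])
scores = insulation_score(matrix, window=2)
print(scores)  # [None, 5.5, 1.0, 1.0, 5.5, None]
```

The score drops to its lowest value at the two bins flanking the domain border, which is where a local-minimum search would place the boundary.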


I will ask ENCODE about the readme, thanks for the explanation!


Could you then share their answer? Thanks


I have another question about these TAD BED files, and I hope someone can help. What is the actual difference between these files? If I am not mistaken, most of them are hg19 files, so I would expect the start and end positions in these files to match. Were these files generated from different experiments or predictions? I am a bit confused, sorry.


They are from different cell types.


Hi everyone,

I really enjoyed the conversation above about TADs and bin size, and I hope you can help me. I downloaded TAD annotations for several cell types from the 3D Genome Browser in BED format. Some of these files were defined with 40 kb windows, but others with 25 kb windows. Does anyone know a way to convert the 25 kb coordinates to 40 kb, using bedtools or a Linux command such as awk? Also, is this a good way to get positions on a 40 kb window grid?

BED file from IMR90_Lieberman-raw_TADs.txt

chr1 700000 1575000
chr1 1675000 1850000
chr1 1850000 2325000
chr1 2325000 3725000
chr1 3975000 6250000
chr1 6300000 6500000
chr1 6725000 8025000
chr1 8025000 8425000
chr1 8425000 8925000
chr1 8925000 9650000
chr1 9650000 9925000
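
One possible way to approach the conversion, as a rough sketch only: snap each TAD start and end to the nearest multiple of 40 kb. This does not re-call TADs at 40 kb resolution; it only moves the coordinates, by up to 20 kb, so boundaries may shift slightly relative to files that were genuinely called on 40 kb bins. The function names below are made up, and the input rows are the first two lines of the file above.

```python
def snap(position, bin_size):
    """Round a position to the nearest multiple of bin_size (ties round up)."""
    return (position + bin_size // 2) // bin_size * bin_size


def snap_tads(tads, bin_size):
    """Snap (chrom, start, end) intervals to a bin grid, dropping empty results."""
    snapped = []
    for chrom, start, end in tads:
        new_start, new_end = snap(start, bin_size), snap(end, bin_size)
        if new_end > new_start:
            snapped.append((chrom, new_start, new_end))
    return snapped


tads_25kb = [("chr1", 700000, 1575000), ("chr1", 1675000, 1850000)]
tads_40kb = snap_tads(tads_25kb, 40000)
print(tads_40kb)
# [('chr1', 720000, 1560000), ('chr1', 1680000, 1840000)]
```

If the 25 kb contact matrices are available, re-binning the matrices and calling TADs again at 40 kb would be the more rigorous route.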