Question

Separating gene list into chromatin domains and analysing separately within each domain

0

8.1 years ago

biostart ▴ 370

Hello,

I have gene expression data from RNA-seq and want to separate genes into categories based on which TAD they belong to. Say I have the gene coordinates together with expression values in one file and the TAD coordinates in another. I want to intersect these two files and write a new gene file with an extra column giving the number of the TAD to which each gene belongs.

And the next step is to compare gene expression inside and outside each TAD.

Is there already a shared solution to do this?

Thanks!

RNA-Seq ChIP-Seq Hi-C • 2.4k views

ADD COMMENT • link updated 8.1 years ago by Alex Reynolds 35k • written 8.1 years ago by biostart ▴ 370

1

Have you tried bedtools intersect?

$ bedtools intersect -a <genes> -b <tads> -loj > genes_at_tads.bed

ADD REPLY • link 8.1 years ago by Fidel ★ 2.0k

0

Yes, I actually ended up sorting both files and then running intersectBed with the -wo option, which is close to what you proposed (-wo reports only genes that overlap a TAD, whereas -loj also keeps genes with no overlap). This, however, does not label TADs with numbers (1, 2, 3, etc.), so any downstream analysis requires an additional step that reads the TAD coordinates and compares them. Does that mean there is no ready-made solution for comparing gene expression inside and outside each TAD, and it has to be written manually?

ADD REPLY • link 8.1 years ago by biostart ▴ 370

1

Can you show how your TADs are saved? For each gene, the intersect command prints the TAD it overlaps, including the ID of the TAD.

I assumed that each of your TADs had an ID. You can add a number to each TAD as follows:

perl -lane 'BEGIN { $, = "\t" } $F[3] = ++$count; print @F' TADS.bed > TADS_with_number.bed

Note that this assumes you already have the TADs as a .bed file in which the 4th column holds the ID; the command overwrites that column with a running number.
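The same numbering can also be sketched with sort and awk, in case the TAD file has only three columns and should be numbered in genomic order. This is a minimal sketch, not a tested pipeline: the function name number_tads is hypothetical, and it assumes a tab-delimited file with chrom, start and end in columns 1 to 3 and no header line. Any further columns are dropped.

```shell
# Hypothetical helper: sort TADs by chromosome and start, then write a
# running number (1, 2, 3, ...) as column 4.
# Assumes tab-delimited input with chrom, start, end in columns 1-3.
# Usage: number_tads TADS.bed > TADS_with_number.bed
number_tads() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" |
        awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, NR }'
}
```

Sorting first means the numbers follow coordinate order rather than the order of lines in the input file.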

ADD REPLY • link 8.1 years ago by Fidel ★ 2.0k

0

Are you familiar with any particular programming language such as R?

ADD REPLY • link 8.1 years ago by Sean Davis 26k

0

The question is whether a solution already exists, so that I do not have to reinvent it. The task seems to be quite common.

Any language would be fine: Perl, etc.

ADD REPLY • link 8.1 years ago by biostart ▴ 370

0

GenomicRanges in Bioconductor supports this type of operation, whether simple or complex, though you would have to roll your own solution.

ADD REPLY • link 8.1 years ago by Sean Davis 26k

score 1 · Answer 1 · 2016-03-22

If your TADs are in a BED file whose ID column contains a unique label (such as a unique number or other string that acts as a unique identifier), then you can use BEDOPS bedmap --echo-map-id-uniq to get a unique list of IDs of mapping TADs. Note that bedmap expects both input files to be sorted with sort-bed.

For example:

$ bedmap --echo --echo-map-id-uniq --delim '\t' genes.bed TADS.bed > answer.bed

The first columns of answer.bed contain each gene from genes.bed. The last column contains a semicolon-delimited list of unique TAD IDs for the TADs that overlap the gene by one or more bases; it is empty when no TAD overlaps.

From here, it should be a simple matter to do set operations on genes which do and do not have associations with TAD IDs, and then do the respective signal analysis on subsets.
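One way to do that last step is sketched below with awk. This is only an illustration under stated assumptions: the function name tad_expression_summary is hypothetical, and it assumes genes.bed has five tab-delimited columns with the expression value in column 5, so the TAD ID list lands in column 6 of answer.bed. Genes with no overlapping TAD are grouped under the label "none", and a gene that overlaps several TADs is counted once in each.

```shell
# Hypothetical helper: per-TAD gene count and mean expression from answer.bed.
# Assumed layout (tab-delimited): chrom, start, end, gene name, expression,
# then the semicolon-delimited TAD ID list (empty if no TAD overlaps).
# Output columns: TAD ID (or "none"), number of genes, mean expression.
# Usage: tad_expression_summary answer.bed > tad_summary.txt
tad_expression_summary() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        val = $5 + 0
        if ($6 == "") {
            # Gene lies outside every TAD.
            sum["none"] += val; n["none"]++
            next
        }
        # A gene can overlap more than one TAD; count it in each.
        m = split($6, ids, ";")
        for (i = 1; i <= m; i++) { sum[ids[i]] += val; n[ids[i]]++ }
    }
    END { for (k in n) print k, n[k], sum[k] / n[k] }' "$1" | LC_ALL=C sort
}
```

The resulting table gives one row per TAD plus one row for genes outside all TADs, which can then be passed to whatever statistical comparison fits the data.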