Question: Separating gene list into chromatin domains and analysing separately within each domain
4.6 years ago by
biostart350
Germany
biostart350 wrote:

Hello,

I have gene expression data from RNA-seq and want to separate the genes into categories based on which TAD they belong to. Say I have the coordinates of the genes, together with their expression, in one file and the coordinates of the TADs in another file. I want to intersect these two files and write a new gene file with an added column giving the number of the TAD to which each gene belongs.

And the next step is to compare gene expression inside and outside each TAD.

Is there already a shared solution to do this?

Thanks!

rna-seq chip-seq hi-c • 1.4k views
modified 4.6 years ago by Alex Reynolds31k • written 4.6 years ago by biostart350

Have you tried bedtools intersect?

$ bedtools intersect -a <genes> -b <tads> -loj > genes_at_tads.bed
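To turn that output into a per-gene TAD label, something like the following might work. It is only a sketch under assumptions that are not stated in the thread: the genes file is taken to have five tab-separated columns (chrom, start, end, name, expression), the TADs file four (chrom, start, end, ID), and `label_genes` and `n_gene_cols` are hypothetical names. With -loj, a gene that overlaps no TAD is reported with a null TAD record whose chromosome field is ".".

```python
def label_genes(loj_lines, n_gene_cols=5):
    """Map each gene name to the list of TAD IDs it overlaps ([] if none).

    Assumes each line is a gene record (n_gene_cols columns, name in
    column 4) followed by a 4-column TAD record, as written by
    `bedtools intersect -loj`.
    """
    gene_to_tads = {}
    for line in loj_lines:
        fields = line.rstrip("\n").split("\t")
        gene_name = fields[3]
        tad = fields[n_gene_cols:]        # the TAD half of the joined line
        tads = gene_to_tads.setdefault(gene_name, [])
        if tad[0] != ".":                 # "." marks the null (no overlap) TAD
            tads.append(tad[3])
    return gene_to_tads


# Made-up rows for illustration only
rows = [
    "chr1\t100\t200\tgeneA\t5.0\tchr1\t0\t1000\t1",
    "chr1\t900\t1100\tgeneB\t2.0\tchr1\t0\t1000\t1",
    "chr1\t900\t1100\tgeneB\t2.0\tchr1\t1000\t2500\t2",
    "chr3\t10\t50\tgeneC\t7.5\t.\t-1\t-1\t.",
]
print(label_genes(rows))
```

A gene that spans a TAD boundary appears on two lines and so collects two IDs.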

written 4.6 years ago by Fidel1.9k

Yes, I actually ended up sorting both files and then applying intersectBed with the -wo option, which is close to what you proposed. This, however, does not label the TADs with numbers (1, 2, 3, etc.), so any downstream analysis requires an additional step that reads the TAD coordinates and compares them. Does that mean there is no ready-made solution for comparing gene expression inside and outside each TAD, and that it has to be written by hand?

written 4.6 years ago by biostart350

Can you show how your TADs are saved? For each gene, the intersect command prints the TAD it overlaps, including the ID of the TAD.

I assumed that each of your TADs had an ID. You can add a number to each TAD as follows:

perl -lane '$F[3] = ++$count; print join("\t", @F)' TADS.bed > TADS_with_number.bed

Notice that I assume that you already have the TADs as a .bed file in which the 4th column corresponds to the ID.
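For anyone who prefers Python, roughly the same numbering could be sketched as below. The function name `number_tads` is hypothetical, the input is assumed to be tab-separated with at least three columns, and blank lines are skipped, which the Perl one-liner does not do.

```python
def number_tads(lines):
    """Return BED lines with the 4th column set to a running TAD number."""
    numbered = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue                      # skip blank lines
        fields = line.split("\t")
        count += 1
        if len(fields) < 4:
            fields.append(str(count))     # BED3 input: add the ID column
        else:
            fields[3] = str(count)        # overwrite any existing ID
        numbered.append("\t".join(fields))
    return numbered


# Made-up TAD coordinates for illustration only
example = ["chr1\t0\t1000", "chr1\t1000\t2500", "chr2\t0\t800"]
for row in number_tads(example):
    print(row)
```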

written 4.6 years ago by Fidel1.9k

Are you familiar with any particular programming language such as R?

written 4.6 years ago by Sean Davis26k

The question is whether a solution already exists, so that I do not have to reinvent it. The task seems to be quite common.

Any language would be fine: Perl, etc.

modified 4.6 years ago • written 4.6 years ago by biostart350

GenomicRanges in Bioconductor supports this type of operation in all its simplicity or complexity (you would roll your own solution).
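To give an idea of what rolling your own involves, here is a minimal sketch in plain Python rather than GenomicRanges. It uses a brute-force overlap check per chromosome, which is fine for a few thousand TADs but is not how an interval library would do it. The function name `assign_tads` and the tuple layouts are assumptions made for illustration.

```python
def assign_tads(genes, tads):
    """Return {gene name: [IDs of overlapping TADs]}.

    genes: iterable of (chrom, start, end, name)
    tads:  iterable of (chrom, start, end, tad_id)
    Coordinates are assumed BED-style: 0-based, half-open.
    """
    tads_by_chrom = {}
    for chrom, start, end, tad_id in tads:
        tads_by_chrom.setdefault(chrom, []).append((start, end, tad_id))

    result = {}
    for chrom, start, end, name in genes:
        result[name] = [
            tad_id
            for t_start, t_end, tad_id in tads_by_chrom.get(chrom, [])
            if start < t_end and t_start < end   # half-open overlap test
        ]
    return result


# Made-up coordinates for illustration only
tads = [("chr1", 0, 1000, 1), ("chr1", 1000, 2500, 2)]
genes = [("chr1", 100, 200, "geneA"), ("chr1", 900, 1100, "geneB"),
         ("chr3", 10, 50, "geneC")]
print(assign_tads(genes, tads))
```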

written 4.6 years ago by Sean Davis26k
4.6 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

If your TADs are in a BED file whose ID column (the fourth column) contains a unique label (such as a unique number or other string that acts as a unique identifier), then you can use BEDOPS bedmap --echo-map-id-uniq to get a unique list of the IDs of the TADs that map to each gene.

For example:

$ bedmap --echo --echo-map-id-uniq --delim '\t' genes.bed TADS.bed > answer.bed

The first columns of answer.bed contain each gene from genes.bed. The final column contains a semicolon-delimited list of the unique IDs of the TADs that overlap the gene by one or more bases; it is empty when no TAD overlaps the gene.

From here, it should be a simple matter to do set operations on genes which do and do not have associations with TAD IDs, and then do the respective signal analysis on subsets.
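As a rough illustration of that last step, the sketch below reads lines in the format of answer.bed and, for each TAD, compares the mean expression of genes inside it with the mean of all other genes. It assumes, beyond what is stated above, that genes.bed has five columns with the gene name in column 4 and the expression value in column 5. The function names are hypothetical, and a plain mean is only a placeholder for whatever statistic the analysis calls for. Note that bedmap expects both inputs to be sorted, for example with sort-bed.

```python
from statistics import mean


def parse_answer(lines, name_col=3, expr_col=4):
    """Return {gene name: (expression, set of TAD IDs)} from bedmap output."""
    genes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # The ID list is the column after the gene record; empty if no TAD maps
        ids_field = fields[expr_col + 1] if len(fields) > expr_col + 1 else ""
        tad_ids = {i for i in ids_field.split(";") if i}
        genes[fields[name_col]] = (float(fields[expr_col]), tad_ids)
    return genes


def inside_outside_means(genes):
    """Return {TAD ID: (mean expression inside, mean expression outside)}."""
    all_ids = sorted(set().union(*(ids for _, ids in genes.values())))
    summary = {}
    for tad_id in all_ids:
        inside = [e for e, ids in genes.values() if tad_id in ids]
        outside = [e for e, ids in genes.values() if tad_id not in ids]
        summary[tad_id] = (mean(inside), mean(outside) if outside else None)
    return summary


# Made-up rows for illustration only
rows = [
    "chr1\t100\t200\tgeneA\t4.0\t1",
    "chr1\t900\t1100\tgeneB\t2.0\t1;2",
    "chr1\t1500\t1600\tgeneD\t6.0\t2",
    "chr3\t10\t50\tgeneC\t8.0\t",
]
print(inside_outside_means(parse_answer(rows)))
```

A gene that overlaps two TADs counts as inside both, so the per-TAD groups are not disjoint.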

modified 4.6 years ago • written 4.6 years ago by Alex Reynolds31k