Question

Identification of Genomic Regions Where Multiple TFs Bind

0

12.3 years ago

Dataminer ★ 2.8k

Hi!

I have peak-called data for 8 transcription factors (peaks called with MACS on BED files).

The format of each file is:
Chr Chr_Start Chr_Stop

Basically three columns.

I want to find the regions where at least 4 TFs bind (any 4 of the 8).

Note: I already have the union of these regions in a file and have counted tags for each TF in these regions.

Thank you,

chip-seq overlap • 2.7k views

Updated 12.3 years ago by Hanif Khalak ★ 1.3k • written 12.3 years ago by Dataminer ★ 2.8k

1

The answer is here: http://biostar.stackexchange.com/questions/13548/bedtools-compare-multiple-bed-files

Updated 4.4 years ago by Ram 43k • written 12.3 years ago by Dataminer ★ 2.8k

Ram · Answer 1 · 2012-01-05

Several previous BioStar entries address the basic "overlapping intervals" problem:

score 2 · Answer 2 · 2012-01-05

2

12.3 years ago

Ian Simpson ▴ 960

Well, one of the first things you need to decide is how you define a 'region': fixed size, minimum TF density, etc. If you can decide this (albeit fairly arbitrarily), it's simply a case of windowing across the sequences and keeping running totals for the TFs in the bins. You can then summarise the window counts across the 8 TFs and keep only the windows where at least 4 different TFs are present.

If I were doing this I would hack together a quick Perl script to do the job. I wouldn't think this would take too long to do if you're familiar with scripting.
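The windowing idea can be sketched in a few lines. This is a minimal illustration in Python rather than Perl, not a tested pipeline. The function names, the 500 bp window size, and the assumption that each BED file has no header line are all hypothetical choices. Each TF is counted at most once per window, so the threshold means "at least 4 different TFs", not "at least 4 peaks".

```python
from collections import defaultdict


def read_bed(path):
    """Read the first three columns (chrom, start, end) of a BED file.

    Assumes no column-header line; track/browser/comment lines are skipped.
    """
    peaks = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks


def windows_with_min_tfs(peaks_by_tf, window=500, min_tfs=4):
    """Return fixed-size windows touched by peaks from at least `min_tfs` TFs.

    peaks_by_tf maps a TF name to a list of (chrom, start, end) intervals in
    BED coordinates (0-based, half-open). The result is a sorted list of
    (chrom, window_start, window_end, n_tfs) tuples.
    """
    tfs_in_bin = defaultdict(set)
    for tf, peaks in peaks_by_tf.items():
        for chrom, start, end in peaks:
            if end <= start:
                continue  # skip empty or malformed intervals
            first_bin = start // window
            last_bin = (end - 1) // window  # end is exclusive in BED
            for b in range(first_bin, last_bin + 1):
                tfs_in_bin[(chrom, b)].add(tf)  # a set counts each TF once
    hits = [
        (chrom, b * window, (b + 1) * window, len(tfs))
        for (chrom, b), tfs in tfs_in_bin.items()
        if len(tfs) >= min_tfs
    ]
    return sorted(hits)
```

With the 8 peak files loaded into a dict, e.g. `{"TF1": read_bed("tf1.bed"), ...}` (file names are placeholders), calling `windows_with_min_tfs(peaks_by_tf)` lists the qualifying windows. The result depends on the window size and on where the window boundaries fall, which is the arbitrariness mentioned above.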

0

@Ian: I like a good Perl hack myself. Still, interval logic is best dealt with through a library/module. It's not quite as sinister as regex for XML, but I've tried it from scratch and there are a number of gotchas that make anything quick/throw-away prone to error.

12.3 years ago by Hanif Khalak ★ 1.3k
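The comment does not list the gotchas, so the following is only a guess at two common ones: BED's half-open coordinates (a peak ending at position 200 does not overlap one starting at 200) and keeping a region open when one TF's peak ends exactly where another's begins. A minimal Python sweep-line sketch, with hypothetical function and variable names, shows how much care even the "simple" exact-interval version needs. It is an illustration, not a substitute for an established interval library or BEDTools.

```python
from itertools import groupby


def regions_with_min_tfs(peaks_by_tf, min_tfs=4):
    """Return maximal regions covered by peaks from at least `min_tfs` TFs.

    peaks_by_tf maps a TF name to a list of (chrom, start, end) intervals in
    BED coordinates (0-based, half-open). Output is a list of
    (chrom, start, end) tuples, also half-open.
    """
    events = []
    for tf, peaks in peaks_by_tf.items():
        for chrom, start, end in peaks:
            if end > start:  # ignore empty or malformed intervals
                events.append((chrom, start, +1, tf))
                events.append((chrom, end, -1, tf))
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    current_chrom = None
    open_peaks = {}      # TF name -> number of that TF's peaks currently open
    region_start = None

    # Apply all events at one position together, so that a peak ending where
    # another starts neither creates a false overlap nor splits a region.
    for (chrom, pos), group in groupby(events, key=lambda e: (e[0], e[1])):
        if chrom != current_chrom:
            current_chrom, open_peaks, region_start = chrom, {}, None
        for _, _, delta, tf in group:
            open_peaks[tf] = open_peaks.get(tf, 0) + delta
            if open_peaks[tf] == 0:
                del open_peaks[tf]
        n_tfs = len(open_peaks)  # distinct TFs, not number of peaks
        if region_start is None and n_tfs >= min_tfs:
            region_start = pos
        elif region_start is not None and n_tfs < min_tfs:
            regions.append((chrom, region_start, pos))
            region_start = None
    return regions
```

Unlike fixed windows, this reports the exact stretches where the TF count reaches the threshold, which may be only a few bases long. Whether such short regions are biologically meaningful is a separate decision.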