Question: Identification of genomic regions where multiple TFs bind
Dataminer (Netherlands) wrote, 7.3 years ago:

Hi!

I have peak calls for 8 transcription factors (called with MACS from BED files).

Each file has three columns:

Chr Chr_Start Chr_Stop

I want to find the regions where at least 4 TFs bind (any 4 of the 8).

Note: I already have the union of these regions in a file and have counted the tags for each TF in these regions.

Thank you,

Tags: overlap, chip-seq

The answer is here: http://biostar.stackexchange.com/questions/13548/bedtools-compare-multiple-bed-files

Reply written 7.3 years ago by Dataminer
Ian Simpson (Edinburgh) wrote, 7.3 years ago:

Well, one of the first things you need to decide is how you define a 'region': fixed size, minimum TF density, etc. Once you have decided this (albeit fairly arbitrarily), it's simply a case of windowing across the sequences and keeping running totals for the TFs in the bins. You can then summarise the window counts across the 8 TFs, counting each TF at most once per window, and keep only the windows where the total is at least 4.

If I were doing this I would hack together a quick Perl script to do the job. I wouldn't think this would take too long to do if you're familiar with scripting.
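The windowing idea above can be sketched roughly as follows, here in Python rather than Perl. This is only a sketch under stated assumptions: fixed, non-overlapping windows of 500 bp (an arbitrary choice), BED-style 0-based half-open coordinates, and each TF counted at most once per window. The function names `read_bed3` and `windows_with_min_tfs` are made up for illustration.

```python
from collections import defaultdict


def read_bed3(path):
    """Read the first three columns (chrom, start, end) of a BED file."""
    peaks = []
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks


def windows_with_min_tfs(peaks_by_tf, window=500, min_tfs=4):
    """Return fixed windows touched by peaks of at least `min_tfs` distinct TFs.

    peaks_by_tf maps a TF name to a list of (chrom, start, end) tuples,
    assumed to be 0-based and half-open as in BED.
    """
    tfs_in_bin = defaultdict(set)  # (chrom, bin index) -> set of TF names
    for tf, peaks in peaks_by_tf.items():
        for chrom, start, end in peaks:
            if end <= start:
                continue  # ignore empty or malformed intervals
            first = start // window
            last = (end - 1) // window  # end is exclusive
            for b in range(first, last + 1):
                tfs_in_bin[(chrom, b)].add(tf)
    hits = [
        (chrom, b * window, (b + 1) * window, len(tfs))
        for (chrom, b), tfs in tfs_in_bin.items()
        if len(tfs) >= min_tfs
    ]
    return sorted(hits)
```

To use it, build the dictionary with one `read_bed3` call per peak file and pass it to `windows_with_min_tfs`. Using a set per window means several peaks from the same TF in one window still count as one TF, which matches the "any 4 of the 8" requirement; the window size strongly affects the result and would need to be chosen to suit the data.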


@Ian: I like a good Perl hack myself. Still, interval logic is best handled by a library or module. It's not quite as sinister as regex for XML, but I've tried it from scratch, and there are a number of gotchas that make anything quick and throw-away prone to error.
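To illustrate the kind of gotchas meant here, the following is a small sweep-line sketch in Python that reports the exact stretches covered by peaks from at least 4 distinct TFs, without fixed windows. It is not a substitute for a tested interval library. It assumes BED-style 0-based half-open coordinates, and the function name `regions_with_min_tfs` is made up for illustration. The two traps it has to handle are overlapping peaks from the same TF (which must not be counted twice) and peaks that merely touch end to start (which must not produce a zero-length region).

```python
from collections import defaultdict


def regions_with_min_tfs(peaks_by_tf, min_tfs=4):
    """Return (chrom, start, end) stretches covered by >= min_tfs distinct TFs.

    peaks_by_tf maps a TF name to a list of (chrom, start, end) tuples,
    assumed to be 0-based and half-open as in BED.
    """
    events = defaultdict(list)  # chrom -> list of (position, delta, tf)
    for tf, peaks in peaks_by_tf.items():
        for chrom, start, end in peaks:
            if end > start:
                events[chrom].append((start, 1, tf))
                events[chrom].append((end, -1, tf))

    regions = []
    for chrom in sorted(events):
        depth = defaultdict(int)  # tf -> number of its peaks currently open
        n_tfs = 0                 # number of distinct TFs currently open
        region_start = None
        # At equal positions, handle starts before ends so that a region is
        # not split where one TF ends exactly where another begins.
        for pos, delta, tf in sorted(events[chrom], key=lambda e: (e[0], -e[1])):
            before = n_tfs
            depth[tf] += delta
            if delta == 1 and depth[tf] == 1:
                n_tfs += 1
            elif delta == -1 and depth[tf] == 0:
                n_tfs -= 1
            if before < min_tfs <= n_tfs:
                region_start = pos
            elif n_tfs < min_tfs <= before:
                # Drop zero-length regions created by touching intervals.
                if pos > region_start:
                    regions.append((chrom, region_start, pos))
                region_start = None
    return regions
```

Tracking a per-TF depth rather than a single running total is what keeps two overlapping peaks from one TF from being counted as two factors.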

Reply written 7.3 years ago by Hanif Khalak