Prediction of TF binding sites at genome wide scale
6.2 years ago

I want to see the overall binding pattern of a TF (e.g. ARID3A) across the complete human genome (hg19). The task comprises two steps:

1- Take the human genome (hg19) FASTA and divide it into bins of 500 nucleotides. There will be two files: one containing the bin coordinates (as below) and the other containing the whole-genome FASTA sequence.

chrom   Start   End
chr1   1       500
chr1   500     1000
chr1   1000    1500


2- Use one or more tools to identify binding sites of the given TF in each of those bins, so that the final result looks like this:

chrom   Start   End   ARID3A
chr1   1       500   binding
chr1   500     1000  no-binding
chr1   1000    1500  binding
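
A minimal sketch of how a per-bin table like the one above could be derived, assuming motif hits are found on the whole sequence first and only then assigned to bins, so that a hit straddling a bin boundary marks both bins. The function name `label_bins` and the regex-based matching are illustrative assumptions; a real analysis would more likely score a position weight matrix with a dedicated motif scanner and would also check the reverse strand, which this sketch does not do. Coordinates are 0-based, half-open.

```python
import re


def label_bins(sequence, motif_regex, bin_size=500):
    """Label consecutive bins of `sequence` as "binding" or "no-binding".

    Returns a list of (start, end, label) tuples with 0-based, half-open
    coordinates. A bin is "binding" if any motif match overlaps it,
    including matches that cross a bin boundary. Forward strand only.
    """
    n_bins = (len(sequence) + bin_size - 1) // bin_size
    hit = [False] * n_bins

    # The lookahead wrapper makes finditer report overlapping matches too.
    pattern = re.compile(f"(?=({motif_regex}))", re.IGNORECASE)
    for m in pattern.finditer(sequence):
        start, end = m.start(1), m.end(1)
        # Mark every bin touched by the match [start, end).
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            hit[b] = True

    return [
        (
            b * bin_size,
            min((b + 1) * bin_size, len(sequence)),
            "binding" if hit[b] else "no-binding",
        )
        for b in range(n_bins)
    ]
```

For example, `label_bins(chrom_seq, "TTGACA")` would label every 500 nt bin of `chrom_seq`; `TTGACA` here is only a placeholder pattern, not the ARID3A motif.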


If anybody has done something similar, could you please advise how I can split the genome into bins of size 500, and which tools I can use to predict binding sites so that the results are reported per bin? Thank you.

ChIP-Seq TFBS Prediction • 1.4k views

How are you going to handle motifs that span two of your segments?

One option would be to segment the genome with a sliding window, with a step of, say, 100. The sequences would then be 1:500, 100:600, 200:700 and so on. I think that would overcome the issue you mentioned, because a motif cut by one window boundary would lie entirely inside a neighbouring window.
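
The sliding-window idea can be sketched as follows. This is an illustration only: `sliding_windows` is a made-up helper, and coordinates are 0-based, half-open as in BED files. With a window of 500 and a step of 100, consecutive windows overlap by 400 nt, so any motif shorter than that overlap falls entirely inside at least one window. As far as I know, `bedtools makewindows` can produce the same kind of windows through its `-s` (step size) option.

```python
def sliding_windows(chrom, length, size=500, step=100):
    """Yield (chrom, start, end) windows, 0-based half-open.

    A new window starts every `step` bases; the last window is clipped
    to the chromosome length, and iteration stops once the end is reached.
    """
    for start in range(0, length, step):
        end = min(start + size, length)
        yield (chrom, start, end)
        if end == length:
            break
```

Calling `list(sliding_windows("chr1", 800))` gives four windows starting at 0, 100, 200 and 300, the last one ending at 800.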

6.2 years ago

Well, I found the answer to the first part.

To divide the human genome into windows of 500 nucleotides:

$ bedtools makewindows -g hg19.genome -w 500

Here, the genome file lists the length of each chromosome in hg19 (one chromosome name and its size per line). It is available here: https://github.com/arq5x/bedtools/blob/master/genomes/human.hg19.genome
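
For anyone who wants to see what this windowing amounts to, here is a rough Python equivalent, assuming the genome file has two whitespace-separated columns (chromosome name, length). `make_windows` is a hypothetical helper, not part of bedtools. Note that the output is BED-style, 0-based and half-open, so the first window is 0-500 rather than 1-500, and the last window of each chromosome is clipped to the chromosome length.

```python
def make_windows(genome_lines, size=500):
    """Yield (chrom, start, end) non-overlapping windows, 0-based half-open.

    `genome_lines` is any iterable of lines such as "chr1<TAB>249250621",
    e.g. an open file handle on a genome file.
    """
    for line in genome_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, length = line.split()[:2]
        length = int(length)
        for start in range(0, length, size):
            yield chrom, start, min(start + size, length)
```

A typical use would be `with open("hg19.genome") as fh: windows = list(make_windows(fh))`, followed by writing each tuple as a tab-separated line.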