I want to see the overall binding pattern of a TF (e.g. ARID3A) across the complete human genome (hg19). The task consists of two steps:
1- Take the human genome (hg19) FASTA and divide it into bins of 500 nucleotides. There will be two files: one containing the bin coordinates (as below) and the other the whole-genome FASTA sequence.
chrom   Start   End
 chr1   1       500
 chr1   500     1000
 chr1   1000    1500
2- Use one or more tools to identify binding sites of the given TF in each of those bins, so the final result I want looks like:
chrom   Start   End   ARID3A
 chr1   1       500   binding
 chr1   500     1000  no-binding
 chr1   1000    1500  binding
If anybody has done something similar, kindly guide me on two points: how can I split the genome into bins of size 500, and which tools can predict binding sites so that I get a result for each bin? Thank you.
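For what it's worth, the two steps can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the chromosome size and the motif hits are made-up toy values, the function names `make_bins` and `label_bins` are hypothetical, and coordinates are BED-style (0-based, half-open), which differs slightly from the 1-based table in the question. The hits would come from whatever motif scanner you choose; a motif match is a prediction, not evidence of in vivo binding.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def make_bins(chrom_sizes: Dict[str, int], bin_size: int = 500) -> List[Interval]:
    """Split every chromosome into consecutive, non-overlapping bins."""
    bins = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            # The last bin of a chromosome may be shorter than bin_size.
            bins.append((chrom, start, min(start + bin_size, size)))
    return bins


def label_bins(bins: Iterable[Interval], hits: Iterable[Interval]) -> List[Tuple[str, int, int, str]]:
    """Label a bin 'binding' if at least one motif hit overlaps it."""
    hits = list(hits)
    labelled = []
    for chrom, start, end in bins:
        bound = any(
            h_chrom == chrom and h_start < end and h_end > start
            for h_chrom, h_start, h_end in hits
        )
        labelled.append((chrom, start, end, "binding" if bound else "no-binding"))
    return labelled


# Toy input (assumption): a 1500 bp chromosome and two hypothetical motif hits.
toy_sizes = {"chr1": 1500}
toy_hits = [("chr1", 120, 135), ("chr1", 1010, 1025)]

labelled = label_bins(make_bins(toy_sizes), toy_hits)
print("chrom\tStart\tEnd\tARID3A")
for chrom, start, end, label in labelled:
    print(f"{chrom}\t{start}\t{end}\t{label}")
```

The nested loop is fine for a toy example but far too slow for a whole genome; for real data, `bedtools makewindows -g <genome sizes file> -w 500` produces the bins, and an interval-overlap tool is a better fit for the labelling step.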
How are you going to handle motifs that span two of your segments?
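One option would be to segment the genome with a sliding window, with a step of, let's say, 100. In that case the sequences would be 1:500, 100:600, 200:700 and so on. Since consecutive windows overlap by 400 nucleotides, any motif shorter than that lies entirely inside at least one window, so I think this overcomes the issue you mentioned.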
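A rough sketch of that sliding-window idea, assuming 0-based half-open coordinates and a toy 1000 bp sequence; `sliding_windows` and the example motif position are hypothetical, chosen only to show that a motif straddling the 500 boundary still falls completely inside one of the windows.

```python
from typing import List, Tuple


def sliding_windows(size: int, window: int = 500, step: int = 100) -> List[Tuple[int, int]]:
    """Overlapping windows of length `window`, shifted by `step` each time."""
    # Stop generating starts once a window has reached the end of the sequence.
    last_start_bound = max(size - window, 0) + step
    return [(start, min(start + window, size)) for start in range(0, last_start_bound, step)]


windows = sliding_windows(1000)

# Hypothetical motif hit that crosses the boundary between bins 0-500 and 500-1000.
motif_start, motif_end = 495, 510
containing = [(s, e) for s, e in windows if s <= motif_start and motif_end <= e]

print(windows[:3])
print(containing)
```

The trade-off is that each position is now covered by up to five windows, so the per-window labels are no longer independent. `bedtools makewindows` has a `-s` option for the step size if you would rather not generate the windows by hand.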