Question

Prediction of TF binding sites at genome wide scale


9.3 years ago

Bioinformatist Newbie ▴ 270

I want to see the overall binding pattern of a TF (e.g. ARID3A) across the complete human genome (hg19). This task comprises two steps:

1- Take the human genome (hg19) FASTA and divide it into bins of 500 nucleotides. There will be two files, one containing the bin coordinates (as below) and the other the whole-genome FASTA sequence.

chrom   Start   End
 chr1   1       500
 chr1   500     1000
 chr1   1000    1500

2- Use one or more tools to identify binding sites of the given TF in each of those bins, so that the final result looks like this:

chrom   Start   End   ARID3A
 chr1   1       500   binding
 chr1   500     1000  no-binding
 chr1   1000    1500  binding
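As a rough illustration of the second step, the sketch below turns a list of predicted sites into the per-bin table above. It assumes the site coordinates have already been produced by some motif scanner or peak caller, and that they are 0-based, half-open intervals as in BED. The function name `label_bins` and the toy coordinates are made up for this example and are not real ARID3A predictions.

```python
from collections import defaultdict


def label_bins(chrom_sizes, sites, width=500):
    """Label every fixed-width bin as 'binding' or 'no-binding'.

    chrom_sizes: dict mapping chromosome name -> length
    sites: iterable of (chrom, start, end), 0-based half-open (assumption)
    """
    hit = defaultdict(set)
    for chrom, start, end in sites:
        # A site touches every bin from the one holding its first base
        # to the one holding its last base (end - 1).
        for idx in range(start // width, (end - 1) // width + 1):
            hit[chrom].add(idx)

    rows = []
    for chrom, length in chrom_sizes.items():
        for idx, start in enumerate(range(0, length, width)):
            end = min(start + width, length)  # last bin may be shorter
            status = "binding" if idx in hit[chrom] else "no-binding"
            rows.append((chrom, start, end, status))
    return rows


# Toy data for illustration only.
toy_sizes = {"chr1": 1500}
toy_sites = [("chr1", 120, 135), ("chr1", 1010, 1025)]
for row in label_bins(toy_sizes, toy_sites):
    print(*row, sep="\t")
```

A site that straddles a bin boundary marks both bins as binding, which is one simple way to treat boundary-spanning motifs; whether that is the right convention depends on the downstream analysis.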

If anybody has done something similar, kindly guide me: how can I divide the genome into bins of size 500, and which tools can predict binding sites and report the results at the bin level? Thank you.

ChIP-Seq TFBS Prediction • 2.2k views



How are you going to handle motifs that span two of your segments?

ADD REPLY • link 9.3 years ago by TriS ★ 4.8k


A possible option is to segment the genome with a sliding window of 500 that advances in steps of, say, 100. In that case the sequences will be 1:500, 100:600, 200:700 and so on. I think that would overcome the issue you mentioned.
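A minimal sketch of that sliding-window idea, assuming 0-based half-open coordinates as in BED (so the first window is 0:500 rather than 1:500). The function name `sliding_windows` and its defaults are illustrative only.

```python
def sliding_windows(chrom, length, width=500, step=100):
    """Yield (chrom, start, end) windows, 0-based half-open.

    Windows start every `step` bases; the last one is truncated at the
    chromosome end, and generation stops once a window reaches that end.
    """
    start = 0
    while start < length:
        end = min(start + width, length)
        yield chrom, start, end
        if end == length:
            break  # this window already reaches the chromosome end
        start += step


# Toy chromosome of 800 bases, for illustration only.
for window in sliding_windows("chr1", 800):
    print(*window, sep="\t")
```

With a 500-nt window and a 100-nt step, consecutive windows overlap by 400 nt, so any site shorter than that overlap lies entirely inside at least one window. The cost is roughly five times as many windows as with non-overlapping bins.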

ADD REPLY • link 9.3 years ago by Bioinformatist Newbie ▴ 270

score 0 · Answer 1 · 2016-07-19


9.3 years ago

Bioinformatist Newbie ▴ 270

Well, I found the answer to the first part:

Divide the human genome into windows of 500 nucleotides:

$ bedtools makewindows -g hg19.genome -w 500

Here the genome file contains the length of each chromosome in hg19; it is available here: https://github.com/arq5x/bedtools/blob/master/genomes/human.hg19.genome
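To make the input and output of that command concrete, here is a small pure-Python sketch of the same kind of windowing. It assumes the genome file is two whitespace-separated columns (chromosome name, then length) and writes 0-based half-open intervals with a shorter final window per chromosome. The helper names `read_genome` and `make_windows` and the toy chromosomes are invented for this example; for real work the bedtools command above is the simpler route.

```python
import io


def read_genome(handle):
    """Parse 'name<TAB>length' lines into a dict of chromosome sizes."""
    sizes = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or header-like lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def make_windows(sizes, width=500):
    """Yield non-overlapping (chrom, start, end) windows per chromosome."""
    for chrom, length in sizes.items():
        for start in range(0, length, width):
            yield chrom, start, min(start + width, length)


# Toy genome file held in memory, not real hg19 sizes.
toy_genome = io.StringIO("chrA\t1200\nchrB\t400\n")
for window in make_windows(read_genome(toy_genome)):
    print(*window, sep="\t")
```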
