Question

How To Query The Human Chipseq Data (Encode) For A Transcription Factor Binding To A Specific Gene Promoter?

11.2 years ago

Ashutosh Pandey 12k

Is there any way to query all human ChIP-seq data for transcription factor binding to a specific gene promoter? I would like to create a map of TFs that bind to the chromosomal region corresponding to the promoter of my gene of interest. For example, ChIP-seq has been done for VDR, STAT, CREB, etc. How do I determine from ChIP-seq data whether VDR or these other factors are bound to the promoter of my gene of interest? I would like to use in silico cis-element predictors along with the ChIP-seq data to help narrow down functional TF binding in the promoter. We would then confirm these sites with our own ChIP analysis in mouse models.

chip-seq transcription-factor • 6.5k views

ADD COMMENT • link updated 11.2 years ago by sjneph ▴ 690 • written 11.2 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2013-01-30

This may be overkill since you're only interested in one gene. If you have a bunch of BED files, one for each ChIP-seq experiment, you could put the appropriate TF name in the 4th column of each file, and use bedmap to map this information over to your region of interest. Example:

// in creb.bed
chr1   4012  4103  CREB
chr1   5500  5750  CREB
...
chrY   3100000   3100050   CREB


// in stat1.bed
chr1  500   625  STAT1
...

// similar files, one for every other TF

Define your region of interest in a BED file. Say you are looking at the CTCF gene.

// in myfave.bed
chr16   67596110    67596310  CTCF  +

bedops -u creb.bed stat1.bed ... | bedmap --echo --echo-map-id myfave.bed - > answer.bed

where you pass all/any number of TF files to the first bedops command. The bedmap command tells you which TFs are in your region of interest. If you only want distinct TFs (so that, for example, STAT1 doesn't show up 5 times), replace --echo-map-id with --echo-map-id-uniq. The output file would look something like:

// in answer.bed
chr16   67596110    67596310  CTCF  +|STAT1;VDR

where the stuff after the '|' symbol is the TF information you want (IDs separated by semicolons). Keep in mind that you can put as many rows as you want in myfave.bed, so this approach scales as needed. The only other requirement is that all of these BED files be properly sorted (BEDOPS ships sort-bed for this).
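A rough sketch of the file preparation described above (TF name in the 4th column, then sorting), using only standard awk and sort rather than BEDOPS itself. The file tf_demo/creb_raw.bed and its two peaks are made up for illustration, and the sort call is only assumed to approximate the ordering that sort-bed produces; use sort-bed for real data.

```shell
# Hypothetical raw peak calls for one experiment (3-column BED, unsorted).
mkdir -p tf_demo
printf 'chr1\t5500\t5750\nchr1\t4012\t4103\n' > tf_demo/creb_raw.bed

# Put the TF name in column 4, then sort by chromosome, start, and end.
# LC_ALL=C forces a byte-wise order on the chromosome names.
awk -v tf=CREB 'BEGIN { OFS = "\t" } { print $1, $2, $3, tf }' tf_demo/creb_raw.bed \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > tf_demo/creb.bed

cat tf_demo/creb.bed
```

Repeating this once per experiment gives the per-TF files that the bedops -u step expects.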

The bedmap program has other useful things it can report (all in one pass of the data). Perhaps you want the binding sites themselves and not just the IDs (see --echo-map). There are also multiple ways to specify what it means for an element to lie within your region of interest (i.e., should a peak call have at least 50% of its genomic length lying inside what you define to be the promoter region? See --fraction-map and related overlap constraint options).
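To make the 50% idea concrete, here is a small awk sketch of the arithmetic behind an overlap-fraction test. It is not how bedmap is implemented, only an illustration of the criterion; the promoter window is the one from the example above, and the two peaks in tf_demo/peaks.bed are invented.

```shell
mkdir -p tf_demo
# Made-up peaks near the example promoter window chr16:67596110-67596310.
printf 'chr16\t67596000\t67596200\tSTAT1\nchr16\t67596250\t67596350\tVDR\n' > tf_demo/peaks.bed

# Keep peaks with at least half of their own length inside the window.
# BED intervals are 0-based and half-open, so length is end minus start.
awk -v chr=chr16 -v s=67596110 -v e=67596310 -v min=0.5 '
  $1 == chr {
    lo = ($2 > s) ? $2 : s    # start of the overlap
    hi = ($3 < e) ? $3 : e    # end of the overlap
    if (hi > lo && (hi - lo) / ($3 - $2) >= min) print $4
  }' tf_demo/peaks.bed > tf_demo/half_inside.txt

cat tf_demo/half_inside.txt
```

Here the STAT1 peak has 90 of its 200 bases inside the window (45%) and is dropped, while the VDR peak has 60 of 100 (60%) and is kept.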

Ram · Answer 2 · 2013-01-30

11.2 years ago

Ashutosh Pandey 12k

Hi, I just realized that BioMart can be used to get a list of all regulatory features for a particular gene in the human genome. Any other input would still be appreciated.

ADD COMMENT • link 11.2 years ago by Ashutosh Pandey 12k

thanks for following up with an answer

ADD REPLY • link 11.2 years ago by Istvan Albert 100k

Hi Ashutosh,

Can you tell me how you used BioMart to get the list of all regulatory features for a particular gene in the human genome?

Thanks

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Varun Gupta ★ 1.3k

Hi Varun,

The following link has those files.

ftp://ftp.ensembl.org/pub/release-75/regulation/homo_sapiens/

I think ftp://ftp.ensembl.org/pub/release-75/regulation/homo_sapiens/AnnotatedFeatures.gff.gz is the combined file. It contains the genomic coordinates for the regulatory elements. You may need to use tools like snpEff or bedtools to annotate these coordinates with the genes they fall into or lie near.
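If you want these coordinates as BED before intersecting them with a promoter, a conversion along these lines may help. This is only a sketch: the single record in tf_demo/features.gff is invented, and I am assuming the feature name sits in column 3, which may not match the real layout of the Ensembl file, so check the actual columns first (and whether chromosome names carry a "chr" prefix).

```shell
mkdir -p tf_demo
# One made-up GFF record; column 9 is a placeholder, not the real attribute format.
printf 'chr16\tdemo\tCTCF\t67596201\t67596400\t.\t.\t.\tName=CTCF\n' > tf_demo/features.gff

# GFF is 1-based and inclusive; BED is 0-based and half-open,
# so subtract 1 from the start and leave the end unchanged.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $4 - 1, $5, $3 }' tf_demo/features.gff \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > tf_demo/features.bed

cat tf_demo/features.bed
```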

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Ashutosh Pandey 12k