Question: How To Query The Human Chipseq Data (Encode) For A Transcription Factor Binding To A Specific Gene Promoter?
5.8 years ago, Ashutosh Pandey (Philadelphia) wrote:

Is there any way to query all human ChIP-seq data for transcription factors binding to a specific gene promoter? I would like to create a map of TFs that bind to the chromosomal region corresponding to the promoter of my gene of interest. For example, ChIP-seq has been done for VDR, STAT, CREB, etc. How do I determine from ChIP-seq data whether VDR or these other factors are bound to the promoter of my gene of interest? I would like to use in silico cis-element predictors along with the ChIP-seq data to help us narrow down functional TF binding in the promoter. We would then confirm these sites with our own ChIP analysis in mouse models.

5.8 years ago, sjneph wrote:

This may be overkill since you're only interested in one gene. If you have a bunch of BED files, one for each ChIP-seq experiment, you could put the appropriate TF name in the 4th column of each file, and use bedmap to map this information over to your region of interest. Example:

// in creb.bed
chr1   4012  4103  CREB
chr1   5500  5750  CREB
...
chrY   3100000   3100050   CREB


// in stat1.bed
chr1  500   625  STAT1
...

// similar files, one for every other TF
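Peak files do not always carry the factor name in column 4, so the labelling step may have to be done by hand. A minimal Python sketch of that step, assuming tab-separated BED input; `label_peaks` is a hypothetical helper, not part of BEDOPS:

```python
def label_peaks(lines, tf_name):
    """Return BED lines with tf_name written into the 4th (ID) column.

    Assumes tab-separated BED lines with at least 3 columns.
    Any columns after the 4th are kept unchanged.
    """
    labeled = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and common header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[:3]
        labeled.append("\t".join([chrom, start, end, tf_name] + fields[4:]))
    return labeled


# Illustrative input only: one 6-column peak with "." as its ID, one 3-column peak.
peaks = ["chr1\t4012\t4103\t.\t850\t.", "chr1\t5500\t5750"]
for row in label_peaks(peaks, "CREB"):
    print(row)
```

The output would still need to be sorted before it is passed to bedops or bedmap.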

Define your region of interest in a BED file. Say you are looking at the CTCF gene.

// in myfave.bed
chr16   67596110    67596310  CTCF  +

  bedops -u creb.bed stat1.bed ... | bedmap --echo --echo-map-id myfave.bed - > answer.bed

where you pass any number of TF files to the bedops command. The bedmap command tells you which TFs are in your region of interest. If you only want distinct TFs (so that, for example, STAT1 doesn't show up 5 times), replace --echo-map-id with --echo-map-id-uniq. The output file would look something like:

// in answer.bed
chr16   67596110    67596310  CTCF  +|STAT1;VDR

where the stuff after the '|' symbol includes the TF information you want (all separated by semicolons). Keep in mind that you can put as many rows as you want in myfave.bed, so this approach scales as needed. The only other requirement is that all of these BED files be properly sorted (BEDOPS provides sort-bed for this).
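To make the mapping step concrete, here is a rough Python sketch of the idea behind --echo --echo-map-id: for each region, collect the IDs of all peaks that overlap it by at least one base. It is only an illustration of the logic, not how bedmap is implemented; `map_ids` and the peak coordinates are made up, and the ordering of the unique IDs is a choice of this sketch.

```python
def map_ids(regions, peaks, unique=False):
    """For each region, list the IDs of peaks overlapping it by >= 1 bp.

    regions, peaks: lists of (chrom, start, end, id) tuples using
    0-based, half-open BED coordinates.
    """
    results = []
    for chrom, start, end, name in regions:
        ids = [
            p_id
            for p_chrom, p_start, p_end, p_id in sorted(peaks)
            # Half-open intervals overlap when each starts before the other ends.
            if p_chrom == chrom and p_start < end and p_end > start
        ]
        if unique:
            # Report each factor once, like asking for distinct IDs.
            ids = sorted(set(ids))
        results.append(f"{chrom}\t{start}\t{end}\t{name}|{';'.join(ids)}")
    return results


# Region from the example above; the peaks are hypothetical.
regions = [("chr16", 67596110, 67596310, "CTCF")]
peaks = [
    ("chr16", 67596100, 67596200, "STAT1"),
    ("chr16", 67596150, 67596250, "VDR"),
    ("chr16", 67596250, 67596400, "STAT1"),
    ("chr1", 4012, 4103, "CREB"),
]
for row in map_ids(regions, peaks, unique=True):
    print(row)
```

In this sketch a region with no overlapping peaks gets an empty field after the '|'.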

The bedmap program has other useful things it can report (all in one pass of the data). Perhaps you want the binding sites themselves and not just the IDs (see --echo-map). There are also multiple ways to specify what it means for an element to lie within your region of interest (i.e., should a peak call have at least 50% of its genomic length lying inside what you define to be the promoter region? See --fraction-map and related overlap constraint options).
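The 50% criterion can be written out as a small calculation. The Python sketch below shows the arithmetic only, assuming both intervals are on the same chromosome and use 0-based, half-open coordinates; the function names are hypothetical and this is not bedmap's own code.

```python
def overlap_fraction_of_peak(region, peak):
    """Fraction of the peak's length that lies inside the region.

    region, peak: (start, end) pairs on the same chromosome,
    0-based and half-open, with end > start.
    """
    r_start, r_end = region
    p_start, p_end = peak
    overlap = max(0, min(r_end, p_end) - max(r_start, p_start))
    return overlap / (p_end - p_start)


def peak_passes(region, peak, min_fraction=0.5):
    """True if at least min_fraction of the peak falls inside the region."""
    return overlap_fraction_of_peak(region, peak) >= min_fraction


promoter = (100, 300)                       # hypothetical promoter window
print(peak_passes(promoter, (50, 150)))     # half of the peak is inside
print(peak_passes(promoter, (0, 120)))      # only 20 of 120 bases are inside
```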


Hi,

I have a bunch of .bam files for ChIP-seq experiments from ENCODE. I converted them to .bed files using the bamToBed utility, created a BED file of my regions of interest, and used your command:

bedops -u creb.bed stat1.bed ... | bedmap --echo --echo-map-id myfave.bed - > answer.bed

But my output did not show anything after the pipe symbol:

chr6    35436177    35438558    RPL10A    +|
chr1    24018293    24022913    RPL11    +|
chr9    130209954    130213684    RPL12    -|

Am I doing something wrong here?

Thanks

written 4.5 years ago by Varun Gupta
5.8 years ago, Ashutosh Pandey (Philadelphia) wrote:

Hi, I just realized that BioMart can be used to get a list of all regulatory features for a particular gene in the human genome. Any other input would still be appreciated.


Thanks for following up with an answer.

written 5.8 years ago by Istvan Albert

Hi Ashutosh,

Can you tell me how you used BioMart to get the list of all regulatory features for a particular gene in the human genome?

Thanks

written 4.5 years ago by Varun Gupta

Hi Varun,

The following link has those files:

ftp://ftp.ensembl.org/pub/release-75/regulation/homo_sapiens/

I think ftp://ftp.ensembl.org/pub/release-75/regulation/homo_sapiens/AnnotatedFeatures.gff.gz is the combined file. It contains the genomic coordinates of the regulatory elements. You may need to use a tool like snpEff or bedtools to annotate these coordinates with the genes they fall into or lie near.

written 4.5 years ago by Ashutosh Pandey
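For a single gene, the annotation step described in the reply above can also be approximated by filtering the GFF for features that overlap a promoter window. The Python sketch below assumes the standard nine-column, tab-separated GFF layout with 1-based inclusive coordinates; the window size (1000 bp upstream, 200 bp downstream of the TSS), the function names, and the example lines are all assumptions for illustration and are not taken from the Ensembl file.

```python
def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Return a 1-based inclusive (start, end) window around a TSS.

    The default window size is an arbitrary choice for this sketch.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        # On the minus strand, "upstream" means higher coordinates.
        start, end = tss - downstream, tss + upstream
    return max(1, start), end


def features_in_window(gff_lines, chrom, start, end):
    """Return the GFF records (as field lists) overlapping chrom:start-end.

    chrom must use the same naming as the file (with or without "chr").
    """
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        f_chrom, f_start, f_end = fields[0], int(fields[3]), int(fields[4])
        # Inclusive intervals overlap when neither ends before the other starts.
        if f_chrom == chrom and f_start <= end and f_end >= start:
            hits.append(fields)
    return hits


# Made-up records; a real file could be read with gzip.open(path, "rt").
example = [
    "16\tsrc\tfeature\t4100\t4300\t.\t.\t.\tName=A",
    "16\tsrc\tfeature\t9000\t9100\t.\t.\t.\tName=B",
]
start, end = promoter_window(5000, "+")
for rec in features_in_window(example, "16", start, end):
    print(rec[8])
```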