Question: How to search for TF binding sites upstream of specific genes?
8 months ago, a.rex wrote:

I have a reference genome and set of annotations in a gff file (which of course I can easily convert to a fasta/bed etc).

I have a list of candidate genes which I want to extract an upstream 3kb sequence from.

I then want to run this upstream region through some TF prediction software to look for putative TF binding sites (i.e. potential enhancers).

Does anyone have a recommendation for which software to use for this? Also, how can I extract the upstream region?

8 months ago, Alex Reynolds wrote:

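Convert the gene annotations from GFF to BED with convert2bed (from BEDOPS), keep your genes of interest, and write out the 3 kb window immediately upstream of each gene, taking strand into account: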

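$ awk '$3 == "gene"' annotations.gff \
    | convert2bed -i gff - \
    | grep -wFf genes.txt - \
    | awk -F "\t" -v window=3000 -v OFS="\t" '
        # + strand: the window ends at the gene start; clamp at 0 near the chromosome start
        ($6 == "+") { s = $2 - window; if (s < 0) s = 0; print $1, s, $2, $4, ".", $6, $7, $8, $9, $10 }
        # - strand: the window begins at the gene end
        ($6 == "-") { print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
    > promoters.bed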

The file genes.txt contains your genes of interest, one identifier per line. grep -wFf uses it to keep only the annotation lines that mention one of those identifiers as a whole word.
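One caveat with the grep step: it matches an identifier anywhere on the line, including inside the attributes column, so a short gene name can pull in unrelated records. Below is a minimal sketch of a stricter filter. It assumes the identifiers in genes.txt are exactly the values that end up in the BED ID column (column 4); the function name filter_by_id is hypothetical, not part of BEDOPS.

```shell
# filter_by_id (hypothetical helper): read BED on stdin and keep only records
# whose ID column (column 4) exactly equals an identifier in the genes file.
# Possible use in place of the grep step:
#   ... | convert2bed -i gff - | filter_by_id genes.txt | awk ...
filter_by_id() {
    genes_file="$1"
    awk -F '\t' -v genes="$genes_file" '
        # Load the wanted identifiers, one per line, skipping blank lines
        BEGIN { while ((getline line < genes) > 0) if (line != "") wanted[line] = 1 }
        # Print a record only when column 4 is one of the wanted identifiers
        ($4 in wanted)
    '
}
```

If your list holds gene names rather than IDs, this filter will drop everything, so check a few lines of the convert2bed output first.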

Convert promoters.bed to promoters.fa using samtools faidx, a directory of reference genome FASTA files, and a helper script, e.g.:

$ /path/to/ --fastaIsUncompressed --fastaDir=/path/to/genome/fasta < promoters.bed > promoters.fa

This script is available on GitHub Gist:

(Note to other mods: I'm using a URL shortener, as Gist code is otherwise pasted directly into the answer.)
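If the helper script is not at hand, the same extraction can be sketched with samtools faidx directly. This is only an illustration under a few assumptions: a single reference FASTA (genome.fa is a placeholder name), a reasonably recent samtools that supports the -r (region file) and -i (reverse complement) options of faidx, and two hypothetical helper functions, bed_to_regions and promoters_to_fasta. The key detail is the coordinate shift: BED is 0-based and half-open, while samtools regions are 1-based and inclusive.

```shell
# bed_to_regions (hypothetical helper): convert BED intervals on one strand
# into samtools region strings, e.g. chr1 1000 4000 -> chr1:1001-4000.
# Zero-length intervals are skipped.
bed_to_regions() {
    strand="$1"
    awk -F '\t' -v strand="$strand" \
        '$6 == strand && $3 > $2 { print $1 ":" ($2 + 1) "-" $3 }'
}

# promoters_to_fasta (hypothetical helper): write FASTA to stdout.
# Minus-strand windows are reverse-complemented so that every sequence
# is oriented the same way relative to its gene.
# Possible use: promoters_to_fasta promoters.bed genome.fa > promoters.fa
promoters_to_fasta() {
    bed="$1"
    genome="$2"
    bed_to_regions "+" < "$bed" > plus.regions.txt
    bed_to_regions "-" < "$bed" > minus.regions.txt
    if [ -s plus.regions.txt ]; then
        samtools faidx -r plus.regions.txt "$genome"
    fi
    if [ -s minus.regions.txt ]; then
        samtools faidx -i -r minus.regions.txt "$genome"
    fi
}
```

The FASTA headers produced this way are the region strings rather than gene IDs, so you may want to keep promoters.bed alongside the output to map hits back to genes.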

Once you have this FASTA file, you can scan it with a tool like FIMO, using a TF motif database such as JASPAR, UniPROBE, or TRANSFAC, to find putative binding sites. Alternatively, you can use MEME to discover novel motifs and Tomtom to compare them against existing, published TF motif databases. HOMER is another tool people use for novel motif discovery and comparison against published models.
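As a rough sketch of the FIMO route: the commented command scans promoters.fa with a motif file in MEME format (motifs.meme is a placeholder, e.g. a JASPAR download), and the hypothetical helper significant_hits then keeps hits under a q-value cutoff. The helper assumes the tab-separated fimo.tsv layout written by recent MEME Suite releases, with the q-value in column 9; check the header of your own output before relying on it.

```shell
# Scan the promoter sequences (run this yourself; it needs the MEME Suite):
#   fimo --oc fimo_out --thresh 1e-4 motifs.meme promoters.fa
#
# significant_hits (hypothetical helper): filter FIMO TSV on stdin by q-value.
# Assumed columns: motif_id, motif_alt_id, sequence_name, start, stop,
#                  strand, score, p-value, q-value, matched_sequence
# Possible use: significant_hits 0.05 < fimo_out/fimo.tsv > hits.tsv
significant_hits() {
    max_q="$1"
    awk -F '\t' -v max_q="$max_q" '
        /^#/ || $0 == ""   { next }          # skip comment and blank lines
        $1 == "motif_id"   { print; next }   # keep the header line
        $9 != "" && ($9 + 0) <= (max_q + 0)  # keep hits at or below the cutoff
    '
}
```

The sequence_name column will carry whatever names are in your FASTA headers, which is how each hit maps back to a promoter window.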
