Question: How to search for TF binding sites upstream of specific genes?
a.rex wrote (23 months ago):

I have a reference genome and set of annotations in a gff file (which of course I can easily convert to a fasta/bed etc).

I have a list of candidate genes which I want to extract an upstream 3kb sequence from.

I then want to run this upstream region through some TF prediction software to look for putative TF binding sites (i.e. potential enhancers).

Does anyone have a recommendation for which software to use for this? Also, how can I extract out the upstream region?

Alex Reynolds (Seattle, WA, USA) wrote (23 months ago):

Convert the gene annotations from GFF to BED via convert2bed (part of BEDOPS), keep only your genes of interest, and write a strand-aware 3 kb upstream window for each gene:

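$ awk '$3 == "gene"' annotations.gff \
    | convert2bed -i gff - \
    | grep -wFf genes.txt - \
    | awk -v window=3000 -v OFS="\t" '
        # + strand: the window ends at the gene start; clamp at 0 so the start is never negative
        ($6 == "+") { s = $2 - window; if (s < 0) s = 0; print $1, s, $2, $4, ".", $6, $7, $8, $9, $10 }
        # - strand: upstream lies beyond the gene end
        ($6 == "-") { print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
    > promoters.bed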

The file genes.txt is a plain-text list of your genes of interest, one identifier per line. grep -wFf uses it to keep only the annotation lines that contain one of those identifiers as a whole word.
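To sanity-check the strand logic before running it on real annotations, the window arithmetic can be tried on a few made-up BED records. This is only a minimal sketch: the function name upstream_window, the gene names and the coordinates are invented for illustration, and the start is clamped at 0 so that a gene near the beginning of a contig does not produce a negative coordinate.

```shell
# Hypothetical helper: print a strand-aware upstream window for each BED6 record on stdin.
# The first argument is the window size in bases (defaults to 3000).
upstream_window() {
    awk -v window="${1:-3000}" -v OFS="\t" '
        $6 == "+" { s = $2 - window; if (s < 0) s = 0; print $1, s, $2, $4, ".", $6 }
        $6 == "-" { print $1, $3, $3 + window, $4, ".", $6 }'
}

# Toy records (made up): one gene per strand, plus one gene close to the contig start.
toy_windows=$(printf 'chr1\t5000\t8000\tgeneA\t.\t+\nchr1\t5000\t8000\tgeneB\t.\t-\nchr1\t1000\t2000\tgeneC\t.\t+\n' \
    | upstream_window 3000)
printf '%s\n' "$toy_windows"
```

With these toy inputs, geneA (+) gets the window 2000-5000, geneB (-) gets 8000-11000, and geneC (+) is clamped to 0-1000. Windows on the minus strand can still run past the end of a short contig, so it may be worth checking them against the chromosome sizes.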

Convert promoters.bed to promoters.fa using samtools faidx, a set of reference genome FASTA files, and a helper script, e.g.:

$ /path/to/ --fastaIsUncompressed --fastaDir=/path/to/genome/fasta < promoters.bed > promoters.fa

This script is available on GitHub Gist:

(Note to other mods: I'm using a URL shortener, as Gist code is otherwise pasted directly into the answer.)
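If the helper script is not to hand, the same step can be approximated by calling samtools faidx directly. The following is a rough sketch rather than the script referred to above: it assumes a single reference file named genome.fa (an invented name), a samtools version whose faidx supports -i (reverse complement), and a hypothetical function bed_to_regions that converts 0-based, half-open BED intervals into the 1-based, inclusive region strings samtools expects.

```shell
# Hypothetical helper: turn BED records (0-based, half-open) into samtools
# region strings (1-based, inclusive), with the strand as a second column.
bed_to_regions() {
    awk -v OFS="\t" '{ print $1 ":" ($2 + 1) "-" $3, $6 }'
}

# Assumed file names: genome.fa (reference FASTA) and promoters.bed.
# The guard lets the snippet run harmlessly where samtools or the inputs are missing.
if command -v samtools >/dev/null 2>&1 && [ -f genome.fa ] && [ -f promoters.bed ]; then
    bed_to_regions < promoters.bed | while read -r region strand; do
        if [ "$strand" = "-" ]; then
            # -i reverse-complements, so the sequence reads 5-prime to 3-prime relative to the gene
            samtools faidx -i genome.fa "$region"
        else
            samtools faidx genome.fa "$region"
        fi
    done > promoters.fa
fi
```

Reverse-complementing minus-strand windows keeps every promoter sequence in the same orientation as its gene, which makes motif positions easier to compare across genes. The FASTA headers produced this way are region strings, not gene names, so they would need to be mapped back to promoters.bed.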

Once you have this FASTA file, you can run it through a tool like FIMO, using a TF model database like JASPAR, UniPROBE, or TRANSFAC, to scan for matches to known binding-site motifs. Alternatively, you can use MEME to discover novel motif models and TOMTOM to compare them against existing, published TF model databases. Another tool people use for novel motif discovery and comparison against published models is HOMER.
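As a rough illustration of the FIMO route, the sketch below scans promoters.fa and then counts hits per motif and promoter. Several things here are assumptions: the motif file name motifs.meme (a MEME-format export from a database such as JASPAR), the output directory fimo_out, the thresholds, the hypothetical function summarise_fimo, and the fimo.tsv column layout used by MEME Suite 5.x (column 1 motif_id, column 3 sequence_name, column 9 q-value). Older releases write a differently named and formatted file.

```shell
# Hypothetical helper: count FIMO hits per (motif, sequence) pair at or below a q-value cutoff.
# Assumes the MEME 5.x fimo.tsv layout: col 1 motif_id, col 3 sequence_name, col 9 q-value.
summarise_fimo() {
    awk -F '\t' -v OFS="\t" -v qmax="${1:-0.05}" '
        NR == 1 || /^#/ || NF < 9 { next }      # skip header, comment footer and blank lines
        $9 + 0 <= qmax + 0 { n[$1 OFS $3]++ }   # tally hits that pass the cutoff
        END { for (k in n) print k, n[k] }' | sort
}

# Assumed inputs: motifs.meme and promoters.fa. The guard skips the scan if they or fimo are missing.
if command -v fimo >/dev/null 2>&1 && [ -f motifs.meme ] && [ -f promoters.fa ]; then
    fimo --oc fimo_out --thresh 1e-4 motifs.meme promoters.fa
    summarise_fimo 0.05 < fimo_out/fimo.tsv
fi
```

Each output line has the form motif, sequence name, hit count. Short motifs match by chance fairly often in 3 kb of sequence, so predicted sites are best treated as candidates to follow up, not as confirmed binding sites.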
