How to search for TF binding sites upstream of specific genes?
5.7 years ago
a.rex ▴ 350

I have a reference genome and a set of annotations in a GFF file (which I can easily convert to FASTA, BED, etc.).

I have a list of candidate genes, and for each one I want to extract the 3 kb of sequence immediately upstream.

I then want to run these upstream regions through TF binding site prediction software to look for putative TF binding sites (i.e. potential enhancers).

Does anyone have a recommendation for which software to use for this? Also, how can I extract the upstream regions?

5.7 years ago

Use BEDOPS convert2bed to convert the gene annotations from GFF to BED, then turn each gene into a strand-aware 3 kb upstream window:

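# Keep gene records, convert to BED, filter to the genes of interest, then
# emit the 3 kb upstream of each gene's 5' end (start clamped at 0).
$ awk '$3 == "gene"' annotations.gff \
    | convert2bed -i gff - \
    | grep -wFf genes.txt - \
    | awk -v window=3000 -v OFS="\t" '
        ($6 == "+") { s = $2 - window; if (s < 0) s = 0; print $1, s, $2, $4, ".", $6, $7, $8, $9, $10 }
        ($6 == "-") { print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \
    > promoters.bed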

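The file genes.txt contains your genes of interest, one identifier per line; grep uses it to keep only the matching records. Because grep -w matches whole words anywhere on the line, the identifiers must be written exactly as they appear in the annotation (for example, in the ID or Name attribute).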
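To see what the window step does per strand, here is a minimal sketch on a toy BED file. The gene names and coordinates are made up for illustration, and the clamp of negative starts to 0 is an extra safeguard for genes that sit within 3 kb of the start of a chromosome.

```shell
# Toy BED6 input (hypothetical genes; BED is 0-based, half-open)
printf 'chr1\t5000\t6000\tgeneA\t.\t+\nchr1\t8000\t9000\tgeneB\t.\t-\nchr1\t1000\t2000\tgeneC\t.\t+\n' > toy_genes.bed

# "+" genes: the window ends at the gene start
# "-" genes: the window begins at the gene end (the 5' end on that strand)
awk -v window=3000 -v OFS="\t" '
    $6 == "+" { s = $2 - window; if (s < 0) s = 0; print $1, s, $2, $4, ".", $6 }
    $6 == "-" { print $1, $3, $3 + window, $4, ".", $6 }
' toy_genes.bed > toy_promoters.bed

cat toy_promoters.bed
# chr1    2000    5000    geneA   .       +
# chr1    9000    12000   geneB   .       -
# chr1    0       1000    geneC   .       +
```

Windows on the minus strand can still run past the end of a chromosome; catching that would need a table of chromosome sizes, which this sketch does not use.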

Convert promoters.bed to promoters.fa using samtools faidx, your reference genome FASTA files, and a helper script such as bed2fastaidx.pl, e.g.:

$ /path/to/bed2fastaidx.pl --fastaIsUncompressed --fastaDir=/path/to/genome/fasta < promoters.bed > promoters.fa

This script is available on GitHub Gist: https://bit.ly/2nCvej2

(Note to other mods: I'm using a URL shortener, as Gist code is otherwise pasted directly into the answer.)
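If you would rather not use the helper script, one alternative is to call samtools faidx directly. This is a different route from the one above, sketched under assumptions: the genome is a single FASTA file (genome.fa is a placeholder name), and your samtools is recent enough to support -r (region file) and -i (reverse complement). The toy BED file stands in for promoters.bed.

```shell
# Toy stand-in for promoters.bed (hypothetical coordinates)
printf 'chr1\t2000\t5000\tgeneA\t.\t+\nchr1\t9000\t12000\tgeneB\t.\t-\n' > toy_promoters.bed
bed=toy_promoters.bed
genome=${GENOME:-genome.fa}   # assumed name of a single reference FASTA

# BED is 0-based, half-open; samtools regions are 1-based, inclusive,
# so the start shifts by +1 and the end stays. Empty intervals are skipped.
bed_to_regions() {   # usage: bed_to_regions STRAND BEDFILE
    awk -F'\t' -v strand="$1" \
        '$6 == strand && $3 > $2 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$2"
}

bed_to_regions + "$bed" > regions.plus.txt
bed_to_regions - "$bed" > regions.minus.txt

# Extract sequence only when samtools and the genome are actually present.
if command -v samtools >/dev/null 2>&1 && [ -s "$genome" ]; then
    : > promoters.fa
    if [ -s regions.plus.txt ]; then
        samtools faidx -r regions.plus.txt "$genome" >> promoters.fa
    fi
    if [ -s regions.minus.txt ]; then
        # -i reverse-complements, so minus-strand promoters read 5' to 3'
        samtools faidx -i -r regions.minus.txt "$genome" >> promoters.fa
    fi
fi
```

The FASTA headers produced this way are region strings rather than gene names, so keep the BED file around to map hits back to genes.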

Once you have this FASTA file, you can scan it with a tool like FIMO, using a TF motif database such as JASPAR, UniPROBE, or TRANSFAC, to find matches to known binding motifs. Alternatively, you can use MEME to discover novel motifs and TOMTOM to compare them against existing, published TF motif databases. HOMER is another tool people use for novel motif discovery and comparison against published models.
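As a rough sketch of the FIMO route: the file names motifs.meme (a motif database in MEME format) and promoters.fa are assumptions, and fimo.tsv is the output name used by MEME Suite 5.x, so older releases may name it differently. The count_hits helper is hypothetical and not part of FIMO; it simply tallies hits per sequence and motif from the tab-separated output.

```shell
motifs=motifs.meme    # assumed: TF motifs in MEME format
seqs=promoters.fa     # assumed: upstream sequences from the previous step
outdir=fimo_out

# Scan only if FIMO and both inputs are present; --oc overwrites outdir,
# --thresh is the p-value cutoff for reported matches.
if command -v fimo >/dev/null 2>&1 && [ -s "$motifs" ] && [ -s "$seqs" ]; then
    fimo --oc "$outdir" --thresh 1e-4 "$motifs" "$seqs"
fi

# Hypothetical helper: count hits per (sequence, motif).
# Column 1 is motif_id and column 3 is sequence_name; the header row,
# blank lines, and trailing "#" comment lines are skipped.
count_hits() {   # usage: count_hits FIMO_TSV
    awk -F'\t' 'NR > 1 && $0 !~ /^#/ && NF >= 9 { n[$3 "\t" $1]++ }
        END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

if [ -s "$outdir/fimo.tsv" ]; then
    count_hits "$outdir/fimo.tsv"
fi
```

Raw hit counts are only a starting point; with 3 kb windows and hundreds of motifs, many matches are expected by chance, so the q-value column is worth filtering on as well.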
